CLM4.0: Error when resubmitting a job

carmelo_cammalleri@jrc_ec_europa_eu · Oct 30, 2013

Hi,

I'm running CLM4.0 on the Cineca cluster over Europe, using custom meteorological forcings and surface data. I successfully completed a 5-year run starting from an arbitrary (cold_start) condition.

Now I'm trying to continue the simulation using the restart files from the previous successful run. In the first run I set RESUBMIT=3, so env_run.xml now has RESUBMIT = 2 and CONTINUE_RUN = TRUE.

However, when the second job starts, it stops during the initialization of the lnd component: "(seq_mct_drv) : Initialize lnd component" is the last echo in cpl.log. I have few clues as to where the error occurs.

In the ccsm.log file the error message reads:

"Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc 131072
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc 131072
Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
Abort(1) on node 181 (rank 181 in comm 1140850688): Fatal error in PMPI_Wait: Error message texts are not available
2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: terminated by signal 6
2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 181"

In the previous successful run, by contrast, the restart files were opened and then followed by:

"MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct"

In the lnd.log file the last messages are:

"proc= 0 clump no = 1 clump id= 1 beg gridcell= 1 end gridcell= 343 total gridcells per clump= 343
proc= 0 clump no = 1 clump id= 1 beg landunit= 1 end landunit= 366 total landunits per clump = 366
proc= 0 clump no = 1 clump id= 1 beg column = 1 end column = 366 total columns per clump = 366
proc= 0 clump no = 1 clump id= 1 beg pft = 1 end pft = 5854 total pfts per clump = 5854"

and I know that in the successful run those are followed by:

"(lnd_init_mct) :time averaging the following flux fields over the coupling interval"

Any idea of the possible reasons for the crash?

Thanks,
Carmelo
