CLM4.0: error when resubmitting a job

Hi,

I'm running CLM4.0 on the Cineca cluster over Europe, using custom meteorological forcings and surface data. I successfully completed a 5-year run starting from an arbitrary (cold_start) condition. Now I'm trying to continue the simulation using the restart files from the previous successful run. In the first run I set RESUBMIT=3, so env_run.xml now has RESUBMIT = 2 and CONTINUE_RUN = TRUE.

However, when the second job starts, it stops during the initialization of the lnd component. The last line echoed in cpl.log is:

    (seq_mct_drv) : Initialize lnd component

I have a few clues about where the error occurs. In the ccsm.log file the error message reads:

    Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
    Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
    Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc 131072
    Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc 131072
    Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
    Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
    Abort(1) on node 181 (rank 181 in comm 1140850688): Fatal error in PMPI_Wait: Error message texts are not available
    2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: terminated by signal 6
    2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 181

whereas in the previous successful run I got:

    MCT::m_Router::initp_: GSMap indices not increasing...Will correct
    MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
    MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
    MCT::m_Router::initp_: GSMap indices not increasing...Will correct

In the lnd.log file the last messages are:

    proc=  0  clump no =  1  clump id=  1  beg gridcell=  1  end gridcell=  343  total gridcells per clump=  343
    proc=  0  clump no =  1  clump id=  1  beg landunit=  1  end landunit=  366  total landunits per clump =  366
    proc=  0  clump no =  1  clump id=  1  beg column  =  1  end column  =  366  total columns per clump  =  366
    proc=  0  clump no =  1  clump id=  1  beg pft     =  1  end pft     =  5854  total pfts per clump     =  5854

and I know that in the successful run those are followed by:

    (lnd_init_mct) :time averaging the following flux fields over the coupling interval

Any idea about the possible reasons for the crash?

Thanks,
Carmelo
 