CLM4.0 error when resubmitting a job

Hi,

I'm running CLM4.0 on the Cineca cluster over Europe, using custom meteorological forcings and surface data. I successfully completed a 5-year run starting from an arbitrary (cold_start) condition.

Now I'm trying to continue the simulation using the restart files from the previous successful run. In the first run I set RESUBMIT=3, hence in env_run.xml I now have RESUBMIT = 2 and CONTINUE_RUN = TRUE.

However, when the second job starts, it stops during the initialization of the lnd component: "(seq_mct_drv) : Initialize lnd component" is the last echo in cpl.log.

I have a few clues about where the error occurs. In the ccsm.log file the error message reads:

"Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/surfdata/surfdata_0171x0225pt_0.25x0.25_Europe_simyr2000.nc 65536
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc 131072
Opened existing file /gpfs/scratch/userexternal/ccammall/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc 131072
Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
Opened existing file test_025_Europe1.clm2.r.1995-01-03-00000.nc 131072
Abort(1) on node 181 (rank 181 in comm 1140850688): Fatal error in PMPI_Wait: Error message texts are not available
2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: terminated by signal 6
2013-10-29 12:06:44.688 (WARN ) [0x40000d08b30] :595823:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 181"

In the previous successful run, by contrast, the same point in the log showed:

"MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct"

In the lnd.log file the last messages are:

"proc=  0  clump no =  1  clump id=  1  beg gridcell=  1  end gridcell=  343  total gridcells per clump=  343
proc=  0  clump no =  1  clump id=  1  beg landunit=  1  end landunit=  366  total landunits per clump =  366
proc=  0  clump no =  1  clump id=  1  beg column  =  1  end column  =  366  total columns per clump  =  366
proc=  0  clump no =  1  clump id=  1  beg pft     =  1  end pft     =  5854  total pfts per clump     =  5854"

and I know that in the successful run those are followed by:

"(lnd_init_mct) :time averaging the following flux fields over the coupling interval"

Any idea what could be causing the crash?

Thanks,
Carmelo
 