
what is the status of my run?

Hi there,

I am porting CESM 1.0.6 to a new machine. I created the ERI.f19_g16.B1850CN test, but it keeps exceeding my wall-time limit. Each time it is killed, the last lines of my cpl.log file simply read:

---------------------------------------------------------------
(seq_timemgr_clockPrint) Alarm = 7 seq_timemgr_alarm_tprof
(seq_timemgr_clockPrint) Prev Time = 00011227 00000
(seq_timemgr_clockPrint) Next Time = 00011227 18000
(seq_timemgr_clockPrint) Intervl yms = 0 0 18000


(seq_mct_drv) : Initialize each component: atm, lnd, ocn, and ice
----------------------------------------------------------------

Does it seem like the model has not even finished initializing?
The longest wall-time I tried was 20 hours on 32 nodes x 12 processors each. I am new to CESM and do not know whether it is reasonable for the B1850CN compset to behave like this, or whether there is some problem.

Any help will be greatly appreciated.
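
One quick way to answer the "has it finished initializing?" question for each killed run is to check whether the initialization banner is still the last meaningful line of cpl.log. The sketch below is only illustrative: the marker string is copied from the excerpt above, the function name is hypothetical, and it assumes the coupler writes further lines once the components have finished initializing.

```python
# Marker copied from the cpl.log excerpt in this post.
INIT_MARKER = "(seq_mct_drv) : Initialize each component"


def stopped_during_init(log_text: str) -> bool:
    """Return True if the init banner is the last meaningful line of the log.

    Blank lines and separator lines made only of dashes are ignored.
    Assumption: the coupler logs more lines after initialization completes.
    """
    meaningful = [
        line.strip()
        for line in log_text.splitlines()
        if line.strip() and set(line.strip()) != {"-"}
    ]
    return bool(meaningful) and meaningful[-1].startswith(INIT_MARKER)


if __name__ == "__main__":
    import sys

    # Usage: python check_init.py path/to/cpl.log
    with open(sys.argv[1], errors="replace") as handle:
        print(stopped_during_init(handle.read()))
```

A True result for every attempt, regardless of the wall-time limit, would be consistent with a hang during initialization rather than a slow but progressing run.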
 

jedwards

CSEG and Liaisons
Staff member
It shouldn't take so long; it's probably hanging in MPI someplace. Try a simpler case, an X compset to begin with, or even the hello world test program, to make sure that MPI is behaving correctly.
 

santos

Member
Also, what does the cesm.log file say? If there are error messages, they usually go to cesm.log, not cpl.log.
 

Thank you for pointing out the problem. My X and A compsets can run fine. The B compset ran and gave the above errors. The F, C, D compsets were unable to run at all...
 
 
Hi, the only error message I get in my ccsm.log files is that the job was killed. I attached the file here. Could you take a look and see whether there is anything I missed, such as PE layout or memory problems? Thank you.
 
 

santos

Member
Looks like something is going wrong during CICE initialization, but I don't know enough about CICE to guess what. CICE is also on for D and F compsets (at least in prescribed mode), but not C, so your C case must have a different problem.
 

jedwards

CSEG and Liaisons
Staff member
Some of the tasks are in:

    ice_spacecurve_mp        1258  ice_spacecurve.F90

others are in

    shr_mpi_mod_mp_sh         538  shr_mpi_mod.F90

You'll need to figure out why you aren't making further progress. Why are you porting 1.0.6 instead of a newer version? You might try a different PE layout especially wrt ice layout.
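
When the log holds a traceback from every task, counting how many tasks stopped at each source location can make the split described above easier to see. This is a rough sketch only: it assumes each traceback line ends with a routine name, a line number, and a `.F90` file name, as in the two frames quoted in this post, and `tally_frames` is a hypothetical helper, not part of CESM.

```python
import re
from collections import Counter

# Matches the tail of a traceback line: routine name, line number, Fortran source file.
FRAME = re.compile(r"(\S+)\s+(\d+)\s+(\S+\.F90)\s*$")


def tally_frames(log_lines):
    """Count how many traceback lines point at each 'source:line' location."""
    counts = Counter()
    for line in log_lines:
        match = FRAME.search(line)
        if match:
            _routine, lineno, source = match.groups()
            counts[f"{source}:{lineno}"] += 1
    return counts


# The two frames quoted in the post above.
sample = [
    "ice_spacecurve_mp        1258  ice_spacecurve.F90",
    "shr_mpi_mod_mp_sh         538  shr_mpi_mod.F90",
]
for location, n in tally_frames(sample).most_common():
    print(location, n)
```

Feeding it every line of the log (for example `tally_frames(open("ccsm.log"))`) would show how many tasks are sitting in the CICE space-curve decomposition versus waiting in an MPI call.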
 

 
Hi to all,

First post here, and pretty new to CESM. We are porting v1.2.2 to a new machine.

I've been running into the exact same problem as mentioned above. I'm creating and running the test cases as detailed on page 55 of the user's guide, in the order specified. The first five tests passed OK (provided the _rx1 extension is added to numbers 2 and 5), and I'm now encountering very long run times for number 6 (namely ERI.f19_g16.B1850CN).

My latest attempt used a wall-time limit of 4 hours and 80 processes on 4 nodes, and the job was killed before finishing. I don't believe there is any particular error message in the logs, but I can provide them if needed.

Test 3, also a B1850CN case, was the longest of all so far, running in approximately 2 hours, versus a few minutes for each of the other four tests. So my question is whether B1850CN-type tests are expected to take a long time, in which case I should try pushing the time limit further, or whether something is wrong elsewhere.

Thanks for any suggestion/help.
 