The model did not stop properly after the initial run

rambhari01

Ram
New Member
What version of the code are you using?
CESM2.2.2

Have you made any changes to files in the source tree?

No

Describe your problem or question:
I am trying to set up a slab-ocean run with CLM5 (BGC-CROP) by modifying the ETEST compset, as shown below.

./create_newcase --case /scratch/rs9552/cesm2.2.2_T2/E_CAM6_2k_CLM5BGCCRPs_f09g16 --res f09_g16 --compset 2000_CAM60_CLM50%BGC-CROP_CICE_DOCN%SOM_MOSART_SGLC_SWAV_TEST --machine greene --run-unsupported

The initial run compiles, runs the 5 simulated days, and writes its output within 10-15 minutes. The model also writes all the rpointer files except rpointer.drv, but the job then stays in the running state until the 1-hour wall-clock limit I requested expires, and it stops with the error message below.

It seems the model hangs while wrapping up the initial run, and I have not been able to track down the cause. Could you please check what the issue might be?
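One way to confirm which restart pointers were actually written is to list the run directory against the set you expect. The snippet below is only a minimal sketch: the function name `missing_rpointers` and the default component list are assumptions for illustration (not part of CESM or CIME), and the list should be adjusted to the components active in your compset.

```python
from pathlib import Path

# Assumed component suffixes for illustration; adjust to your compset.
ASSUMED_COMPONENTS = ("atm", "lnd", "ice", "ocn", "rof", "drv")


def missing_rpointers(run_dir, components=ASSUMED_COMPONENTS):
    """Return the suffixes whose rpointer.<suffix> file is absent or empty."""
    run_path = Path(run_dir)
    missing = []
    for comp in components:
        pointer = run_path / f"rpointer.{comp}"
        # An empty pointer file is as unusable for a restart as a missing one.
        if not pointer.is_file() or pointer.stat().st_size == 0:
            missing.append(comp)
    return missing
```

Calling `missing_rpointers("/path/to/run")` on the case run directory would return `["drv"]` for the situation described above, where every pointer except rpointer.drv was written.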

I also tried an F compset at --res f09_g16, which gives the same error.

max rss=635412480.0 MB

max rss=172478464.0 MB

memory_write: model date = 00010106 0 memory = -0.00 MB (highwater) 164.49 MB (usage) (pe= 256 comps= OCN)
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 57540751.0 ON cs120 CANCELLED AT 2025-02-24T04:20:43 DUE TO TIME LIMIT ***
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source

Stack trace terminated abnormally.

forrtl: error (78): process killed (SIGTERM)

Image PC Routine Line Source


Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)

Image PC Routine Line Source



The log file is attached.
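To separate "the model crashed" from "the model finished its time stepping and was then killed by the scheduler", it can help to pull two facts out of the log: the last model date reported by `memory_write` and the reason Slurm gave for cancelling the step. The following is a rough sketch that relies only on the two line formats visible in the excerpt above; the function name `summarize_log` and the regular expressions are illustrative assumptions, and other CESM or Slurm versions may format these lines differently.

```python
import re

# Patterns based on the log lines quoted in this post (assumed formats).
MODEL_DATE_RE = re.compile(r"memory_write: model date =\s*(\d{8})\s+(\d+)")
CANCEL_RE = re.compile(r"CANCELLED AT (\S+) DUE TO (.+?) \*\*\*")


def summarize_log(lines):
    """Return the last model date seen and the first Slurm cancel reason."""
    summary = {"last_model_date": None, "cancelled_at": None, "cancel_reason": None}
    for line in lines:
        date_match = MODEL_DATE_RE.search(line)
        if date_match:
            # Keep overwriting so the final value is the latest date logged.
            summary["last_model_date"] = date_match.group(1)
        cancel_match = CANCEL_RE.search(line)
        if cancel_match and summary["cancelled_at"] is None:
            summary["cancelled_at"] = cancel_match.group(1)
            summary["cancel_reason"] = cancel_match.group(2)
    return summary
```

Applied to the excerpt above, this reports a last model date of 00010106 and a cancel reason of TIME LIMIT. Assuming the run started on 0001-01-01, that date is the end of the 5-day run, which fits the picture of a job that completed its time stepping and then sat idle until the wall-clock limit.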

Thanks
 

Attachments

  • cesmE.log.tar
    353.5 KB · Views: 2

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Ram,

I just tried that same configuration with the same code base on our derecho machine here at NCAR and it worked fine, which means that it might be a system or environment problem. Have you, or anyone else, been able to get a configuration of CESM2 to run on that machine? For example, what happens if you run an out-of-the-box F-case, like F2000climo? Alternatively, what happens if you use the CESM2.1.5 code base instead? Having a working example on your machine might help in determining what the actual problem is that you are experiencing.

Thanks, and have a great day!

Jesse
 

rambhari01

Ram
New Member
Hi Jesse,

Thank you for your useful response. I had a similar suspicion about the libraries (OpenMPI) used in the current setup on our HPC system, because the model seems to run well but does not write the final restart pointer and files. I have explained the issue to the HPC staff and they are looking into it.

Meanwhile, regarding your other questions.

1.) what happens if you use the CESM2.1.5 code base instead?

- I tried the same setup, but at the f19_g16 resolution (roughly 2 degrees). It fails while initializing the model. I posted the issue on the forum and am waiting for a reply; the thread is "CLM: NaN Value while running slab ocean (ETEST) compset".

Meanwhile, I got some hints to try a newer branch, so I tried the same setup on cesm2.2.2.

I first tried the same experiment setup at f19_g16 resolution on cesm2.2.2 before changing to f09_g16. It requests 4 nodes on the machine and runs successfully without any issue. I am attaching the log file here (cesm222_f19_Ecmp.tar).

2.) what happens if you run an out-of-the-box F-case, like F2000climo?

- I also tried an F compset case at f09_g16 resolution:

./create_newcase --case /scratch/rs9552/cesm2.2.2_T2/F_CAM62k_CLM5BGCCrop_T1 --res f09_g16 --compset 2000_CAM60_CLM50%BGC-CROP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV --machine greene --run-unsupported



It gave the same issue; I am attaching the log from this run too.

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
Stack trace terminated abnormally.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.


Please check.

The HPC staff are also looking into this. Any hint would be a great help if you notice something we have missed.

Thank You.

-Ram
 

Attachments

  • cesm_222_f19_Ecomp.tar
    972.5 KB · Views: 0
  • cesm_F_clim2k_f19.tar
    198 KB · Views: 0