
Why does CESM2.1.3 take so long?

CGL

CGL
Member
Hi, everyone. I am running the F2000climo compset for 1 month. So far it has taken nearly 18 hours. Why is it taking so long?
Here are the details of my model run:
1680679221197.png
Here is my time record:
1680679341733.png
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi CGL,

There could be many different reasons why the model took so long to run. Do you know what resolution or model grid you are trying to use for this run? Also are you trying to run this on an NCAR machine (e.g. Cheyenne) or somewhere else? Finally, in your case directory there should be a timing directory, and in there should be a cesm_timing.XXX file, where XXX includes the case name and date. If you can provide that file then it might also be of help.

Thanks, and have a great day!

Jesse
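
The timing-file check described above can be sketched in Python. This is only a minimal sketch, assuming the case directory contains a `timing` subdirectory holding files named `cesm_timing.*`, as the reply describes; the function name `find_timing_files` is hypothetical and not part of CESM or CIME.

```python
from pathlib import Path


def find_timing_files(case_dir):
    """Return cesm_timing.* files under <case_dir>/timing, newest first.

    Assumption: the layout is <case_dir>/timing/cesm_timing.<case and date>,
    as described in the reply above.
    """
    timing_dir = Path(case_dir) / "timing"
    if not timing_dir.is_dir():
        # No timing directory at all: nothing to report.
        return []
    files = [p for p in timing_dir.glob("cesm_timing.*") if p.is_file()]
    # Sort by modification time so the most recent run comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_timing_files("/path/to/case")` with your own case directory returns a list of paths. An empty list would be consistent with a run that has not completed yet, since the timing summary is normally written once a run finishes.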
 

CGL

CGL
Member
I looked in the timing directory for the cesm_timing file, but I found nothing there. When I use the top command to look for cesm.exe, I get the output below, and it appears to be running. But when I checked the progress, the process seems to be hanging. So what's the issue?
1681208103145.png
1681208236717.png
 

CGL

CGL
Member
By the way, it is calling the function cime_pre_init1 at line 58 of /data/sxh/CESM2/CESM/sugon/cesm2.1.3/cime/src/drivers/mct/main/cime_driver.F90.
1681214518480.png
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi CGL,

It looks like the model is failing to broadcast the num_inst_driver variable. So I am suspicious that something is wrong with either the MPI library, the hardware you are using, or the driver namelists you are passing in. What happens if you rebuild the model with DEBUG set to TRUE in env_build.xml? Also, are you running with more than one instance (i.e. are any of the NINST variables set to something greater than one in env_mach_pes.xml)?

Finally, I am moving this thread to the infrastructure forum, as they know significantly more about MPI than I do (in case that is in fact the issue).

Thanks, and have a great day!

Jesse
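
The NINST question above can be checked by reading env_mach_pes.xml. The following is a hedged sketch, not the CIME-supported method (the `./xmlquery` script in the case directory is the usual tool). It assumes one of two XML layouts for the NINST entries, both of which are guesses that may not match your CIME version; the function name `ninst_above_one` is hypothetical.

```python
import xml.etree.ElementTree as ET


def ninst_above_one(xml_text):
    """Return {label: value} for every NINST setting greater than 1.

    Two layouts are handled; both are ASSUMPTIONS about the file format:
      A: <entry id="NINST_ATM" value="2"/>
      B: <entry id="NINST"><values><value compclass="ATM">2</value></values></entry>
    """
    root = ET.fromstring(xml_text)
    found = {}
    for entry in root.iter("entry"):
        entry_id = entry.get("id", "")
        if not entry_id.startswith("NINST"):
            continue
        candidates = []
        # Layout A: the value is an attribute on the entry itself.
        if entry.get("value") is not None:
            candidates.append((entry_id, entry.get("value")))
        # Layout B: one nested <value> element per component class.
        for value in entry.iter("value"):
            label = f"{entry_id}[{value.get('compclass', '?')}]"
            candidates.append((label, value.text))
        for label, raw in candidates:
            try:
                number = int((raw or "").strip())
            except ValueError:
                # Non-numeric settings (for example a layout name) are skipped.
                continue
            if number > 1:
                found[label] = number
    return found
```

To use it, pass the file contents, for example `ninst_above_one(open("env_mach_pes.xml").read())`. An empty dictionary would suggest that no multi-instance setting was found under these assumed layouts.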
 

CGL

CGL
Member
Hi CGL,

Are you running on a supported machine, or are you porting to a new machine? You can follow the instructions on how to port to a new machine in "6. Porting and validating CIME on a new platform" (CIME master documentation). At the very least you should try running the hello world MPI example to make sure your MPI communications are working properly.

Chris
Thanks. I tried the hello world MPI example, and it worked. The process may be hanging, as described in my latest post.
 

fischer

CSEG and Liaisons
Staff member
I'm starting to suspect there might be an issue with your MPI libraries, or the machine. Can you try a different MPI library? Also, what happens when you run with DEBUG set to TRUE? Something else you can try is a simpler test, such as res=f19_g17, compset=X. You can also try running your F2000climo configuration on fewer nodes. Do you have threading turned on? If you do, try turning that off.

Chris
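
The checks suggested above map onto a handful of CIME commands, which the sketch below assembles as text without running anything. It is only an illustration: the case name `x_test_case` and both function names are hypothetical, the clean rebuild after changing DEBUG is an assumption about what your build needs, and setting NTHRDS to 1 is assumed to be how threading is turned off. `create_newcase` is run from the CIME scripts directory; the other commands are run from inside the case directory.

```python
import shlex


def suggested_commands(case_name="x_test_case"):
    """Group the troubleshooting commands by the suggestion they implement.

    case_name is a hypothetical placeholder; substitute your own.
    """
    return {
        # Simpler test: f19_g17 resolution with the X compset.
        "simple_test_case": [
            ["./create_newcase", "--case", case_name,
             "--res", "f19_g17", "--compset", "X"],
        ],
        # Rebuild with DEBUG enabled (clean rebuild assumed to be needed).
        "debug_rebuild": [
            ["./xmlchange", "DEBUG=TRUE"],
            ["./case.build", "--clean-all"],
            ["./case.build"],
        ],
        # One thread per task, i.e. threading effectively off (assumption).
        "threading_off": [
            ["./xmlchange", "NTHRDS=1"],
        ],
    }


def format_commands(groups):
    """Render the grouped commands as shell-ready text, one per line."""
    lines = []
    for label, commands in groups.items():
        lines.append(f"# {label}")
        lines.extend(shlex.join(command) for command in commands)
    return "\n".join(lines)
```

Printing `format_commands(suggested_commands())` gives a list to copy from one group at a time, so that each change can be tested separately and the cause of the hang narrowed down.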
 