CESM2.0.1 runs successfully but produces no output

ykp990521
Member
Dear all,
I am currently using CESM2.0.1 with the B1850 compset. When I use 360 CPUs for all components (NTASKS=360, ROOTPE=0), the model runs as usual. But as soon as I use more CPUs (e.g. NTASKS=540), the model runs but does not produce any output files. No errors are reported in the log files; the logs simply stop advancing, although the CPUs are still busy. Has anyone encountered this issue?
 

Johnny Guo
New Member
On my personal machine, I have to set NTASKS to less than the number of physical CPU cores.
 

Erik Kluzek
CSEG and Liaisons
Staff member
@ykp990521 if you are oversubscribing the processors on each node of your machine, that would really slow the model down. As @Johnny says above, on some machines you may have to use fewer tasks than the number of processors per node (this is rare, however).
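
As a rough illustration of the oversubscription check, here is a minimal Python sketch. It assumes a flat layout with one MPI task per core and no threading; the function name `node_usage` and the machine numbers (36 cores per node, 10 nodes allocated) are hypothetical and are not taken from the original post, so substitute the values for your own machine and job request.

```python
import math

def node_usage(ntasks, cores_per_node, nodes_allocated):
    """Return (nodes_needed, oversubscribed), assuming one MPI task per core."""
    # Number of whole nodes required to give every task its own core.
    nodes_needed = math.ceil(ntasks / cores_per_node)
    # Oversubscribed if there are more tasks than allocated cores.
    oversubscribed = ntasks > nodes_allocated * cores_per_node
    return nodes_needed, oversubscribed

# Hypothetical machine: 36 cores per node, 10 nodes (360 cores) allocated.
print(node_usage(360, 36, 10))  # (10, False)
print(node_usage(540, 36, 10))  # (15, True)
```

If the second case matches your situation, the job request would need to grow along with NTASKS, otherwise several tasks end up sharing each core.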

But getting a good PE layout for fully coupled cases with CESM is also difficult. Different components scale differently with processor count, and many have restrictions on how many processors can be used (and/or different sweet spots to optimize for). Since some components run concurrently with each other, this is even more complicated. So you might have fallen into a performance "hole" where the model is just running horribly.
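
To see why concurrency makes this harder, here is a toy Python sketch, not CESM's actual timing model. It assumes that components sharing a set of processors run one after another, that separate sets run concurrently, and that every set waits for the slowest one. The function names and the timings are made up purely for illustration.

```python
def step_time(groups):
    """Wall time of one coupling step: the slowest concurrent group sets the pace."""
    # Each group is a list of component times run sequentially on shared processors.
    return max(sum(group) for group in groups)

def idle_fractions(groups):
    """Fraction of each step that every group spends waiting for the slowest group."""
    total = step_time(groups)
    return [(total - sum(group)) / total for group in groups]

# Made-up timings in seconds: three components share one set of processors,
# while a fourth runs concurrently on its own set.
groups = [[30.0, 10.0, 5.0], [80.0]]
print(step_time(groups))       # 80.0
print(idle_fractions(groups))  # [0.4375, 0.0]
```

In this toy case, adding processors to the first group would not shorten the step at all, because the second group is the bottleneck. This is the kind of imbalance a load-balancing exercise tries to remove.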

We do have some resources on getting good PE layouts for fully coupled cases. As I said, this is fairly complicated, but it is especially important to get right for fully coupled cases like B1850.

Here are a couple of resources on load balancing CESM2...


 