
CESM2.2 on Derecho Running Very Slow

shawnh

New Member
Following the directions on the CAM-Chem wiki (https://wiki.ucar.edu/display/camchem/Run+CESM+with+Chemistry+on+Derecho), I was able to successfully install CESM2.2, build a case (WACCM compset FWSD, f09_f09_mg17 resolution), and complete a one-day run on Derecho. However, it is running very slowly. On Cheyenne, using the same case configuration with CESM2.2, the speed is typically ~4 minutes of wall-clock time per model day using 1152 CPUs. On Derecho it is taking ~1.5 hours of wall-clock time per model day using 1024 CPUs.

My guess is that I just need to tune my env_mach_pes.xml file on Derecho, correct? If so, any thoughts on what numbers I should be using? Or is it possible something else is going on?

I included the timing files from an 11 day run on Cheyenne and a 1 day run on Derecho and the env_mach_pes.xml files for each case.

Thanks!

-Shawn
 

Attachments

  • env_mach_pes-cheyenne.xml.txt
    7.2 KB · Views: 6
  • env_mach_pes-derecho.xml.txt
    7.2 KB · Views: 8
  • model_timing_stats-cheyenne.txt
    40.3 KB · Views: 4
  • model_timing_stats-derecho.txt
    35.7 KB · Views: 6
  • model_timing-cheyenne.000.txt
    65.6 KB · Views: 1
  • model_timing-dereco.000.txt
    44.9 KB · Views: 5

shawnh

New Member
/glade/work/shawnh/derecho/cesm_cases/waccm/f.e22.beta02.FWSD.f09_f09_mg17.cesm2_2_beta02.forecast.001.

Thanks!

-Shawn
 

jedwards

CSEG and Liaisons
Staff member
ls -ltr /glade/campaign/acom/acom-climate/WACCM-FORECAST
ls: cannot open directory '/glade/campaign/acom/acom-climate/WACCM-FORECAST': Permission denied

Can you open permissions on this directory? Also, please use xmlchange when modifying the case instead of editing the XML files directly; this makes it much easier to track and reproduce your case.
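
For readers following along, the xmlchange suggestion could look roughly like the sketch below. It assumes you are in the case directory, where CIME places the xmlquery and xmlchange scripts; the STOP_OPTION and STOP_N values are only an illustration matching the one-day test run mentioned earlier, not settings taken from the attached files.

```shell
# Guard: these tools only exist inside a CESM case directory.
if [ -x ./xmlchange ] && [ -x ./xmlquery ]; then
    # Inspect the current PE layout before touching anything.
    ./xmlquery NTASKS,NTHRDS,ROOTPE

    # Change values through the tool rather than a text editor.
    # Illustrative values only: a one-day run.
    ./xmlchange STOP_OPTION=ndays,STOP_N=1

    # Confirm what was actually written to the XML files.
    ./xmlquery STOP_OPTION,STOP_N
else
    echo "Not in a CESM case directory; nothing changed." >&2
fi
```

Because every change goes through one command, the exact command lines can be shared in a post like this one, which is usually enough for someone else to reproduce the case.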
 

shawnh

New Member
Hello, did you by chance try to reproduce this issue? I know Derecho has been down and is now up at reduced capacity, but I just wanted to check in on it.
 

jedwards

CSEG and Liaisons
Staff member
We have had reports of improved performance since the downtime, and I don't see anything in your path to indicate that you've tried a run since then.

Try setting phys_loadbalance=0 or 1 in user_nl_cam
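
As a minimal sketch of that suggestion, assuming the current directory is the case directory (where user_nl_cam lives), the setting can be appended like this. The value 0 is one of the two options named above; which one performs better is not established here.

```shell
# Append the namelist override to user_nl_cam in the case directory.
# Assumption: no earlier phys_loadbalance line exists in the file;
# if one does, edit that line instead of adding a second one.
cat >> user_nl_cam <<'EOF'
phys_loadbalance = 0
EOF

# Show the resulting setting so it can be double-checked before submitting.
grep phys_loadbalance user_nl_cam
```

To compare against the other option, change the value to 1 and repeat the same short run.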
 

shawnh

New Member
So I am running with phys_loadbalance=0 and while I think that is better than before, it is still taking about 35 minutes to run 1 model day, compared to about 4 minutes per model day on Cheyenne. Is it worth trying phys_loadbalance=1?
 

shawnh

New Member
Okay, just to update: I have the run performance up to ~6 minutes per model day. The mistake I was making before was that I had NTHRDS set to 2 in the env_mach_pes.xml file, and it needed to be set back to 1. Also, phys_loadbalance definitely slowed the run back down when left at the default. Next up will be to see how much of a difference there is between setting phys_loadbalance to 0 and 1.
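
For anyone hitting the same problem, the thread-count fix could be applied roughly as follows. This is a sketch that assumes the standard CIME case scripts are present in the case directory. Your site may also require the build step to be wrapped in a batch or queue command, which is not shown here.

```shell
# Guard: only act inside a CESM case directory.
if [ -x ./xmlchange ] && [ -x ./case.setup ] && [ -x ./case.build ]; then
    # Set one thread per task for all components.
    ./xmlchange NTHRDS=1

    # A PE-layout change requires regenerating the case setup ...
    ./case.setup --reset

    # ... and rebuilding from a clean state.
    ./case.build --clean-all
    ./case.build
else
    echo "Not in a CESM case directory; nothing changed." >&2
fi
```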
 

wfc1102@163_com

New Member
Hey shawnh, 6 minutes per model day, or 240 model days per day, is still too slow. I have seen others reach ~23 model years per day. Have you increased the simulation speed since Nov 2023? If so, how much did it improve?
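
The conversion behind those numbers can be checked with a small helper. This is only an arithmetic sketch: the function name is made up for illustration, and it assumes a 365-day model year.

```shell
# Convert wall-clock minutes per model day into simulated throughput.
# Assumption: 365-day model year; 1440 wall-clock minutes per day.
throughput() {
    awk -v m="$1" 'BEGIN {
        d = 1440 / m
        printf "%.0f model days/day, %.2f model years/day\n", d, d / 365
    }'
}

throughput 6   # Derecho after the NTHRDS fix
throughput 4   # Cheyenne baseline from the first post
```

At 6 minutes per model day this gives 240 model days per day, which is a bit under one model year per day.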
 