
CESM 1_2_0 SOM PE layouts on Yellowstone

Dear All,

I would like to run a Slab Ocean Model for many model years in CESM 1_2_0 at low resolution (f45_g37) on Yellowstone. Perhaps because the case does not consume much computational resource, CESM assigned the 'caldera' queue by default after setup. However, it is very slow (8 hours of wall-clock time per model year). I checked the model timing and found this:

    TOT Run Time:     27073.821 seconds     74.175 seconds/mday      3.19 myears/wday
    LND Run Time:       404.472 seconds      1.108 seconds/mday    213.61 myears/wday
    ROF Run Time:        48.541 seconds      0.133 seconds/mday   1779.94 myears/wday
    ICE Run Time:      1140.622 seconds      3.125 seconds/mday     75.75 myears/wday
    ATM Run Time:     24680.386 seconds     67.617 seconds/mday      3.50 myears/wday
    OCN Run Time:        24.727 seconds      0.068 seconds/mday   3494.16 myears/wday
    GLC Run Time:         0.000 seconds      0.000 seconds/mday      0.00 myears/wday
    WAV Run Time:         0.000 seconds      0.000 seconds/mday      0.00 myears/wday
    CPL Run Time:       373.764 seconds      1.024 seconds/mday    231.16 myears/wday
    CPL COMM Time:     2080.388 seconds      5.700 seconds/mday     41.53 myears/wday

So the ATM component takes most of the time, and I tried changing the model PE layout. The default PE setting below runs without a problem on Yellowstone's caldera queue, but when I changed all NTASKS to 16, or increased NTASKS only for ATM to 16, the model took more than 20 minutes to run 5 model days and was killed for exceeding the wall-clock limit. Copying env_mach_pes.xml files from other cases didn't work either.

Could you please help me make the model run faster?

Best,
Zaiyu
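To put a number on how much the atmosphere dominates, here is a minimal Python sketch that turns the run times quoted in the timing summary into percentages of the total. The dictionary and the helper name `share_of_total` are made up for illustration; only the numbers come from the post.

```python
# Component run times in seconds, copied from the timing summary in the post.
run_times = {
    "LND": 404.472,
    "ROF": 48.541,
    "ICE": 1140.622,
    "ATM": 24680.386,
    "OCN": 24.727,
    "CPL": 373.764,
}
TOTAL = 27073.821  # "TOT Run Time" in seconds


def share_of_total(seconds, total=TOTAL):
    """Return a component's run time as a percentage of the total run time."""
    return 100.0 * seconds / total


shares = {name: share_of_total(t) for name, t in run_times.items()}

# Print the components from most to least expensive.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:5.1f}%")
```

With these figures ATM comes out at roughly 91% of the total run time, which is why changes to the ATM layout have the largest effect.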
 

jedwards

CSEG and Liaisons
Staff member
You can keep the original PE layout and switch to the small queue, which uses dedicated resources instead of the shared queue, by editing the $CASE.run script. Another option is to increase nthrds for the atm component from 1 to 4; this change will cause your job to run in the regular queue, which will also give you dedicated resources.
 
Thanks a lot for your help. I then tried changing the queue from caldera to small; the corresponding timing improved from 68.325 seconds/mday to 68.050 seconds/mday. Then I increased NTHRDS_ATM from 1 to 4, and the run time became 55.858 seconds/mday, still with the atmosphere component consuming most of the time. Could there be further improvements?
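For comparison with the myears/wday column of the timing output, the seconds/mday figures above can be converted with a small Python sketch. It assumes a 365-day model year, which reproduces the values in the original timing summary (74.175 seconds/mday gives 3.19 myears/wday); the function name is invented for illustration.

```python
SECONDS_PER_WALL_DAY = 86400.0
MODEL_DAYS_PER_YEAR = 365.0  # assumption: 365-day (no-leap) model calendar


def myears_per_wday(seconds_per_mday):
    """Convert cost in seconds per model day to model years per wall-clock day."""
    return SECONDS_PER_WALL_DAY / (seconds_per_mday * MODEL_DAYS_PER_YEAR)


# The three configurations reported in the post.
for label, cost in [
    ("caldera queue", 68.325),
    ("small queue", 68.050),
    ("NTHRDS_ATM=4", 55.858),
]:
    print(f"{label:15s} {cost:7.3f} s/mday -> {myears_per_wday(cost):.2f} myears/wday")
```

On this conversion, the queue change alone barely moves the throughput, while the threading change brings it to a little over 4 model years per wall-clock day.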
 
I later changed all the NTASKS to 255, the maximum supported on Yellowstone, and kept NTHRDS at 1. Now the model can run 49 model years per day! The only thing I need to complain about is the CONTINUE_RUN option. According to the user guide: "A brief note on restarting runs. When you first begin a branch, hybrid or startup run, CONTINUE_RUN must be set to FALSE. When you successfully run and get a restart file, you will need to change CONTINUE_RUN to TRUE for the remainder of your run." The model should warn the user if CONTINUE_RUN is set to TRUE on an initial run, rather than accepting the setting and running indefinitely without stopping.
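Until the model does this itself, a user-side check before submitting could look like the Python sketch below. It is not part of CESM: the function name is hypothetical, and it assumes that a completed run leaves restart pointer files named rpointer.* in the run directory, so their absence suggests that no restart exists yet.

```python
from pathlib import Path


def continue_run_problem(continue_run, run_dir):
    """Return a warning string if CONTINUE_RUN is TRUE but no restart
    pointer files are found in run_dir; return None otherwise.

    Assumption: restart pointer files match the pattern 'rpointer.*'.
    """
    has_rpointer = any(Path(run_dir).glob("rpointer.*"))
    if continue_run and not has_rpointer:
        return (
            "CONTINUE_RUN is TRUE but no rpointer.* files were found in "
            f"{run_dir}; set CONTINUE_RUN to FALSE for an initial run."
        )
    return None
```

Calling `continue_run_problem(True, "/path/to/run")` on a fresh run directory would return the warning text, while the same call after a successful first segment would return `None`. The path shown is a placeholder.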
 