
Recommended configuration for running CESM2.2.2 B1850 on a medium-sized cluster

antonioroblesucm

antonio robles
New Member
Hi all, I'm trying to test the B1850 compset at f19_g17 resolution on a medium-sized cluster (the model is already compiled). I have access to a maximum of 192 CPUs with 2 threads per CPU, and the maximum RAM I can request is 256 GB. So far, when I distributed the tasks over 40 CPUs for a 2-day test simulation, I got a SEGFAULT error. Which distribution of tasks, threads, and ROOTPE do you recommend to avoid this SEGFAULT? Is it better to separate ATM and OCN onto two distinct nodes? How many CPUs should I use? For now I just want to test, so a lower resolution would also be fine for me.

Thanks!!!
 

antonioroblesucm

antonio robles
New Member
Now I'm able to run the model, but the task layout is not well optimized. For the B1850 compset at f19_g17 I'm getting a cost of around 4000 pe-hrs per simulated year; with 160 cores I only get about 1 simulated year per wallclock day. This is the timing output of the model:

Overall Metrics:
Model Cost: 4333.32 pe-hrs/simulated_year
Model Throughput: 0.89 simulated_years/day

Init Time : 791.495 seconds
Run Time : 1335.613 seconds 267.123 seconds/day
Final Time : 0.025 seconds

Actual Ocn Init Wait Time : 17.298 seconds
Estimated Ocn Init Run Time : 6.823 seconds
Estimated Run Time Correction : 0.000 seconds
(This correction has been applied to the ocean and total run times)

Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components

TOT Run Time: 1335.613 seconds 267.123 seconds/mday 0.89 myears/wday
CPL Run Time: 423.761 seconds 84.752 seconds/mday 2.79 myears/wday
ATM Run Time: 493.973 seconds 98.795 seconds/mday 2.40 myears/wday
LND Run Time: 407.164 seconds 81.433 seconds/mday 2.91 myears/wday
ICE Run Time: 254.786 seconds 50.957 seconds/mday 4.65 myears/wday
OCN Run Time: 818.710 seconds 163.742 seconds/mday 1.45 myears/wday
ROF Run Time: 62.369 seconds 12.474 seconds/mday 18.98 myears/wday
GLC Run Time: 1.572 seconds 0.314 seconds/mday 752.90 myears/wday
WAV Run Time: 105.545 seconds 21.109 seconds/mday 11.21 myears/wday
IAC Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
ESP Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
CPL COMM Time: 1277.778 seconds 255.556 seconds/mday 0.93 myears/wday
NOTE: min:max driver timers (seconds/day):

I also attach the task distribution. I have 40 PEs per node and a maximum of 4 nodes. How would you recommend optimizing my task distribution? (This layout is the only one I can run without a segfault.) I would really like to get the cost down to about 2000 pe-hrs/yr or better. Thanks!
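As a rough sanity check on the numbers above, here is a minimal Python sketch of how the reported cost and throughput relate to the run rate, and what run rate the 2000 pe-hrs/yr target would imply. It assumes a 365-day model year and that the cost is counted over 160 PEs (the core count mentioned above); the function names are made up for illustration and are not part of CESM.

```python
# Sketch only: reproduces the "Overall Metrics" arithmetic from the timing output.
# Assumptions: 365-day (no-leap) model year, cost counted over 160 PEs.

SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365


def model_cost(pes, seconds_per_model_day):
    """Cost in pe-hrs per simulated year."""
    return pes * seconds_per_model_day * DAYS_PER_MODEL_YEAR / SECONDS_PER_HOUR


def throughput(seconds_per_model_day):
    """Throughput in simulated years per wallclock day."""
    return SECONDS_PER_DAY / (seconds_per_model_day * DAYS_PER_MODEL_YEAR)


def run_rate_for_cost(pes, target_cost):
    """Wallclock seconds per model day needed to reach a target cost."""
    return target_cost * SECONDS_PER_HOUR / (pes * DAYS_PER_MODEL_YEAR)


pes = 160            # cores used in this run
run_rate = 267.123   # "TOT Run Time" in seconds per model day

cost = model_cost(pes, run_rate)
sypd = throughput(run_rate)
target_rate = run_rate_for_cost(pes, 2000.0)

print(f"cost       = {cost:.2f} pe-hrs/yr")
print(f"throughput = {sypd:.2f} years/day")
print(f"target     = {target_rate:.2f} s/model-day, "
      f"{throughput(target_rate):.2f} years/day")
```

Under those assumptions the sketch gives roughly the 4333 pe-hrs/yr and 0.89 years/day reported above. Reaching 2000 pe-hrs/yr on the same 160 PEs would mean a total run rate of about 123 seconds per model day, or about 1.9 simulated years per wallclock day.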
 