
Simulation hanging on Eiger

Jbuzan

Jonathan R. Buzan
Member
Hi CESM Users:

I am in the middle of porting CESM using the following settings.
Simulation Information:
cesm3_0_beta01
ccs_config at tag ccs_config_cesm0.0.109
./create_newcase --case /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/cases/intel_cesm3_0_beta01_BLT1850_v0c_T_08 --compiler intel (or gnu) --compset BLT1850_v0c --res ne30pg3_t232 --mach eiger --driver nuopc --mpilib mpich --run-unsupported

When I build a case with certain processor configurations, the simulation executes for some time and then hangs. I suspect this means something is wrong with my compiler setup.
Example:
I used the processor configuration from the default Derecho build (2304 cores, 18 nodes). Eiger is supposed to be a similar machine and should be able to handle this.
With Intel, the simulation gets as far as this line in cesm.log:
read_face_lengths_list : Porous Topography parameters: Dmin, Dmax, Davg ( 0.00 0.00 0.00)m

With GNU, it gets past that step but hangs after:
Opened existing file /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/inputdata/atm/cam/chem/trop_mozart/ub/clim_p_trop.nc 287
Opened existing file /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/inputdata/lnd/clm2/ndepdata/fndep_clm_WACCM6_CMIP6piControl001_y21-50avg_1850monthly_0.95x1.25_c180802.nc 289

Then nothing happens.
With 256 cores, however, CESM runs to completion and exits correctly under both Intel and GNU. I am getting a throughput of ~1.2 model years/wday.
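
For reference, the two layouts mentioned above can be checked against Eiger's node size with a few lines of arithmetic. This is only a sketch: it assumes one MPI task per core and no threading, and the helper name `nodes_needed` is made up for illustration. The 128 cores per node figure is the one quoted in this post.

```python
import math

CORES_PER_NODE = 128  # Eiger figure quoted in this post (128 CPU/node)


def nodes_needed(total_tasks, cores_per_node=CORES_PER_NODE):
    """Whole nodes required, assuming one MPI task per core and no threading."""
    return math.ceil(total_tasks / cores_per_node)


# The two layouts discussed in the post (labels are illustrative only).
layouts = {"Derecho-style layout": 2304, "small test layout": 256}
for label, tasks in layouts.items():
    print(f"{label}: {tasks} tasks -> {nodes_needed(tasks)} nodes")
```

Under those assumptions, 2304 tasks fill exactly 18 nodes and 256 tasks fill 2, so the hanging runs span nine times as many nodes as the runs that complete.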

I am attaching my Macros.make file (as Macros.txt).

Perhaps I should be using more of the NCAR Derecho compilers? (Eiger nodes have AMD EPYC 7742 CPUs, with 128 cores and 256 GB of RAM per node.)

Cheers,
-Jonathan
 

Attachments

  • Macros.txt
    2.2 KB · Views: 0

Jbuzan

Jonathan R. Buzan
Member
::Update::

It turns out the problem was with Eiger, not CESM. Once that was resolved, the simulations ran as expected. Using a node configuration similar to Derecho's (18 nodes), cesm3_0_beta01 on Eiger gets ~8 model years per workday. This is within 2% of Derecho (I tested the same model version and simulation setup there).
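
As a rough sanity check on scaling, the two throughput figures in this thread can be compared directly. This is a sketch under loud assumptions: that "wday" in the first post and "workday" here mean the same unit, that the 18-node run used all 2304 cores, and that the earlier 256-core figure (~1.2) was not itself affected by the Eiger problem. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(base_cores, base_rate, scaled_cores, scaled_rate):
    """Measured speedup divided by the ideal (linear) speedup."""
    speedup = scaled_rate / base_rate
    ideal = scaled_cores / base_cores
    return speedup / ideal


# Figures quoted in this thread; both are approximate ("~") values.
eff = parallel_efficiency(256, 1.2, 2304, 8.0)
print(f"speedup {8.0 / 1.2:.2f}x on {2304 // 256}x the cores, efficiency {eff:.0%}")
```

With those numbers the speedup is about 6.7x on 9x the cores, or roughly 74% parallel efficiency. Because both inputs are approximate, this is an estimate only.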

This is excellent news.

Cheers,
-Jonathan
 