
Out of Memory

Jbuzan

Jonathan R. Buzan
Member
Hello Discuss CESM users:

My simulation now compiles, but when I submit it, the run fails with an out-of-memory error.

Simulation Information:
cesm3_0_beta01
ccs_config at tag ccs_config_cesm0.0.109
./create_newcase --case /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/cases/intel_cesm3_0_beta01_BLT1850_v0c_T_08 --compiler intel --compset BLT1850_v0c --res ne30pg3_t232 --mach eiger --driver nuopc --mpilib mpich --run-unsupported


The compute nodes on Derecho and Eiger both have 256 GB of memory and dual 64-core CPUs (128 CPUs per node), so I did not expect an out-of-memory error. My guess is that there is a flag I am supposed to set?

I have attached the Macros.make file, config_machines.xml, cesm.log, and lnd.log. The run directory listing and the end of cesm.log are below.

-rw-r-----+ 1 jbuzan s1207 478K Jul 11 11:22 cesm.log.3206209.240711-111429
-rw-------+ 1 jbuzan s1207 3.2G Jul 11 11:22 core_nid001022_52419
-rw-------+ 1 jbuzan s1207 3.6G Jul 11 11:22 core_nid001022_52498


/capstor/scratch/cscs/jbuzan/cesm3_0_beta01/inputdata/lnd/clm2/ndepdata/fndep_clm_WACCM6_CMIP6piControl001_y21-50avg_1850monthly_0.95x1.25_c180802.nc
291
...
calcsize j,iq,jac, lsfrm,lstoo 2 5 2 19 20
slurmstepd: error: Detected 3 oom_kill events in StepId=3206209.0. Some of the step tasks have been OOM Killed.
srun: error: nid001022: tasks 1,111,124: Out Of Memory
srun: Terminating StepId=3206209.0
slurmstepd: error: *** STEP 3206209.0 ON nid001022 CANCELLED AT 2024-07-11T11:22:35 ***
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.31.s 000014CFDBCF0910 Unknown Unknown Unknown
libmpi_intel.so.1 000014CFDE30F4F3 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
 

Attachments

  • Macros.txt (2.2 KB)
  • cesm.log.3206209.240711-111429.txt (477.7 KB)
  • lnd.log.3206209.240711-111429.txt (232.9 KB)
  • config_machines.xml.txt (2 KB)

Jbuzan

Jonathan R. Buzan
Member
I forgot to mention: the case is currently set up with 128 tasks in total, and only 1 node is requested.
 

Jbuzan

Jonathan R. Buzan
Member
Ah, I threw more cores at it (256 in total), and the run completed with both the Intel and GNU compilers. This was a 5-day run.
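
As a rough check on why the larger task count may have helped, here is a minimal sketch of the memory arithmetic, using the node figures quoted earlier in the thread (256 GB and 128 CPUs per node). It assumes one MPI task per CPU and fully packed nodes, neither of which is confirmed here, and the variable names are only illustrative.

```shell
# Node figures quoted in this thread
node_mem_gb=256
cpus_per_node=128

# Assumption: one MPI task per CPU, nodes filled completely
for ntasks in 128 256; do
    # Round up to whole nodes
    nodes=$(( (ntasks + cpus_per_node - 1) / cpus_per_node ))
    total_mem_gb=$(( nodes * node_mem_gb ))
    mem_per_task_gb=$(( total_mem_gb / ntasks ))
    echo "ntasks=$ntasks nodes=$nodes total_mem=${total_mem_gb}GB per_task=${mem_per_task_gb}GB"
done
```

Under these assumptions each task has about 2 GB available in both layouts, but the 256-task run has twice the aggregate memory, so each task holds a smaller share of the model state.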

There are no atmospheric files, though. Is that normal? (I so rarely run just 5 days.) Do I need to set user_nl_cam to write daily output? I want to make sure that the simulation isn't bogus.
 

fischer

CSEG and Liaisons
Staff member
The atmospheric files are monthly, so you'll need to run at least a month to get the files. Or, you can turn on daily output in user_nl_cam.
When I run tests, I output the atmosphere every 3 time steps using the following settings in user_nl_cam:

mfilt=1,1,1,1,1,1
ndens=1,1,1,1,1,1
nhtfrq=3,3,3,3,3,3
write_nstep0=.true.

My test runs are usually only for 9 time steps.
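
For the daily-output option, a minimal sketch of what could be appended to user_nl_cam is shown below. It assumes that nhtfrq keeps its usual CAM convention (positive values are time steps, as in the settings above, and negative values are hours) and that the command is run from the case directory. Only the first history stream is changed here.

```shell
# Run from the case directory, where user_nl_cam is assumed to live.
# Appends daily-output settings for the first history stream only.
cat >> user_nl_cam <<'EOF'
! Assumption: a negative nhtfrq is a frequency in hours, so -24 means once per day
nhtfrq = -24
! One time sample per history file
mfilt = 1
EOF
```

After editing user_nl_cam, the namelists need to be regenerated (this happens on the next submit) before the change takes effect.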

Chris
 

Jbuzan

Jonathan R. Buzan
Member
Hi Chris,

Thanks for this and the quick reply! I'll get that working. Hopefully everything looks okay, and I can run the benchmarks tomorrow.

Cheers,
-Jonathan
 

Jbuzan

Jonathan R. Buzan
Member
The simulation I set up to run for 1 month completed successfully. I didn't modify user_nl_cam, and it produced 2 atmospheric files, h0a and h0i. Is this normal? I am unfamiliar with CESM3, and I was expecting a file with 2D and 3D fields that could be viewed as maps in ncview.
 