Hello Discuss CESM users:
I have managed to get my simulation to compile and submitted a run, but it failed with an out-of-memory error.
Simulation Information:
cesm3_0_beta01
ccs_config at tag ccs_config_cesm0.0.109
./create_newcase --case /capstor/scratch/cscs/jbuzan/cesm3_0_beta01/cases/intel_cesm3_0_beta01_BLT1850_v0c_T_08 --compiler intel --compset BLT1850_v0c --res ne30pg3_t232 --mach eiger --driver nuopc --mpilib mpich --run-unsupported
Node specifications: Compute node configuration - Knowledge Base (confluence.cscs.ch) and Documentation | ARC NCAR (arc.ucar.edu)
The compute nodes on Derecho and Eiger both have 256 GB of memory and two 64-core CPUs (128 cores per node), so I would not expect an out-of-memory error. My guess is that I am missing a flag. Is there one I am supposed to use?
I have attached the Macros.make file, config_mach, cesm.log, and lnd.log.
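As a rough check of the node arithmetic above, here is a minimal sketch of the memory available to each MPI task. It assumes one task per core and an even split of the 256 GB across tasks, neither of which is confirmed by the case settings in this post; the function name `mem_per_task_gb` is hypothetical, and the operating system's own memory use is ignored.

```python
# Rough per-task memory budget on a 256 GB node.
# Assumptions (not from the case settings): memory is split evenly
# across MPI tasks and the OS takes nothing for itself.

NODE_MEM_GB = 256      # node memory stated in the post
CORES_PER_NODE = 128   # two 64-core CPUs, as stated in the post


def mem_per_task_gb(node_mem_gb, tasks_per_node):
    """Return the even share of node memory per MPI task, in GB."""
    return node_mem_gb / tasks_per_node


# Compare a fully packed node with half and quarter populated nodes.
for tasks in (CORES_PER_NODE, CORES_PER_NODE // 2, CORES_PER_NODE // 4):
    share = mem_per_task_gb(NODE_MEM_GB, tasks)
    print(f"{tasks:3d} tasks per node: about {share:.0f} GB per task")
```

With every core occupied, each task gets about 2 GB, so a few memory-heavy tasks could exhaust a node even though the node total looks generous.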
-rw-r-----+ 1 jbuzan s1207 478K Jul 11 11:22 cesm.log.3206209.240711-111429
-rw-------+ 1 jbuzan s1207 3.2G Jul 11 11:22 core_nid001022_52419
-rw-------+ 1 jbuzan s1207 3.6G Jul 11 11:22 core_nid001022_52498
.../capstor/scratch/cscs/jbuzan/cesm3_0_beta01/inputdata/lnd/clm2/ndepdata/fndep_clm_WACCM6_CMIP6piControl001_y21-50avg_1850monthly_0.95x1.25_c180802.nc
291
calcsize j,iq,jac, lsfrm,lstoo 2 5 2 19 20
slurmstepd: error: Detected 3 oom_kill events in StepId=3206209.0. Some of the step tasks have been OOM Killed.
srun: error: nid001022: tasks 1,111,124: Out Of Memory
srun: Terminating StepId=3206209.0
slurmstepd: error: *** STEP 3206209.0 ON nid001022 CANCELLED AT 2024-07-11T11:22:35 ***
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.31.s 000014CFDBCF0910 Unknown Unknown Unknown
libmpi_intel.so.1 000014CFDE30F4F3 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
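To see which MPI tasks were killed without reading the whole log, the `srun` line above can be parsed with a short script. This is only a sketch: the regular expression is modeled on the single "Out Of Memory" line in this log, the helper names `expand_tasks` and `oom_tasks` are hypothetical, and the range handling (for example `0-3`) is an assumption about how task lists may be abbreviated.

```python
import re

# Matches srun lines shaped like the one in this log:
#   srun: error: nid001022: tasks 1,111,124: Out Of Memory
OOM_LINE = re.compile(
    r"srun: error: (?P<node>\S+): tasks? (?P<tasks>[\d,\-]+): Out Of Memory"
)


def expand_tasks(spec):
    """Expand a task list such as '1,111,124' or '0-3,7' into integers."""
    ranks = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            ranks.extend(range(int(first), int(last) + 1))
        else:
            ranks.append(int(part))
    return ranks


def oom_tasks(log_text):
    """Return a dict mapping node name to the task IDs reported as OOM."""
    killed = {}
    for match in OOM_LINE.finditer(log_text):
        killed.setdefault(match.group("node"), []).extend(
            expand_tasks(match.group("tasks"))
        )
    return killed


sample = "srun: error: nid001022: tasks 1,111,124: Out Of Memory\n"
print(oom_tasks(sample))
```

For the line in this log, the result maps `nid001022` to tasks 1, 111, and 124, all on the same node.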