
Memory error on hopper

Wouldn't you know it? I have one more simulation of quarter degree fvCAM5 to make using cesm1_0_3 on hopper.nersc.gov. Since the OS upgrade, I get the following error, more or less at random times during the first month of integration:

ccsm.log.150406-024752:[NID 05064] 2015-04-06 04:26:54 Apid 49402235: OOM killer terminated this process.

This means that the code has run out of memory. I never had this error before the OS upgrade, on 100s of similar jobs. In fact, I have one case, compiled prior to the OS upgrade, that still runs after the upgrade with no such error. I think there are two possibilities:

a) The new compiler has a memory leak, or
b) The new compiler makes a bigger code.

What are my options?

Regards,
Michael
 

jedwards

CSEG and Liaisons
Staff member
Hi Michael,

See if you can figure out which task or tasks are giving the OOM error. If you are running a B case, giving more tasks to pop might help; if you are running an F case, you might try changing the PIO_STRIDE value to spread out the IO a little. It might also be a problem that we've already fixed in cesm 1.0.5. Also, the coupler log prints out the memory usage each time it prints dt (usually once every 24 hours); if there is a memory leak you should be able to see it in the cpl log.
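As a first step toward finding where the OOM errors occur, one could tally the OOM-killer messages per node ID across the ccsm.log files. The following is a minimal sketch, not part of CESM: the regular expression is modeled only on the single log line quoted in the original post, and the function name `tally_oom_nodes` is hypothetical. It reports node IDs (NID), not MPI task numbers; mapping a node back to the tasks it hosted depends on the job layout.

```python
import re
from collections import Counter

# Assumption: OOM messages look like the one line quoted in this thread, e.g.
# "[NID 05064] 2015-04-06 04:26:54 Apid 49402235: OOM killer terminated this process."
OOM_RE = re.compile(
    r"\[NID (\d+)\]\s+\S+\s+\S+\s+Apid (\d+): OOM killer terminated this process"
)

def tally_oom_nodes(lines):
    """Count OOM-killer messages per node ID in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python tally_oom.py ccsm.log.*
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(tally_oom_nodes(handle))
    for nid, count in totals.most_common():
        print(nid, count)
```

If the same few nodes show up repeatedly, that would suggest the tasks placed there (for example the IO tasks) are the ones exhausting memory.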
 
It does not appear to be a leak. The cpl log prints statements like

memory_write: model date =   170314       0 memory =     569.22 MB (highwater)       4854.35 MB (usage)  (pe=    0 comps= cpl ocn atm lnd ice glc)
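Checking for a leak by eye works, but the memory_write lines can also be pulled out of the cpl log and compared over the run. The sketch below assumes every such line has the same layout as the one quoted above; the function names `parse_memory_lines` and `highwater_growth` are hypothetical helpers, not CESM tools. Note that the quoted line reports pe= 0 only, so this would not reveal growth confined to other tasks.

```python
import re

# Assumption: layout matches the memory_write line quoted in this thread.
MEM_RE = re.compile(
    r"memory_write: model date =\s*(\d+)\s+(\d+)\s+memory =\s*"
    r"([\d.]+) MB \(highwater\)\s+([\d.]+) MB \(usage\)"
)

def parse_memory_lines(lines):
    """Return (model_date, seconds, highwater_mb, usage_mb) tuples in log order."""
    records = []
    for line in lines:
        match = MEM_RE.search(line)
        if match:
            records.append((
                int(match.group(1)),
                int(match.group(2)),
                float(match.group(3)),
                float(match.group(4)),
            ))
    return records

def highwater_growth(records):
    """Difference in highwater MB between the last and first records."""
    if len(records) < 2:
        return 0.0
    return records[-1][2] - records[0][2]

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python cpl_memory.py cpl.log.150406-024752
    with open(sys.argv[1], errors="replace") as handle:
        recs = parse_memory_lines(handle)
    for date, seconds, highwater, usage in recs:
        print(date, seconds, highwater, usage)
    print("highwater growth (MB):", highwater_growth(recs))
```

A highwater value that climbs steadily from one model day to the next would point toward a leak; a flat value, as reported here, points toward a larger but stable footprint.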

It is an f case, i.e.

./create_newcase -case $CASEROOT -compset F_1850_CAM5 -res f02_f02 -mach hopp2 -din_loc_root_csmdata $SCRATCH/cam5_input_netcdf_files

So I upped ATM_PIO_NUMTASKS from 39 to 50. Do you have a recommended value?

m
 