
max rss memory issues with

I am running --res f09_f09_mg17 --compset F2000climo, and the model freezes every time it finishes a month. I increased the memory per CPU and the number of nodes, but I keep getting the same problem. Has anyone had similar issues, or does anyone have a clue about what may be wrong with the model? The last run produced the following efficiency stats.


Job ID: 1316
Cluster: cluster
User/Group: jmejia/domain users
State: CANCELLED (exit code 0)
Nodes: 6
Cores per node: 56
CPU Utilized: 18-05:11:22
CPU Efficiency: 73.02% of 24-22:44:00 core-walltime
Job Wall-clock time: 01:46:55
Memory Utilized: 28.46 GB
Memory Efficiency: 1.06% of 2.62 TB

From these stats, the issue looks unrelated to memory: the job used only 28.46 GB of the 2.62 TB available to it.
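As a sanity check, the efficiency figures above can be reproduced from the node count, cores per node, and wall-clock time. The sketch below is only an illustration: the helper name `to_seconds` is made up here, and it assumes 1 TB = 1024 GB, which is what makes the memory percentage match the report.

```shell
#!/usr/bin/env bash
# Convert a Slurm-style [D-]HH:MM:SS duration to seconds.
# (to_seconds is a hypothetical helper, not part of any tool.)
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1*86400 + $2*3600 + $3*60 + $4
        else         print $1*3600 + $2*60 + $3
    }'
}

nodes=6
cores_per_node=56
wall=$(to_seconds "01:46:55")      # job wall-clock time
cpu=$(to_seconds "18-05:11:22")    # CPU time actually used

# Core-walltime = cores * wall-clock seconds (24-22:44:00 in the report).
core_wall=$(( nodes * cores_per_node * wall ))

cpu_eff=$(awk -v u="$cpu" -v t="$core_wall" 'BEGIN { printf "%.2f", 100*u/t }')
# Assumption: the report treats 1 TB as 1024 GB.
mem_eff=$(awk 'BEGIN { printf "%.2f", 100 * 28.46 / (2.62 * 1024) }')

echo "core-walltime: ${core_wall} s"
echo "CPU efficiency: ${cpu_eff}%"
echo "Memory efficiency: ${mem_eff}%"
```

Both percentages come out the same as in the report (73.02% and 1.06%), so the numbers are at least internally consistent: the job held 336 cores for under two hours and touched about 1% of the memory it had.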

Your insight and tips are appreciated.

Thanks,

--John
 

peverley

Courtney Peverley
Moderator
Staff member
Hi John,

A couple of suggestions:

1. If you haven't already, turn on the DEBUG flag with ./xmlchange DEBUG=true and then do a clean rebuild. You might get some more (or any) information about what's going on.
2. If possible, try switching compilers to see whether the model freezes in the same spot (you can probably tell, at least roughly, where the model is freezing from the atm and cesm logs).
3. Try a coarser FV grid and see if you can get it to run to completion.
4. Have you modified the model/code at all? If so, try running it out of the box.
5. Make sure you're not requesting more nodes or memory than you have access to on the machine you're using.
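
Suggestions 1 and 2 could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the function names are made up, the case and run directory paths are placeholders, and the log naming (`atm.log.*`, `cesm.log.*` in the run directory) is assumed from a typical CESM2 setup. By default the first function only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Suggestion 1: enable DEBUG, then do a clean rebuild of the case.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
rebuild_with_debug() {
    local casedir="$1" dry="${DRY_RUN:-1}" cmd
    for cmd in "./xmlchange DEBUG=true" \
               "./case.build --clean-all" \
               "./case.build"; do
        if [ "$dry" = "1" ]; then
            echo "[dry-run] cd $casedir && $cmd"
        else
            (cd "$casedir" && $cmd) || return 1
        fi
    done
}

# Suggestion 2: show the last lines of the newest log for one
# component (e.g. "atm" or "cesm") to see where the run stopped.
# Assumes logs are named <component>.log.* inside the run directory.
tail_latest_log() {
    local rundir="$1" prefix="$2" n="${3:-20}" latest
    latest=$(ls -t "$rundir"/"$prefix".log.* 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no $prefix log found in $rundir" >&2
        return 1
    fi
    echo "== $latest"
    tail -n "$n" "$latest"
}
```

For example, `DRY_RUN=0 rebuild_with_debug /path/to/case` followed, after the next freeze, by `tail_latest_log /path/to/run atm` and `tail_latest_log /path/to/run cesm`. Comparing where the two logs end between compilers shows whether the freeze is in the same spot.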

Good luck and let me know if you get more information.

Courtney
 