trnja001@umn_edu
I'm trying to debug CAM crashes on a 10-node system where each node has dual quad-core 3.0GHz Xeons and 16GB of RAM. The crashes happen when CAM runs out of memory after about 4 days. CAM's memory use grows by about 140MB every hour on each node, so at that rate it uses up the 16GB per node within a few days.
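As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the whole 16GB (16 * 1024 MB) is available to CAM and that the leak is steady; in practice the OS and CAM's baseline footprint take part of that, which would bring the estimate closer to the observed 4 days. The function name is just for illustration.

```python
def hours_until_exhaustion(free_mb: float, leak_mb_per_hour: float) -> float:
    """Hours until a steady leak consumes the given amount of free memory."""
    if leak_mb_per_hour <= 0:
        raise ValueError("leak rate must be positive")
    return free_mb / leak_mb_per_hour


# Assumption: all 16 GB on a node is free for CAM and the leak rate is constant.
hours = hours_until_exhaustion(16 * 1024, 140)
print(f"{hours:.0f} hours = {hours / 24:.1f} days")
```

This prints `117 hours = 4.9 days`, which is consistent with crashes after roughly 4 days once baseline usage is subtracted.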
CAM runs with 1 MPI process per compute node and 8 OpenMP threads per node. It's run with OpenMPI 1.2.5 (over InfiniBand) and was built from cam3-1.1.p1 with PGI 7.1-5. Judging by the time between successive restart files, each month takes about 25 days to finish.
Has anyone encountered this issue, or does anyone have tips on what I should try first before I open up a debugger?
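One cheap first step before a debugger is to log the resident memory of the CAM process on each node and fit a growth rate, to confirm the ~140MB/hour figure and see whether the growth is steady or jumps at particular points (for example at history or restart writes). Below is a minimal sketch, assuming Linux nodes with `/proc/<pid>/status` and that you look up the PID of the CAM executable on each node yourself; the function names and the sampling interval are hypothetical. Since the 8 OpenMP threads share one process, a single `VmRSS` reading covers all of them.

```python
import time
from pathlib import Path


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found in status text")


def sample_rss_mb(pid: int) -> float:
    """Read the current resident set size of a process, in MB (Linux only)."""
    text = Path(f"/proc/{pid}/status").read_text()
    return parse_vmrss_kb(text) / 1024


def growth_mb_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (elapsed_seconds, rss_mb) samples, in MB/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span a nonzero time interval")
    return num / den * 3600


def monitor(pid: int, interval_s: float = 600, count: int = 6) -> float:
    """Sample the process `count` times, `interval_s` apart; return MB/hour."""
    samples = []
    start = time.monotonic()
    for i in range(count):
        samples.append((time.monotonic() - start, sample_rss_mb(pid)))
        if i < count - 1:
            time.sleep(interval_s)
    return growth_mb_per_hour(samples)
```

Running `monitor(pid)` on each node with the CAM process's PID gives a per-node growth rate; if every node grows at about the same rate, the leak is more likely in code all tasks execute than in one node's share of the work.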
Thanks,
Elvedin