Memory 'Leak'

I'm trying to debug CAM crashes on a 10-node system; each node has dual quad-core 3.0GHz Xeons and 16GB of RAM. The crashes happen when CAM runs out of memory after about 4 days. I've noticed that CAM takes about 140MB more RAM every hour on each node, so with about 16GB per node it eventually runs out of memory.
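As a rough check on those numbers, here is a minimal sketch of the arithmetic, assuming the growth is steady and linear. The function name and the `baseline_mb` parameter (memory already in use at start-up) are hypothetical and only for illustration.

```python
def hours_until_exhausted(total_mb, growth_mb_per_hour, baseline_mb=0.0):
    """Hours until memory runs out, assuming steady linear growth."""
    if growth_mb_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    # Only the memory not already in use is available to absorb the growth.
    headroom_mb = total_mb - baseline_mb
    return max(headroom_mb, 0.0) / growth_mb_per_hour


if __name__ == "__main__":
    total_mb = 16 * 1024  # 16GB per node, counted as 16384MB
    hours = hours_until_exhausted(total_mb, 140)
    print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

With no baseline this gives roughly 117 hours, a little under 5 days. Any memory used at start-up shortens that, which fits with crashes after about 4 days.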

CAM is running with 1 MPI process and 8 OpenMP threads per compute node. It uses OpenMPI 1.2.5 (over InfiniBand) and was built from cam3-1.1.p1 with PGI 7.1-5. Each month takes about 25 days to finish, judging by the time between successive restart files.

Has anyone encountered this issue, or does anyone have tips on what I should try first before I open up a debugger?
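One low-effort check before reaching for a debugger is to log the resident memory of the CAM process over time and confirm the growth rate. The sketch below assumes a Linux node where `/proc/<pid>/status` reports a `VmRSS` line in kB. The function names, polling interval, and sample count are hypothetical choices, not part of CAM.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_mb_per_hour(samples):
    """Average growth between the first and last (seconds, rss_kb) samples."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples taken at different times")
    return (rss1 - rss0) / 1024.0 / ((t1 - t0) / 3600.0)


def sample_rss(pid, interval_s=600, count=6):
    """Poll /proc/<pid>/status and collect (timestamp, rss_kb) pairs."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            rss_kb = parse_vmrss_kb(f.read())
        if rss_kb is not None:
            samples.append((time.time(), rss_kb))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Running `growth_mb_per_hour(sample_rss(pid))` on each node would show whether every MPI process grows at the same rate or only some of them.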

Thanks,
Elvedin
 

rneale

Rich Neale
CAM Project Scientist
Staff member
Which code base are you using, and does the run include any of your own code modifications?
Compiling with debugging turned on may help, as it should trap problems with missing or incorrect array deallocations.
 
rneale said:
Which code base are you using and does the run include any of your own code modification?
Turning on the debugger at compile time may help as it should trap problems with missing/incorrect
array deallocations.

Thanks for the quick reply.

By code base, do you mean the version of CAM? It was installed from cam3-1.1.p1_source_code.tar.gz. To my knowledge the code has not been modified, other than replacing GCC with PGI's C compiler, as has been done with other packages.
 

rneale

Rich Neale
CAM Project Scientist
Staff member
The PGI compiler can be problematic sometimes so the first thing I would do is drop the optimization to -O0 to make sure the optimization is not introducing the problems.
 
rneale said:
The PGI compiler can be problematic sometimes so the first thing I would do is drop the optimization to -O0 to make sure the optimization is not introducing the problems.

I've been trying to cap the optimization at -O1 for everything, but I haven't had much luck. Where do I set it?
 

rneale

Rich Neale
CAM Project Scientist
Staff member
In the directory models/atm/cam/bld the options for the compiler are set in Macros files and you should be able to reset the optimization levels there.
 
rneale said:
In the directory models/atm/cam/bld the options for the compiler are set in Macros files and you should be able to reset the optimization levels there.

Sorry, I'm not too familiar with the terminology, but what filename would it have? I have some build .csh files for various versions, plus the configure script and the .pm Perl files. Which one would I need to edit?

I edited the Linux section of the Makefile as follows:

mod_path := -I$(ESMF_MOD)/$(ESMF_ARCH) -I$(MOD_NETCDF)
FFLAGS := $(cpp_path) $(mod_path) -r8 -i4 $(CPPDEF) -Mdalign -Mextend -DNO_R16 -byteswapio
# FFLAGS := $(cpp_path) $(mod_path) -r8 -i4 $(CPPDEF) -Mdalign -Mextend -DPGF90 -byteswapio
FREEFLAGS := -Mfree
LDFLAGS :=

ifeq ($(DEBUG),TRUE)
FFLAGS += -g -Ktrap=fp -Mrecursive -Mbounds
SPEC_FFLAGS := $(FFLAGS)
else
# BELOW ADDED BY ELVEDIN
FFLAGS += -O1
SPEC_FFLAGS := $(FFLAGS)
# Check for override of default Fortran compiler optimizations
ifeq ($(F_OPTIMIZATION_OVERRIDE),$(null))
FORTRAN_OPTIMIZATION := -O1
endif
FFLAGS += $(FORTRAN_OPTIMIZATION)
endif

ifeq ($(SMP),TRUE)
FFLAGS += -mp
LDFLAGS += -mp
endif

But my addition is wrong: it just adds an extra -O1 for pgf and, in some cases, no optimization flag at all. pgcc ignores it completely and runs
pgcc -c -O -I.../code/cam1/models/utils/esmf
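To see which optimization flags actually reach each compiler, the build output can be scanned command by command. This is a minimal sketch, assuming the build log has one compile command per line and that the commands start with `pgf90` or `pgcc`. The `pgf90` name, the function names, and the first sample line are assumptions for illustration; only the `pgcc` line comes from the build above.

```python
import re

COMPILERS = ("pgf90", "pgcc")  # assumed compiler command names


def optimization_flags(command_line):
    """Return the -O style flags (-O, -O0, -O1, ...) in one compile command."""
    return [tok for tok in command_line.split() if re.fullmatch(r"-O\d?", tok)]


def audit_build_log(lines, compilers=COMPILERS):
    """Yield (compiler, flags) for each line that starts with a known compiler."""
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in compilers:
            yield tokens[0], optimization_flags(line)


if __name__ == "__main__":
    log = [
        "pgf90 -c -r8 -i4 -O1 -O1 example.F90",  # hypothetical Fortran line
        "pgcc -c -O -I.../code/cam1/models/utils/esmf",
    ]
    for compiler, flags in audit_build_log(log):
        print(compiler, flags or "no -O flag")
```

A duplicated `-O1` on the Fortran lines or a bare `-O` on the C lines would point to the Makefile variables that still need changing.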
 