
valgrind

Is it possible to run cesm2 with valgrind turned on? I tried doing this on Cori by doing these two things:

1) Added a "module load valgrind" to cime/config/cesm/machines/config_machines.xml in the cori-haswell section:

    cray-netcdf-hdf5parallel
    cray-netcdf-hdf5parallel/4.4.1.1.6
    cray-hdf5-parallel/1.10.1.1
    cray-parallel-netcdf/1.8.1.3
    valgrind

2) Modified env_mach_specific.xml:

    srun
    --label
    -n {{ total_tasks }}
    valgrind --leak-check=yes
    -c {{ srun_binding }}

The log file in the case directory itself showed this:

    run command is srun  --label  -n 45  valgrind --leak-check=yes  -c 2 /global/cscratch1/sd/mbranson/ne5-has/bld/cesm.exe  >> cesm.log.$LID 2>&1

But the cesm log file had this:

    16: valgrind: 2: command not found
    19: valgrind: 2: command not found
    18: valgrind: 2: command not found
     6: valgrind: 2: command not found
     1: valgrind: 2: command not found
     7: valgrind: 2: command not found
     4: valgrind: 2: command not found
    17: valgrind: 2: command not found
     5: valgrind: 2: command not found

So I suspect that adding a module load for valgrind in the config_machines.xml file is only getting utilized in the build stage (i.e., the module is not being loaded when the model is actually executed). Is there any workaround for this?

Thanks,
Mark Branson
 

jedwards

CSEG and Liaisons
Staff member
After you modified config_machines.xml, did you create a new case? Otherwise the change isn't used. Since you don't intend this to be a permanent change, you should modify env_mach_specific.xml in the case instead of config_machines.xml. You can then see the environment that cesm will use with the command

    source .env_mach_specific.sh   (for bash users)

or

    source .env_mach_specific.csh  (for csh and tcsh users)

Then do module list: if valgrind is there, it should also be on the compute nodes.
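The check described above can be sketched as a small shell script to run from the case directory. This is only an illustration: report_tool is a hypothetical helper (not part of CIME), and the script assumes a POSIX shell and that the module command becomes available once the case environment file has been sourced.

```shell
#!/bin/sh
# report_tool is a hypothetical helper, not part of CIME: it prints where a
# command resolves on PATH and returns non-zero if it cannot be found.
report_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 found at $tool_path"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Load the case environment if we are in a case directory (sh/bash users).
# Guarded so the sketch does nothing harmful when run elsewhere.
if [ -f .env_mach_specific.sh ]; then
    . ./.env_mach_specific.sh
    module list 2>&1
fi

# Report on valgrind without aborting the script if it is missing.
report_tool valgrind || true
```

Running the same report_tool call inside a batch job shows whether the compute nodes resolve valgrind to the same path as the login node.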
 
Thanks for the reply, Jim. I followed your advice and added the module load for valgrind into env_mach_specific.xml, and now by doing a source .env_mach_specific.csh I can see that the valgrind module is indeed being loaded. But I still get "valgrind: command not found" in the cesm log when I try to run the model. I was able to successfully run a helloWorld sample Fortran program through the batch scheduler using valgrind and it worked, so I feel confident that it is indeed available on the compute nodes on Cori.

Mark
 

jedwards

CSEG and Liaisons
Staff member
Add some code to your hello world to show you the full path to valgrind, and look at the runenv file written to the case log directory.
 
Here's the pertinent parts of my run_environment file (to avoid posting all 500 lines of it). You can see that it seems to load the valgrind module correctly.

Currently Loaded Modulefiles:
  1) modules/3.2.11.1                  11) cray-libsci/19.02.1
  2) altd/2.0                          12) pmi/5.0.14
  3) darshan/3.1.7                     13) atp/2.1.3
  4) cray-hdf5-parallel/1.10.2.0       14) PrgEnv-intel/6.0.5
  5) valgrind/3.15.0                   15) intel/19.0.0.117
  6) craype-haswell                    16) cray-netcdf-hdf5parallel/4.6.1.3
  7) craype-hugepages2M                17) cray-parallel-netcdf/1.8.1.4
  8) craype-network-aries              18) git/2.21.0
  9) craype/2.6.0                      19) cmake/3.14.4
 10) cray-mpich/7.7.8

LD_LIBRARY_PATH=/global/common/cori_cle6/software/intel/compilers_and_libraries_2019.0.117/linux/compiler/lib/intel64:/global/common/cori_cle6/software/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64:/opt/cray/job/2.2.4-7.0.0.1_3.26__g36b56f4.ari/lib64:/usr/common/software/valgrind/3.15.0/intel/lib/valgrind:/usr/syscom/nsg/lib
VALGRIND_INCLUDE=-I/usr/common/software/valgrind/3.15.0/intel/include/valgrind
VALGRIND_DIR=/usr/common/software/valgrind/3.15.0/intel
VALGRIND_LINK_OPTS=-L/usr/common/software/valgrind/3.15.0/intel/lib/valgrind -lcoregrind-amd64-linux -lvex-amd64-linux -lgcc
LOADEDMODULES=modules/3.2.11.1:nsg/1.2.0:altd/2.0:darshan/3.1.7:cray-hdf5-parallel/1.10.2.0:valgrind/3.15.0:udreg/2.3.2-7.0.0.1_4.23__g8175d3d.ari:ugni/6.0.14.0-7.0.0.1_7.25__ge78e5b0.ari:dmapp/7.1.1-7.0.0.1_5.15__g25e5077.ari:gni-headers/5.0.12.0-7.0.0.1_7.30__g3b1768f.ari:xpmem/2.2.17-7.0.0.1_3.20__g7acee3a.ari:job/2.2.4-7.0.0.1_3.26__g36b56f4.ari:dvs/2.11_2.2.131-7.0.0.1_7.3__gd2a05f7e:alps/6.6.50-7.0.0.1_3.30__g962f7108.ari:rca/2.2.20-7.0.0.1_4.29__g8e3fb5b.ari:craype-haswell:craype-hugepages2M:craype-network-aries:craype/2.6.0:cray-mpich/7.7.8:cray-libsci/19.02.1:pmi/5.0.14:atp/2.1.3:PrgEnv-intel/6.0.5:intel/19.0.0.117:cray-netcdf-hdf5parallel/4.6.1.3:cray-parallel-netcdf/1.8.1.4:git/2.21.0:cmake/3.14.4
VALGRIND_MPI_LINK=-L/usr/common/software/valgrind/3.15.0/intel/lib/valgrind -lmpiwrap-amd64-linux
 
I finally got it to work. My original change to env_mach_specific.xml which gave "valgrind: command not found" was this:

    srun
    --label
    -n {{ total_tasks }}
    valgrind --leak-check=full --dsymutil=yes --track-origins=yes --log-file=vallog
    -c {{ srun_binding }}

and when I changed it to this (made valgrind the last argument) then it worked:

    srun
    --label
    -n {{ total_tasks }}
    -c {{ srun_binding }}
    valgrind --leak-check=full --dsymutil=yes --track-origins=yes --log-file=vallog
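One plausible reading of why the order matters, offered as an interpretation rather than something confirmed in the thread: a launcher such as srun reads its own options only up to the first word that is not an option, and hands everything after that to the program. The earlier log message "valgrind: 2: command not found" is consistent with "-c 2" having been passed to valgrind instead of srun. The sketch below imitates that splitting with a hypothetical split_launcher_args function; it is not srun's actual parser, it only knows the three options used in this thread, and ./cesm.exe stands in for the full executable path.

```shell
#!/bin/sh
# Hypothetical stand-in for a launcher's argument handling (NOT srun itself):
# consume known launcher options until the first other word, then treat that
# word and everything after it as the command line to execute.
split_launcher_args() {
    launcher_opts=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -n|-c)   launcher_opts="$launcher_opts $1 $2"; shift 2 ;;
            --label) launcher_opts="$launcher_opts $1"; shift ;;
            *)       break ;;
        esac
    done
    echo "launcher options:$launcher_opts"
    echo "command line: $*"
}

# Original ordering: "-c 2" comes after valgrind, so it lands in the command line.
split_launcher_args --label -n 45 valgrind --leak-check=yes -c 2 ./cesm.exe

# Working ordering: every launcher option precedes valgrind.
split_launcher_args --label -n 45 -c 2 valgrind --leak-check=yes ./cesm.exe
```

In the first call the reported command line still contains "-c 2", while in the second call it is listed among the launcher options, which matches the ordering that worked above.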
 