Thank you in advance.
What version of the code are you using?
bgoswami:CESM220$ ./describe_version
------------------------------------------------------------------------
git describe:
cesm2.2.0-0-g332937b
------------------------------------------------------------------------
------------------------------------------------------------------------
git status:
Not currently on any branch.
Untracked files:
(use "git add <file>..." to include in what will be committed)
xmlchange_before_run.md
nothing added to commit but untracked files present (use "git add" to track)
------------------------------------------------------------------------
------------------------------------------------------------------------
manage_externals status:
Processing externals description file : Externals.cfg
Processing externals description file : Externals_CAM.cfg
Processing externals description file : .gitmodules
Processing submodules description file : .gitmodules
Processing externals description file : ../Externals_cime.cfg
Processing externals description file : Externals_CISM.cfg
Processing externals description file : Externals_CLM.cfg
Processing externals description file : Externals_POP.cfg
Checking status of externals: cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, atmos_cubed_sphere, cice, cdeps, fox, cime, cmeps, cism, source_cism, clm, fates, ptclm, fms, mom, mosart, pop, cvmix, marbl, rtm, ww3,
M ./cime
modified sandbox, on cime5.8.32
HEAD detached at cime5.8.32
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: config/cesm/machines/config_batch.xml
modified: config/cesm/machines/config_compilers.xml
modified: config/cesm/machines/config_machines.xml
modified: scripts/lib/CIME/XML/env_mach_specific.py
Have you made any changes to files in the source tree?
- Changes in config_batch.xml file:
<directive> --mem-per-cpu=2g </directive>
bgoswami:CESM220$ sed -n '643,652p' cime/config/cesm/machines/config_batch.xml
<batch_system MACH="bbg" type="slurm">
<batch_submit>sbatch</batch_submit>
<submit_args>
<arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
<arg flag="-p" name="$JOB_QUEUE"/>
</submit_args>
<queues>
<queue default="true">defaultp</queue>
</queues>
</batch_system>
- Changes in config_compilers.xml file:
<compiler MACH="bbg" COMPILER="gnu">
<CFLAGS>
<append DEBUG="FALSE"> -O2 </append>
</CFLAGS>
<CONFIG_ARGS>
<base> --host=Linux </base>
</CONFIG_ARGS>
<CPPDEFS>
<append> -DLINUX </append>
</CPPDEFS>
<FFLAGS>
<append DEBUG="FALSE"> -fallow-invalid-boz -fallow-argument-mismatch -O2 </append>
</FFLAGS>
<NETCDF_PATH>/mnt/nfs/clustersw/Debian/bookworm/openmpi/4.1.8/usr/netcdf/4.8.1</NETCDF_PATH>
<!-- <PIO_FILESYSTEM_HINTS>lustre</PIO_FILESYSTEM_HINTS> -->
<SLIBS>
<base> -L${NETCDF_PATH}/lib -lnetcdf -lnetcdff -L/mnt/nfs/clustersw/Debian/bookworm/openblas/0.3.29/lib -lopenblas </base>
</SLIBS>
<CPPDEFS>
<append MODEL="gptl"> -DHAVE_SLASHPROC </append>
</CPPDEFS>
</compiler>
- Changes in config_machines.xml file:
<machine MACH="bbg" >
<DESC>ISTA HPC, batch system is SLURM</DESC>
<OS>LINUX</OS>
<COMPILERS>gnu</COMPILERS>
<MPILIBS>openmpi</MPILIBS>
<PROJECT>CESM</PROJECT>
<CIME_OUTPUT_ROOT>/nfs/scistore16/mullegrp/bgoswami/CESM220_output</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>/nfs/scistore16/mullegrp/bgoswami/model_input/inputdata</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>$DIN_LOC_ROOT</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>${CIME_OUTPUT_ROOT}/archive/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>${CIME_OUTPUT_ROOT}/cesm_baselines</BASELINE_ROOT>
<CCSM_CPRNC>/nfs/scistore16/mullegrp/bgoswami/CESM220/cime/tools/cprnc</CCSM_CPRNC>
<GMAKE_J>12</GMAKE_J>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>bgoswami -at- ist.ac.at</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>12</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>12</MAX_MPITASKS_PER_NODE>
<PROJECT_REQUIRED>TRUE</PROJECT_REQUIRED>
<mpirun mpilib="default">
<executable>srun</executable>
<arguments>
<arg name="num_tasks"> -n {{ total_tasks }}</arg>
<arg name="thread_count"> -d $ENV{OMP_NUM_THREADS}</arg>
</arguments>
</mpirun>
<module_system type="module">
<init_path lang="perl">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/init/perl</init_path>
<init_path lang="python">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/init/env_modules_python.py</init_path>
<init_path lang="sh">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/init/bash</init_path>
<init_path lang="bash">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/init/bash</init_path>
<cmd_path lang="perl">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/libexec/lmod perl</cmd_path>
<cmd_path lang="python">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/libexec/lmod python</cmd_path>
<cmd_path lang="tcsh">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/libexec/lmod module</cmd_path>
<cmd_path lang="bash">/mnt/nfs/clustersw/Debian/bookworm/lmod/lmod/libexec/lmod module</cmd_path>
<modules>
<!--command name="purge"></command-->
<command name="purge"/>
<!--command name="load">scicomp-formats/20220527</command-->
<!--command name="load">gcc/12.2</command-->
<command name="load">git-lfs/3.6.1</command>
<command name="load">openmpi/4.1.8</command>
<command name="load">netcdf/4.8.1</command>
<command name="load">pnetcdf/1.12.3</command>
<command name="load">python/3.10.6</command>
<command name="load">perl/5.38.0</command>
<command name="load">gptl/8.1.1</command>
<command name="load">openblas/0.3.29</command>
<command name="load">cmake/3.24.2</command>
<command name="load">papi/7.0.1</command>
<!--command name="load">pgi/2019.04</command-->
</modules>
</module_system>
<environment_variables>
<env name="OMP_STACKSIZE">64M</env>
<!--env name="GPTL_VERBOSE">0</env>
<env name="GPTL_MEMORY">0</env>
<env name="CESM_GPTL_NOMEMORY">TRUE</env-->
</environment_variables>
</machine>
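To cross-check the machine entry above against what `module list` reports at run time, the requested modules can be pulled out of the XML. This is only a minimal sketch: `loaded_modules` is a hypothetical helper (not part of CIME), and `MACHINE_XML` is a trimmed copy of the `<modules>` block above rather than the full file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <modules> block from the bbg machine entry above.
MACHINE_XML = """
<machine MACH="bbg">
  <module_system type="module">
    <modules>
      <command name="purge"/>
      <command name="load">openmpi/4.1.8</command>
      <command name="load">netcdf/4.8.1</command>
      <command name="load">gptl/8.1.1</command>
      <command name="load">papi/7.0.1</command>
    </modules>
  </module_system>
</machine>
"""


def loaded_modules(machine_xml):
    """Return the names requested by <command name="load"> elements."""
    root = ET.fromstring(machine_xml)
    return [
        cmd.text.strip()
        for cmd in root.iter("command")
        if cmd.get("name") == "load" and cmd.text
    ]


print(loaded_modules(MACHINE_XML))
```

Commented-out `<command>` entries are skipped by the parser, so the list reflects only what the machine entry actually loads. Note that papi/7.0.1 is still in the load list.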
- Changes in cime/scripts/lib/CIME/XML/env_mach_specific.py file:
return run_cmd_no_fail("bash -c '{}module list'".format(source_cmd), combine_output=True)
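The effect of that one-line change can be illustrated outside CIME. The sketch below assumes `source_cmd` is a prefix such as `source <init script> && `; `bash_wrapped` and `run_under_bash` are hypothetical names, and `/path/to/init/bash` is a placeholder, not the real init path.

```python
import shlex
import subprocess


def bash_wrapped(source_cmd, command):
    """Build a command line that runs `command` under bash, not /bin/sh."""
    # shlex.quote keeps the wrapper valid even if source_cmd has quotes in it.
    return "bash -c {}".format(shlex.quote(source_cmd + command))


def run_under_bash(source_cmd, command):
    """Run the wrapped command and return combined stdout and stderr."""
    result = subprocess.run(
        bash_wrapped(source_cmd, command),
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Placeholder init path, for illustration only.
print(bash_wrapped("source /path/to/init/bash && ", "module list"))
```

The outer shell is still whatever `shell=True` selects (normally /bin/sh), but it only launches bash; the `source` and `module` commands themselves run inside bash.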
Describe every step you took leading up to the problem:
- Downloaded CESM2.2.0
- Placed it in /nfs/scistore16/mullegrp/bgoswami/CESM220
- Edited the files mentioned above, with one difference: in config_compilers.xml I was initially compiling GPTL with -DHAVE_PAPI. ./case.setup and ./case.build both succeeded, but when I ran the job I got this error: ERROR: (shr_mem_init): GPTLget_memusage mrss0 failed
- I contacted our HPC system admin and was told that GPTL with -DHAVE_PAPI did not work because I do not have root access. I then compiled GPTL without PAPI. The model build appeared to succeed (apparently it did not!).
- I am still getting the same error: "ERROR: (shr_mem_init): GPTLget_memusage mrss0 failed"
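Since the failing call is a memory-usage query, one way to narrow this down is to check whether such queries work at all on a compute node. The sketch below rests on an assumption about GPTL internals that the logs do not confirm: that with -DHAVE_SLASHPROC the memory usage is read from /proc/<pid>/statm. `parse_statm` and `memory_probe` are hypothetical helper names.

```python
import resource

# Field order of /proc/<pid>/statm; all values are page counts.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line):
    """Map the seven page counts in a statm line to field names."""
    values = [int(v) for v in line.split()]
    if len(values) != len(STATM_FIELDS):
        raise ValueError("expected 7 fields, got {}".format(len(values)))
    return dict(zip(STATM_FIELDS, values))


def memory_probe():
    """Report whether /proc and getrusage memory queries work here."""
    report = {}
    try:
        with open("/proc/self/statm") as fh:
            report["statm"] = parse_statm(fh.readline())
    except (OSError, ValueError) as err:
        report["statm_error"] = str(err)
    # getrusage is the usual fallback when /proc is unavailable.
    report["ru_maxrss"] = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return report


if __name__ == "__main__":
    print(memory_probe())
```

Running this inside a batch job (for example through `srun`) rather than on the login node would show whether the compute nodes expose /proc in the expected form.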
If this is a port to a new machine: Please attach any files you added or changed for the machine port (e.g., config_compilers.xml, config_machines.xml, and config_batch.xml) and tell us the compiler version you are using on this machine.
Please attach any log files showing error messages or other useful information.
- Attached:
- config_batch.xml, config_compilers.xml, config_machines.xml, and env_mach_specific.py (I modified this Python script so that the commands from the XML files are run by bash rather than the default /bin/sh)
- gptl.bldlog
- cesm.log and cpl.log
Describe your problem or question:
- Is it OK to build GPTL without PAPI?
- If yes, kindly check my XML and log files and let me know what I should do to address the error I am getting.
Regards,
Bidyut