case.setup fails with "module command None purge" error on TAMU Grace cluster despite pre-loaded environment

Junliang Li
New Member
Hello,

I am porting CESM to the Texas A&M University (TAMU) Grace HPRC cluster and have encountered a persistent error that I believe points to a fundamental toolchain incompatibility.

After successfully creating a case, ./case.setup fails with ERROR: module command None purge failed with message: /bin/sh: None: command not found.

The crucial detail is that this error occurs even when the correct, complete module environment is manually loaded in the shell immediately before running ./case.setup. This suggests the CIME script is not correctly inheriting the environment from the parent shell.
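One detail worth keeping in mind here: under Lmod and Environment Modules, `module` in an sh or bash session is normally a shell function rather than an executable on `PATH`. A child process inherits exported environment variables from its parent, but it does not automatically see functions defined in the parent shell. The following is a minimal sketch of that general behavior, not of CIME itself; the names `demo_module_fn` and `DEMO_VAR` are made up for illustration.

```python
import subprocess

# The parent /bin/sh defines a function and exports a variable, then starts
# a child sh that tries to use both.
script = """
demo_module_fn() { echo "function ran"; }
DEMO_VAR=visible
export DEMO_VAR
sh -c 'echo "variable: $DEMO_VAR"; demo_module_fn'
"""

result = subprocess.run(["/bin/sh", "-c", script], capture_output=True, text=True)

print(result.stdout)                       # the exported variable reaches the child
print(result.stderr)                       # the function lookup fails in the child
print("exit status:", result.returncode)   # nonzero because the last command failed
```

This is why having modules loaded in the interactive shell does not by itself guarantee that a script can call `module`; the script has to initialize the module system on its own.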

I am using CESM 2.2.0. The output of ./describe_version is:
cesm2.2.0-0-g332937b

The source code tree at /scratch/user/junliang123/cesm2.2.0 is unmodified. All porting customizations have been made in the $HOME/.cime directory, following the recommended practice.

1. Porting: Created three modular configuration files (config_machines.xml, config_compilers.xml, config_batch.xml) in the $HOME/.cime directory. These are based on a combination of official porting guides and a previously successful port on this machine.

2. Environment Setup: Created a function load_cesm_env in my .bashrc file to load a specific, known-good combination of modules.

3. Case Creation: From the /scratch/user/junliang123/cesm2.2.0/cime/scripts directory, I run:
./create_newcase --case $SCRATCH/cesm_cases/BHIST_v2.2_final_test --compset BHIST --res f19_g17 --machine grace
This step completes successfully and creates the case directory.

4. Case Configuration: I then navigate to the case directory and configure the project:
cd $SCRATCH/cesm_cases/BHIST_v2.2_final_test
./xmlchange PROJECT=***********

5. Pre-loading Environment: I manually load the correct modules into my shell:
load_cesm_env

The module list command confirms all necessary modules are loaded.

6. Running Setup (Failure Point): I immediately run the setup script:
./case.setup

This is where the process fails with the module command None purge error.
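The literal `None` in that message is itself a clue: it is how Python renders a missing value when formatting it into a string. The sketch below shows how such a message can arise when a command string is built from a path that was never found. `build_module_command` is a hypothetical helper for illustration, not a CIME function, and whether CIME assembles its command this way is an assumption.

```python
import subprocess

def build_module_command(cmd_path, subcommand):
    # Hypothetical helper: joins the module executable path and a subcommand.
    return "{} {}".format(cmd_path, subcommand)

# Simulate a lookup that returned nothing for the requested shell language.
cmd = build_module_command(None, "purge")
print(cmd)

# Running the string through the shell fails, since no program is named "None".
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(result.returncode)
print(result.stderr.strip())
```

If this is what is happening, the modules loaded in the calling shell are irrelevant: the failure occurs before any module is touched, at the point where the command path is looked up.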

Porting Files and Logs

This is a port to a new machine. Below are the contents of all custom configuration files, the environment function, and the full terminal log showing the final failed attempt.

Compiler Version: intel-compilers/2022.1.0

1. $HOME/.cime/config_machines.xml

<?xml version="1.0"?>
<config_machines>
  <machine MACH="grace">
    <DESC>Intel Xeon, Slurm batch system on Grace@TAMU</DESC>
    <NODENAME_REGEX>.*grace</NODENAME_REGEX>
    <OS>LINUX</OS>
    <COMPILERS>intel</COMPILERS>
    <MPILIBS>impi</MPILIBS>
    <CIME_OUTPUT_ROOT>$ENV{SCRATCH}</CIME_OUTPUT_ROOT>
    <DIN_LOC_ROOT>/ihesp/obs_root/inputdata</DIN_LOC_ROOT>
    <DOUT_S_ROOT>$ENV{SCRATCH}/archive/$CASE</DOUT_S_ROOT>
    <GMAKE_J>8</GMAKE_J>
    <BATCH_SYSTEM>slurm</BATCH_SYSTEM>
    <SUPPORTED_BY>junliang123</SUPPORTED_BY>
    <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>
    <MAX_MPITASKS_PER_NODE>48</MAX_MPITASKS_PER_NODE>
    <mpirun mpilib="impi">
      <executable>mpirun</executable>
      <arguments>
        <arg name="num_tasks"> -np $TOTALPES</arg>
      </arguments>
    </mpirun>
    <module_system type="module">
      <init_path lang="sh">/sw/lmod/lmod/init/sh</init_path>
      <cmd_path lang="sh">module</cmd_path>
      <modules>
        <command name="purge"></command>
        <command name="load">intel-compilers/2022.1.0</command>
        <command name="load">impi/2021.6.0</command>
        <command name="load">imkl/2022.1.0</command>
        <command name="load">Python/3.10.4</command>
        <command name="load">CMake/3.24.3</command>
        <command name="load">netCDF-Fortran/4.6.0</command>
        <command name="load">PnetCDF/1.12.3</command>
        <command name="load">HDF5/1.12.2</command>
      </modules>
    </module_system>
    <environment_variables>
      <env name="OMP_STACKSIZE">256M</env>
    </environment_variables>
  </machine>
</config_machines>
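
Given a `None` where a module command is expected, one thing to check is which shell languages the `module_system` block actually defines `init_path` and `cmd_path` entries for. The sketch below does this with Python's standard library on an inline copy of the relevant lines. The helper name `module_system_langs` is hypothetical, and the idea that case.setup may look up a `lang` other than `sh` (for example `python`) is an assumption to verify against the CIME source or a working machine entry, not something established in this thread.

```python
import xml.etree.ElementTree as ET

# Inline copy of the module_system header from the config_machines.xml above.
FRAGMENT = """
<machine MACH="grace">
  <module_system type="module">
    <init_path lang="sh">/sw/lmod/lmod/init/sh</init_path>
    <cmd_path lang="sh">module</cmd_path>
  </module_system>
</machine>
"""

def module_system_langs(machine):
    """Map each tag (init_path, cmd_path) to the set of lang values defined."""
    found = {"init_path": set(), "cmd_path": set()}
    for module_system in machine.iter("module_system"):
        for tag in found:
            for node in module_system.findall(tag):
                found[tag].add(node.get("lang"))
    return found

machine = ET.fromstring(FRAGMENT)
langs = module_system_langs(machine)
for tag, values in langs.items():
    print(tag, sorted(values))

# Languages worth checking for; this list is an assumption, not a CIME rule.
for lang in ("sh", "python"):
    missing = [tag for tag, values in langs.items() if lang not in values]
    if missing:
        print("no {} entry for lang={!r}".format(", ".join(missing), lang))
```

For the file as posted, only `lang="sh"` entries are present, so any lookup for another language would come back empty.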

2. $HOME/.cime/config_compilers.xml
<?xml version="1.0"?>
<config_compilers>
  <compiler COMPILER="intel">
    <ADD_FFLAGS_LEND>-qopenmp -assume realloc_lhs</ADD_FFLAGS_LEND>
  </compiler>
  <mpilib MPILIB="impi">
    <ADD_LDFLAGS>-L$MKLROOT/lib/intel64</ADD_LDFLAGS>
  </mpilib>
</config_compilers>

3. $HOME/.cime/config_batch.xml
<?xml version="1.0"?>
<config_batch>
  <batch_system MACH="grace" type="slurm">
    <queues>
      <queue default="true">short</queue>
    </queues>
  </batch_system>
</config_batch>

4. load_cesm_env function from .bashrc
load_cesm_env() {
    echo "Loading PROVEN CESM environment..."
    module purge
    module load intel-compilers/2022.1.0
    module load impi/2021.6.0
    module load imkl/2022.1.0
    module load Python/3.10.4
    module load CMake/3.24.3
    module load netCDF-Fortran/4.6.0
    module load PnetCDF/1.12.3
    module load HDF5/1.12.2
    echo "✅ Proven environment loaded."
    module list
}
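
Since the report relies on this function loading the same modules as the machine entry, that can be checked mechanically rather than by eye. A small sketch, using inline copies of the two lists from this post (the variable names are illustrative only):

```python
import re
import xml.etree.ElementTree as ET

# Inline copy of the <modules> block from config_machines.xml.
MODULES_XML = """
<modules>
  <command name="purge"></command>
  <command name="load">intel-compilers/2022.1.0</command>
  <command name="load">impi/2021.6.0</command>
  <command name="load">imkl/2022.1.0</command>
  <command name="load">Python/3.10.4</command>
  <command name="load">CMake/3.24.3</command>
  <command name="load">netCDF-Fortran/4.6.0</command>
  <command name="load">PnetCDF/1.12.3</command>
  <command name="load">HDF5/1.12.2</command>
</modules>
"""

# Inline copy of the module commands from load_cesm_env.
BASHRC_FUNCTION = """
module purge
module load intel-compilers/2022.1.0
module load impi/2021.6.0
module load imkl/2022.1.0
module load Python/3.10.4
module load CMake/3.24.3
module load netCDF-Fortran/4.6.0
module load PnetCDF/1.12.3
module load HDF5/1.12.2
"""

# Modules requested by the machine entry, in order.
xml_modules = [
    command.text
    for command in ET.fromstring(MODULES_XML).findall("command")
    if command.get("name") == "load"
]

# Modules loaded by the shell function, in order.
shell_modules = re.findall(r"^\s*module load (\S+)", BASHRC_FUNCTION, flags=re.MULTILINE)

print("lists match:", xml_modules == shell_modules)
print("only in XML:", sorted(set(xml_modules) - set(shell_modules)))
print("only in shell function:", sorted(set(shell_modules) - set(xml_modules)))
```

A match here only confirms that the two lists agree; it says nothing about whether case.setup can invoke the `module` command in the first place.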

5. Full Terminal Log of Final Attempt

[junliang123@grace5 scripts]$ ./create_newcase --case $SCRATCH/cesm_cases/BHIST_v2.2_final_test --compset BHIST --res f19_g17 --machine grace
Compset longname is HIST_CAM60_CLM50%BGC-CROP_CICE_POP2%ECO_MOSART_CISM2%NOEVOLVE_WW3_BGC%BDRD
... (rest of successful create_newcase output) ...
Creating Case directory /scratch/user/junliang123/cesm_cases/BHIST_v2.2_final_test

[junliang123@grace5 scripts]$ cd $SCRATCH/cesm_cases/BHIST_v2.2_final_test

[junliang123@grace5 BHIST_v2.2_final_test]$ ./xmlchange PROJECT=**********

[junliang123@grace5 BHIST_v2.2_final_test]$ load_cesm_env
Loading PROVEN CESM environment...
✅ Proven environment loaded.

Currently Loaded Modules:
   1) GCCcore/11.3.0             5) numactl/2.0.14   9) bzip2/1.0.8        13) SQLite/3.38.3  17) OpenSSL/1.1       21) CMake/3.24.3  25) zstd/1.5.2            29) PnetCDF/1.12.3
   2) zlib/1.2.12                6) UCX/1.12.1      10) ncurses/6.3        14) XZ/5.2.5       18) Python/3.10.4     22) Szip/2.1.1    26) libxml2/2.9.13        30) HDF5/1.12.2
   3) binutils/2.38              7) impi/2021.6.0   11) libreadline/8.1.2  15) GMP/6.2.1      19) cURL/7.83.0       23) gzip/1.12     27) netCDF/4.9.0
   4) intel-compilers/2022.1.0   8) imkl/2022.1.0   12) Tcl/8.6.12         16) libffi/3.4.2   20) libarchive/3.6.1  24) lz4/1.9.3     28) netCDF-Fortran/4.6.0

[junliang123@grace5 BHIST_v2.2_final_test]$ ./case.setup
ERROR: module command None purge failed with message:
/bin/sh: None: command not found

My question is:

As detailed above, case.setup fails to initialize the module system. Given that create_newcase works correctly (reading my custom XML files) and the error persists even after manually loading the identical module environment, it strongly suggests a deep toolchain incompatibility. The CIME Python scripts appear unable to correctly fork a sub-shell that inherits the necessary environment to find and execute the module command on the TAMU Grace cluster.

Any suggestions or insights into this behavior would be greatly appreciated.

Thank you so much!
 

jedwards

CSEG and Liaisons
Staff member
There is a port to Grace for CESM3 here. I think the key change you need is the allow_error attribute on the module_system element:

<module_system type="module" allow_error="true">
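
For anyone applying that change by script rather than by hand, a minimal sketch using Python's standard library follows. It works on an inline fragment rather than the real `$HOME/.cime/config_machines.xml`, the helper name `set_allow_error` is hypothetical, and it makes no claim about how CIME interprets the attribute beyond what is suggested above.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the machine entry; not the full file from this thread.
FRAGMENT = """<config_machines>
  <machine MACH="grace">
    <module_system type="module">
      <cmd_path lang="sh">module</cmd_path>
    </module_system>
  </machine>
</config_machines>"""

def set_allow_error(root, machine_name):
    """Set allow_error="true" on module_system elements of the named machine."""
    changed = 0
    for machine in root.iter("machine"):
        if machine.get("MACH") != machine_name:
            continue
        for module_system in machine.iter("module_system"):
            module_system.set("allow_error", "true")
            changed += 1
    return changed

root = ET.fromstring(FRAGMENT)
count = set_allow_error(root, "grace")
print("elements updated:", count)
print(ET.tostring(root, encoding="unicode"))
```

Parsing and re-serializing the file this way also confirms that it is still well-formed XML after the edit.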
 