Case.setup error when porting CESM2 to a new machine

kezhoulumelody

Kezhou Lu
New Member
Hi @jedwards,

I am porting cesm2_1_1 to our university machine, which uses the Slurm job scheduler.

We added the following block to config_machine.xml:

<machine MACH="pace-hive">
  <DESC>Georgia Tech PACE cluster, Linux RHEL7</DESC>
  <NODENAME_REGEX>.*.pace.gatech.edu</NODENAME_REGEX>
  <OS>LINUX</OS>
  <COMPILERS>gcc</COMPILERS>
  <MPILIBS>mvapich2</MPILIBS>
  <CIME_OUTPUT_ROOT>/scratch/CESM</CIME_OUTPUT_ROOT>
  <DIN_LOC_ROOT>/xxxx/.../xxxx/scratch/CESM_INPUTS/2.2.0</DIN_LOC_ROOT>
  <DIN_LOC_ROOT_CLMFORC>/xxxx/.../xxxx/scratch/CESM_INPUTS/2.2.0/lmwg</DIN_LOC_ROOT_CLMFORC>
  <DOUT_S_ROOT>/scratch/CESM/archive/$CASE</DOUT_S_ROOT>
  <BASELINE_ROOT>/xxxx/.../xxxx/scratch/CESM_INPUTS/2.2.0/ccsm_baselines</BASELINE_ROOT>
  <CCSM_CPRNC></CCSM_CPRNC>
  <GMAKE_J>4</GMAKE_J>
  <BATCH_SYSTEM>slurm</BATCH_SYSTEM>
  <SUPPORTED_BY>pace-support - at - oit.gatech.edu</SUPPORTED_BY>
  <MAX_TASKS_PER_NODE>24</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>24</MAX_MPITASKS_PER_NODE>
  <PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
  <mpirun mpilib="mvapich2">
    <executable>mpirun</executable>
    <arguments>
      <arg name="mpi"></arg>
      <arg name="num_tasks">-n {{ total_tasks }}</arg>
    </arguments>
  </mpirun>
  <module_system type="module">
    <init_path lang="perl">/usr/local/pace-apps/lmod/lmod/init/perl</init_path>
    <init_path lang="python">/usr/local/pace-apps/lmod/lmod/init/env_modules_python.py</init_path>
    <init_path lang="csh">/usr/local/pace-apps/lmod/lmod/init/csh</init_path>
    <init_path lang="sh">/usr/local/pace-apps/lmod/lmod/init/sh</init_path>
    <cmd_path lang="perl">/usr/local/pace-apps/lmod/lmod/libexec/lmod perl</cmd_path>
    <cmd_path lang="python">/usr/local/pace-apps/lmod/lmod/libexec/lmod python</cmd_path>
    <cmd_path lang="sh">module</cmd_path>
    <cmd_path lang="csh">module</cmd_path>
    <modules>
      <command name="purge"/>
    </modules>
    <modules>
      <command name="load">perl/5.34.1</command>
    </modules>
    <modules compiler="gcc">
      <command name="load">gcc/10.3.0</command>
      <command name="load">mvapich2/2.3.6</command>
      <command name="load">hdf5/1.10.8</command>
      <command name="load">netcdf-c/4.8.1</command>
      <command name="load">mkl/20.0.4</command>
    </modules>
    <modules mpilib="mvapich2">
      <command name="load">mvapich2/2.3.6</command>
    </modules>
  </module_system>
  <environment_variables>
    <env name="OMP_STACKSIZE">64M</env>
  </environment_variables>
  <environment_variables compiler="gcc">
    <env name="NETCDF_PATH">/usr/local/pace-apps/spack/packages/linux-rhel7-x86_64/gcc-10.3.0/netcdf-c-4.8.1-qbpmsrxilalurws7acutvesy4h5yyzxy</env>
    <env name="HDF5">/usr/local/pace-apps/spack/packages/linux-rhel7-x86_64/gcc-10.3.0/hdf5-1.10.8-jzwozkvmnzdcjeqp2gmf6hpwi5jqz7if</env>
  </environment_variables>
</machine>

and the following block to config_batch.xml:

<batch_system MACH="pace-hive" type="slurm">
  <batch_submit>sbatch</batch_submit>
  <directives queue="hive">
    <directive default="/bin/bash"> -S {{ shell }} </directive>
    <directive> --partition=hive</directive>
    <directive> --account={{ project }}</directive>
    <directive> --nodes={{ num_nodes }}</directive>
    <directive> --ntasks-per-node={{ tasks_per_node }}</directive>
  </directives>
  <queues>
    <queue walltimemax="12:00:00" nodemin="1" nodemax="24" default="true">hive</queue>
  </queues>
</batch_system>
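
To get a rough idea of what header lines these directives would expand to, here is a minimal sketch of the `{{ name }}` substitution. It is an illustration only, not CIME's implementation: `render_directives`, `PLACEHOLDER`, and the values in `example_values` are hypothetical, and the `#SBATCH` prefix is assumed to be the Slurm default.

```python
import re
import xml.etree.ElementTree as ET

# Directives copied from the config_batch.xml block above.
BATCH_XML = """
<directives queue="hive">
  <directive default="/bin/bash"> -S {{ shell }} </directive>
  <directive> --partition=hive</directive>
  <directive> --account={{ project }}</directive>
  <directive> --nodes={{ num_nodes }}</directive>
  <directive> --ntasks-per-node={{ tasks_per_node }}</directive>
</directives>
"""

# Hypothetical pattern for "{{ name }}" placeholders (not taken from CIME).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_directives(xml_text, values, prefix="#SBATCH"):
    """Expand each <directive>, falling back to its 'default' attribute."""
    lines = []
    for node in ET.fromstring(xml_text).iter("directive"):
        default = node.get("default")

        def lookup(match):
            name = match.group(1)
            value = values.get(name, default)
            if value is None:
                raise KeyError(f"no value for placeholder {name!r}")
            return str(value)

        # An element with no text has node.text == None, so guard with "".
        text = PLACEHOLDER.sub(lookup, node.text or "")
        lines.append(f"{prefix} {text.strip()}")
    return lines


# Made-up values for illustration; "shell" is left out to exercise the default.
example_values = {"project": "XXX", "num_nodes": 2, "tasks_per_node": 24}
rendered = render_directives(BATCH_XML, example_values)
for line in rendered:
    print(line)
```

Under these assumptions the first line comes out as `#SBATCH -S /bin/bash`. In the sbatch manual, `-S` is the short form of `--core-spec`, so that directive may be worth checking against the local Slurm documentation.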

We successfully created the case with the command
./create_newcase --case test-hive --res f09_g17 --compset B1850 --mach pace-hive --walltime 02:00:00 -q hive --project XXX

but case.setup then failed with the following messages:

###################
Creating batch scripts
Writing case.run script from input template /xxx/cesm2_2_0/cime/config/cesm/machines/template.case.run
Traceback (most recent call last):
File "/xxx/cesm2_2_0/cime/scripts/test1/./case.setup", line 67, in <module>
_main_func(__doc__)
File "/xxx/cesm2_2_0/cime/scripts/test1/./case.setup", line 64, in _main_func
case.case_setup(clean=clean, test_mode=test_mode, reset=reset, keep=keep)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/case/case_setup.py", line 270, in case_setup
run_and_log_case_status(functor, phase, caseroot=caseroot)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1768, in run_and_log_case_status
rv = func()
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/case/case_setup.py", line 254, in <lambda>
functor = lambda: _case_setup_impl(self, caseroot, clean=clean, test_mode=test_mode, reset=reset, keep=keep)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/case/case_setup.py", line 203, in _case_setup_impl
env_batch.make_all_batch_files(case)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/XML/env_batch.py", line 940, in make_all_batch_files
self.make_batch_script(input_batch_script, job, case)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/XML/env_batch.py", line 194, in make_batch_script
overrides = self.get_job_overrides(job, case)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/XML/env_batch.py", line 189, in get_job_overrides
overrides["mpirun"] = case.get_mpirun_cmd(job=job, overrides=overrides)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/case/case.py", line 1437, in get_mpirun_cmd
executable, mpi_arg_list, custom_run_exe, custom_run_misc_suffix = env_mach_specific.get_mpirun(self, mpi_attribs, job)
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/XML/env_mach_specific.py", line 504, in get_mpirun
arg_value = transform_vars(self.text(arg_node),
File "/xxx/cesm2_2_0/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1509, in transform_vars
while directive_re.search(text):
TypeError: expected string or bytes-like object
######################################

It seems like the node information is not being read in correctly, but I don't know how to modify config_batch.xml to fix it.
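
The last two frames of the traceback are in get_mpirun and transform_vars, which process the text of each `<arg>` element in the `<mpirun>` block, so the empty `<arg name="mpi"></arg>` may be worth checking as well. The sketch below is a minimal reproduction under stated assumptions: it assumes CIME ends up with `None` for an element that has no text, as Python's `xml.etree.ElementTree` does, and `directive_re` here is only a stand-in pattern. `find_empty_args` and `try_search` are hypothetical helpers, not part of CIME.

```python
import re
import xml.etree.ElementTree as ET

# Fragment copied from the <mpirun> block of the machine entry above.
MPIRUN_XML = """
<mpirun mpilib="mvapich2">
  <executable>mpirun</executable>
  <arguments>
    <arg name="mpi"></arg>
    <arg name="num_tasks">-n {{ total_tasks }}</arg>
  </arguments>
</mpirun>
"""

# Stand-in for the pattern transform_vars searches with (an assumption).
directive_re = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_empty_args(xml_text):
    """Return the names of <arg> elements whose text is missing or blank."""
    root = ET.fromstring(xml_text)
    return [arg.get("name") for arg in root.iter("arg")
            if arg.text is None or not arg.text.strip()]


def try_search(text):
    """Run the regex search and report whether it raised a TypeError."""
    try:
        directive_re.search(text)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


# Map each argument name to its raw element text (None when the element is empty).
args = {arg.get("name"): arg.text
        for arg in ET.fromstring(MPIRUN_XML).iter("arg")}
empty_args = find_empty_args(MPIRUN_XML)

print(empty_args)                      # names of the empty <arg> elements
print(try_search(args["mpi"]))         # searching None raises TypeError
print(try_search(args["num_tasks"]))   # a normal string is searched without error
```

If the same thing happens inside CIME, giving that `<arg>` some text or removing it from the `<arguments>` list would be one thing to try.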

Thanks,

Melody
 

jedwards

CSEG and Liaisons
Staff member
In config_batch.xml, right after the <batch_submit> line, try adding the following:
<submit_args>
  <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
</submit_args>
 

kezhoulumelody

Kezhou Lu
New Member
Hi Jedwards,

That doesn't work. I am wondering whether I should specify the queue information in the same "arg flag" format, something like:

<submit_args>
  <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
  <arg flag="-q" name="$JOB_QUEUE"/>
  <arg flag="--account" name="$PROJECT"/>
</submit_args>
 