trying CLM5 on another cluster

wvsi3w

wvsi3w
Member
I ran CLM5 for a quick test on Narval (one of the Compute Canada clusters), did all the configuration for it there, and it worked. I then wanted to try it on another system, Beluga, but the create_newcase step fails with the error below. For reference, here is the link to the files and configuration of my previous run on Narval (CTSM - CLM5 - CESM2.1.3 - results of a quick run) that I discussed earlier. I changed the configuration for the Beluga system (input, output, etc.), but when I run

./create_newcase --case /home/meisam/scratch/cases/mayBeluga --compset I1850Clm50Bgc --res f19_g16 --machine narval --walltime 02:00:00 --run-unsupported

I get the following error:

Compset longname is HIST_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_MOSART_CISM2%NOEVOLVE_SWAV
Compset specification file is /home/meisam/my_cesm_sandbox/cime/../components/clm//cime_config/config_compsets.xml
Compset forcing is Historic transient
ATM component is Data driven ATM GSWP3v1 data set
LND component is clm5.0:BGC (vert. resol. CN and methane) with prognostic crop:
ICE component is Stub ice component
OCN component is Stub ocn component
ROF component is MOSART: MOdel for Scale Adaptive River Transport
GLC component is cism2 (default, higher-order, can run in parallel):cism ice evolution turned off (this is the standard configuration unless you're explicitly interested in ice evolution):
WAV component is Stub wave component
ESP component is
Pes specification file is /home/meisam/my_cesm_sandbox/cime/../components/clm//cime_config/config_pes.xml
Compset specific settings: name is RUN_STARTDATE and value is 1850-01-01
Traceback (most recent call last):
File "./create_newcase", line 218, in <module>
_main_func(__doc__)
File "./create_newcase", line 213, in _main_func
input_dir=input_dir, driver=driver, workflowid=workflow)
File "/home/meisam/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/case.py", line 1448, in create
input_dir=input_dir, driver=driver, workflowid=workflowid)
File "/home/meisam/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/case/case.py", line 814, in configure
probed_machine = machobj.probe_machine_name()
File "/home/meisam/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/XML/machines.py", line 112, in probe_machine_name
machine = self._probe_machine_name_one_guess(nametomatch)
File "/home/meisam/my_cesm_sandbox/cime/scripts/Tools/../../scripts/lib/CIME/XML/machines.py", line 147, in _probe_machine_name_one_guess
regex = re.compile(regex_str)
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/re.py", line 234, in compile
return _compile(pattern, flags)
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/re.py", line 286, in _compile
p = sre_compile.compile(pattern, flags)
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/sre_compile.py", line 764, in compile
p = sre_parse.parse(p, flags)
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/sre_parse.py", line 924, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/sre_parse.py", line 420, in _parse_sub
not nested and not items))
File "/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/python3.7/sre_parse.py", line 645, in _parse
source.tell() - here + len(this))
re.error: nothing to repeat at position 0
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I don't understand this error, so I'm moving this to the porting forum.
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
It's dying in Python while evaluating a regular expression, so the first thing I wonder about is which version of Python you are using.

Run:

python3 --version

We use Python 3.7.9 on cheyenne, which is known to work.
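
For reference, the message at the bottom of the traceback comes straight from Python's re module: it is raised when a pattern begins with a bare quantifier (*, +, ?) that has no preceding token to repeat. A minimal sketch, with a hypothetical pattern purely to show the failure mode:

import re

try:
    # Invalid pattern: the leading '*' has nothing before it to repeat,
    # which raises exactly the error seen in the traceback above.
    re.compile("*narval*")
except re.error as exc:
    print(exc)  # nothing to repeat at position 0

Since the traceback goes through probe_machine_name and _probe_machine_name_one_guess, the string being compiled is one of the NODENAME_REGEX entries from config_machines.xml.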
 

jedwards

CSEG and Liaisons
Staff member
Try updating the cime component to the latest maint-5.6 tag, created this morning and compatible with Python 3.10:
cime5.6.44
 

wvsi3w

wvsi3w
Member
It's dying in Python while evaluating a regular expression, so the first thing I wonder about is which version of Python you are using.

Run:

python3 --version

We use Python 3.7.9 on cheyenne, which is known to work.
Hi,
It is using
Python 3.7.7
 

wvsi3w

wvsi3w
Member
Try updating the cime component to the latest maint-5.6 tag, created this morning and compatible with Python 3.10:
cime5.6.44
Hello,
I couldn't find the correct way to do that. I searched the forum and tried some of the approaches you have explained before, but they did not work for me.
For instance, I tried "cime/scripts/Tools/checkout_cime.py -s", which says:
-bash: cime/scripts/Tools/checkout_cime.py: No such file or directory

and also this:
git checkout maint-5.6
error: Your local changes to the following files would be overwritten by checkout:
config/cesm/config_inputdata.xml
config/cesm/machines/config_batch.xml
config/cesm/machines/config_compilers.xml
config/cesm/machines/config_machines.xml
Please commit your changes or stash them before you switch branches.
Aborting

Inside my
/cime/scripts/Tools
there are these .py files:
standard_script_setup.py
testreporter.py
generate_cylc_workflow.py
__init__.py
 

wvsi3w

wvsi3w
Member
cd cesm2.1.3/cime
git stash
git checkout maint-5.6
git pull origin maint-5.6
git stash pop
Thank you for the clarification.
I did this, and even though the Python version is now 3.10, the same error happened again. I am still seeing this:
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.10.2/lib/python3.10/sre_parse.py", line 668, in _parse
raise source.error("nothing to repeat",
re.error: nothing to repeat at position 0
 

jedwards

CSEG and Liaisons
Staff member
In config_machines.xml, try removing the NODENAME_REGEX field for your system; it seems to be having trouble with that line even though you are explicitly providing a machine name.
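
If it helps to track down the offending entry, here is a rough sketch that tries to compile every NODENAME_REGEX in a config_machines.xml; the path below is an assumption, so point it at whichever copy your CIME checkout actually reads:

import re
import xml.etree.ElementTree as ET

# Assumed path; adjust to the config_machines.xml your CIME uses.
path = "cime/config/cesm/machines/config_machines.xml"

for mach in ET.parse(path).getroot().iter("machine"):
    node = mach.find("NODENAME_REGEX")
    if node is None or not (node.text or "").strip():
        continue
    pattern = node.text.strip()
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"{mach.get('MACH')}: bad NODENAME_REGEX {pattern!r}: {exc}")

Any machine it flags is a candidate for the entry tripping up probe_machine_name.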
 

wvsi3w

wvsi3w
Member
In config_machines.xml, try removing the NODENAME_REGEX field for your system; it seems to be having trouble with that line even though you are explicitly providing a machine name.
I don't have that in my machine entry; I am using my own machine configuration, shown below:

<machine MACH="narval">
<DESC>PNL IBM Xeon cluster, os is Linux (pgi), batch system is SLURM</DESC>
<OS>LINUX</OS>
<COMPILERS>intel</COMPILERS>
<MPILIBS>openmpi</MPILIBS>
<CIME_OUTPUT_ROOT>/scratch/$USER</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>/home/$USER/projects/def-hbeltram/meisam/inputdata</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>/home/$USER/projects/def-hbeltram/meisam/inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>/scratch/$USER/cases/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>/home/$USER/IRESM/ccsm_baselines</BASELINE_ROOT>
<CCSM_CPRNC>/home/$USER/IRESM/tools/cprnc/cprnc</CCSM_CPRNC>
<GMAKE_J>8</GMAKE_J>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY> @ pusan.ac.kr</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>64</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>64</MAX_MPITASKS_PER_NODE>
<mpirun mpilib="openmpi">
<executable>srun</executable>
<arguments>
<arg name="num_tasks">-n {{ total_tasks }}</arg>
<arg name="tasks_per_node"> --ntasks-per-node=$MAX_MPITASKS_PER_NODE </arg>
</arguments>
</mpirun>
<module_system type="module">
<init_path lang="perl">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/init/perl</init_path>
<init_path lang="python">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/init/env_modules_python.py</init_path>
<init_path lang="csh">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/init/csh</init_path>
<init_path lang="sh">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/init/sh</init_path>
<cmd_path lang="perl">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/libexec/lmod perl</cmd_path>
<cmd_path lang="python">/cvmfs/soft.computecanada.ca/custom/software/lmod/lmod/libexec/lmod python</cmd_path>
<cmd_path lang="sh">module</cmd_path>
<cmd_path lang="csh">module</cmd_path>
<modules>
<command name="load">perl/5.30.2</command>
<command name="load">cmake/3.23.1</command>
<command name="load">netcdf-mpi/4.7.4</command>
<command name="load">netcdf-fortran-mpi/4.6.0</command>
<command name="load">pnetcdf/1.12.2</command>
<command name="load">xml-libxml/2.0205</command>
</modules>
</module_system>
<environment_variables>
<env name="OMP_STACKSIZE">64M</env>
</environment_variables>
</machine>


I forgot to mention that I have a perl5 directory on my previous cluster, which I don't have on my current one (Beluga). I don't remember adding it to the previous one; I think it was already there. Should I do something about that on my current cluster? Could that be the cause?
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
We have a suggestion for you to look into. We've just released a new version of the CESM2.1 series, cesm2.1.4-rc.13. It might be worthwhile for you to try it. The science changes are minimal, but it was recently tested on our machines, and some small machine updates took place that might help you out. So I recommend looking into it.
 

wvsi3w

wvsi3w
Member
Hello again. Thanks for your replies.
I asked the cluster's support team for help, and below is what they suggested:

"It's important to use "--machine beluga" and not narval, as the code checks the hostname. We changed the XML files as follows:
* replaced "narval" with "beluga"
* replaced 48 with 40 (cores per node)
* replaced -march=core-avx2 with -xCore-AVX512 in config_compilers.xml to match the CPU architecture."

After this, I went through all four steps of running the case; it worked well and downloaded all the input data for this case, BUT it then failed with one srun error. I am putting that error in a new thread since it is a different topic.
 