Porting CTSM, and general CIME questions

cdevaneprugh · Oct 15, 2024

Hello, I am trying to port CTSM to our HPC. I successfully ported CESM 2.1.5 and we have been using it regularly. I thought because I already set up the cime config files for cesm, I would be able to git clone ctsm and start running cases. However, this doesn't seem to be the case and has led to some confusion on how different versions of cime are ported.

For CESM 2.1.5 I had the batch, compiler, and machine xml files in ~/.cime which work great. Additionally, the default configs were in $CIMEROOT/config/$model/machines. However for CTSM, it looks like the default configs are in $CTSMROOT/ccs_config/machines. They also have the nodename regex in a separate config_machines.xml file from the config for the specific machine. Additionally, it looks like there are cmake macro files rather than a single config_compilers xml file.

I'm just sort of generally confused as to what to do here. Can I git checkout another version of CIME (like maint-5.6) that will work with my existing configs? Do I need to write cmake macro files instead of using my existing config_compilers.xml file? Our goal is to use CTSM v5.2 or newer for the single point restart capability.

So far, I have added my nodename regex to the default config_machines.xml file and removed it from the one in my home folder so that it would pass the xml lint. However, when I run ./scripts_regression_tests.py it gets stuck on the first test (test_sys_bless_tests_results) and nothing happens.

Any help or general advice would be appreciated. Thanks.

jedwards · Oct 15, 2024

The ctsm version compatible with cesm 2.1.5 is already available to you as part of the cesm source that you've already checked out.
See cesm/components/clm

If you want to use ctsm v5.2 you will need to clone that model version separately from cesm 2.1.5 and follow the instructions provided with it to get its supporting files. As you have noted, there are significant differences between porting this version and porting the one used for v2.1.5 of cesm.

cdevaneprugh · Oct 15, 2024

I have already cloned ctsm5.2.0 and have been working with it, but honestly the documentation is not particularly helpful. This still seems to be a cime porting issue, as ./scripts_regression_tests.py gets stuck on the first test. Because the cime versions are configured differently, should my strategy be to append my configs to the default ones for each model in order to switch between them as needed?

jedwards · Oct 15, 2024

If you want to run two different versions of the model and you want to use .cime to keep your local port you will need to switch between two different .cime directories. The better option, as you suggest, is to append your configs to the defaults for each model.
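For the first option, switching could be done by keeping one directory per model version and pointing ~/.cime at the one you need. This is only a minimal sketch: the directory names (.cime-cesm2.1.5, .cime-ctsm5.2) and the helper activate_cime_config are hypothetical, and it assumes CIME follows a symlinked ~/.cime the same way it reads a real directory.

```python
from pathlib import Path


def activate_cime_config(home: Path, name: str) -> Path:
    """Point <home>/.cime at <home>/.cime-<name> through a symlink.

    The ".cime-<name>" naming scheme is a hypothetical convention for this
    sketch, not something CIME defines.
    """
    target = home / f".cime-{name}"
    link = home / ".cime"

    if not target.is_dir():
        raise FileNotFoundError(f"no config directory at {target}")

    if link.is_symlink():
        # Drop the previous link; the directory it pointed to is untouched.
        link.unlink()
    elif link.exists():
        # A real ~/.cime directory is never deleted automatically.
        raise FileExistsError(f"{link} is a real directory; rename it first")

    link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Example: switch the current user to the (hypothetical) CTSM 5.2 configs.
    activate_cime_config(Path.home(), "ctsm5.2")
```

Refusing to touch a real ~/.cime directory means an existing port has to be renamed by hand once, which avoids losing it by accident.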

cdevaneprugh · Oct 29, 2024

So I figured out that scripts_regression_tests.py wouldn't run due to an SBATCH option getting set in .case.run files. The #SBATCH --exclusive option gets automatically set anytime a case (or test) gets set up. Our system does not allow this, and the case will get stuck in the queue forever. Is there an xml setting or a line I can add in my config_batch.xml file to prevent this? I've tried things like #SBATCH --exclusive=FALSE with no luck.

jedwards · Oct 29, 2024

It looks like in ctsm version 5.2.0 this is set in the general slurm options in ccs_config/machines/config_batch.xml, line 180. Remove that line and try again.
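If you would rather script the removal than edit by hand, something like the following could work. It is a sketch only: it assumes the option appears as a <directive> element holding --exclusive inside the <batch_system type="slurm"> block, the SAMPLE XML is a cut-down illustration rather than the real file, and drop_directive is a hypothetical helper. Note that ElementTree discards XML comments when it parses, so deleting the single line by hand is safer for the real file.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for config_batch.xml; the real file has many more entries.
SAMPLE = """<config_batch>
  <batch_system type="slurm">
    <directives>
      <directive> --job-name={{ job_id }}</directive>
      <directive> --exclusive </directive>
    </directives>
  </batch_system>
</config_batch>"""


def drop_directive(xml_text, flag="--exclusive", batch_type="slurm"):
    """Return (new_xml, removed_count) with matching <directive> entries removed."""
    root = ET.fromstring(xml_text)
    removed = 0
    for system in root.iter("batch_system"):
        if system.get("type") != batch_type:
            continue
        for directives in system.iter("directives"):
            # Iterate over a copy because children are removed during the loop.
            for directive in list(directives):
                text = (directive.text or "").strip()
                if directive.tag == "directive" and text == flag:
                    directives.remove(directive)
                    removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    cleaned, count = drop_directive(SAMPLE)
    print(f"removed {count} directive(s)")
    print(cleaned)
```

Cases that were already set up keep the old directive in their .case.run files, so they would need to be set up again after the change.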

cdevaneprugh · Nov 21, 2024

On further advice, we have decided to switch to porting ctsm5.3.012. Most of the scripts_regression_tests run fine, but I am still getting some failures and errors. The tests we are failing are TestCreateNewcase, TestFullSystem, TestRunRestart, and TestUserConcurrentMods. We are getting errors for TestParamGenYamlConstructor, TestUnitSystemTests, and TestUnitXMLMachines.

For the yaml tests, is yaml a required dependency? It didn't seem to be necessary for cesm2.1.5 to run successfully.
Also, the TestUnitXMLMachines error seems the strangest to me. It says we don't have a valid batch system, mpilib, or compiler, yet these configs are essentially the same as our cesm ones, which work fine.

I've attached the terminal output from the tests, as well as my config files here. Any other advice is appreciated to eliminate these errors, and find what is causing them.
