
porting ctsm5.3.009 on our system

xgao304

Member
Dear Sir:

I tried to port ctsm 5.3.009 to our system (I had previously ported cesm 2.1.3 successfully), but got a bit lost.

To port cesm2.1.3, I modified three files under ~/cime/config/cesm/machines by adding the relevant system specifications:
1. config_batch.xml

<batch_system MACH="svante" type="slurm">
<batch_submit>sbatch</batch_submit>
<submit_args>
<arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
</submit_args>
<directives>
<directive> --partition=edr</directive>
<directive> --mem=0</directive>
</directives>
<queues>
<queue walltimemax="24:00:00" nodemin="1" nodemax="128" default="true">edr</queue>
</queues>
</batch_system>

2. config_compilers.xml
<compiler MACH="svante" COMPILER="intel">
<NETCDF_PATH> $(NETCDF)</NETCDF_PATH>
<SLIBS>
<append> -L${NETCDF_PATH}/lib -lnetcdf -lnetcdff </append>
</SLIBS>
<MPI_LIB_NAME>mpi</MPI_LIB_NAME>
<MPI_PATH> $(INC_MPI)/..</MPI_PATH>
</compiler>

3. config_machines.xml
... a lot of system specifications.
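
A hand-edited entry like the config_batch.xml fragment in item 1 can be checked for well-formedness before running create_newcase. The following is a minimal Python sketch, not part of CIME: the function name `summarize_batch` is hypothetical, and it only parses the fragment with the standard library rather than validating it against any CIME schema.

```python
import xml.etree.ElementTree as ET

# The config_batch.xml entry from item 1 (indentation added for readability).
BATCH_FRAGMENT = """
<batch_system MACH="svante" type="slurm">
  <batch_submit>sbatch</batch_submit>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
  </submit_args>
  <directives>
    <directive> --partition=edr</directive>
    <directive> --mem=0</directive>
  </directives>
  <queues>
    <queue walltimemax="24:00:00" nodemin="1" nodemax="128" default="true">edr</queue>
  </queues>
</batch_system>
"""


def summarize_batch(xml_text):
    """Parse one <batch_system> element and return its key settings.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        "machine": root.get("MACH"),
        "type": root.get("type"),
        "submit_args": [(a.get("flag"), a.get("name")) for a in root.iter("arg")],
        "directives": [d.text.strip() for d in root.iter("directive")],
        "queues": [q.text.strip() for q in root.iter("queue")],
    }


summary = summarize_batch(BATCH_FRAGMENT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

An unclosed tag or a stray character raises `ParseError` with a line and column number, which is usually easier to act on than a failure later in case creation.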

For ctsm5.3.009, under the directory ccs_config/machines,

I only changed "config_batch.xml", in the same way as I did for cesm2.1.3. There is no corresponding "config_compilers.xml", and "config_machines.xml" has very different contents (only machine names) from the one in cesm2.1.3 (a lot of system specifications).
The README file only says "Please refer to the documentation in the config_machines.xml and config_compilers.xml files." I am not sure which documentation this refers to.

My questions are:
1) To port ctsm5.3.009 on its own (without any other cesm components), which files should I modify?
2) Once I have changed those files, can I follow the same steps as for building cesm:

create_newcase
case.setup
case.build
case.submit

Or should I follow the link below to build ctsm (including building various prerequisites)?
3.2.1. Obtaining and building CTSM and LILAC — ctsm CTSM master documentation

Thanks,

Xiang
 

jedwards

CSEG and Liaisons
Staff member
What was in config_compilers.xml is now in cmake macros, and the contents are divided into subdirectories based on machine name.
So add your machine name to the top-level config_machines.xml and create a subdirectory of that same name. That subdirectory will contain the machine-specific contents of config_machines.xml.
It looks like the tag you are using, ctsm 5.3.009, has the first iteration of this change in ccs_config; in future versions you will see the cmake_macros and config_batch files also separated according to machine name. I hope that helps.
 

xgao304

Member
Dear Sir:

I followed your instructions. When I try to create a new case using the following command, I get the same error message with both the intel and gcc compilers (see the attached file). I can run the same command successfully using cesm2.1.3 and the intel compiler. I have also attached my config_machines.xml and config_batch.xml (under machine name "svante").

module load python/3.9.1
/net/fs12/d2/xgao/ctsm5.3/CTSM/cime/scripts/create_newcase --case testctsm \
--compset 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_SROF_SGLC_SWAV \
--res CLM_USRDAT --user-mods-dir $MYDATA_DIR --machine svante --compiler gcc --run-unsupported

I discussed this with our system administrator. He is not certain about the correct procedure to solve the issue, but he proposed the steps below (between the dashed lines), which seem quite complicated to me. I am wondering whether there is a simpler way, as with cesm2.1.3, where everything is ready to go once the required configurations are set up. Could you provide some feedback?

--------
I think we have to actually build/compile it first. Based on the docs (https://escomp.github.io/ctsm-docs/versions/master/html/lilac/obtaining-building-and-running/obtaining-and-building-ctsm.html), it
looks like that is maybe done with a process like this:

1. I think we do this from the head node, so if building for HDR, be on
svante.mit.edu
2. Load all of the modules I have specified in the config_machines.xml
for Svante
3. cd into `/net/fs12/d2/xgao/ctsm5.3/CTSM`
4. Run the build command, which I think would be something like
`./lilac/build_ctsm /net/fs12/d2/$USER/ctsm_build_hdr --os linux \
--machine svante --compiler gcc \
--netcdf-path '/home/software/rhel/8/gcc/11.3.0/pkg/netcdf/4_shared_libs/' \
--pnetcdf-path '/home/software/rhel/8/gcc/11.3.0/pkg/netcdf/4_shared_libs/' \
--esmf-mkfile-path '$ENV{ESMFMKFILE}' --max-mpitasks-per-node 48`

Ultimately, that should build/compile CTSM into
`/net/fs12/d2/xgao/ctsm_build_hdr`.

I have a feeling it might error on pnetcdf. If so, I can go and build
that separately. What is confusing is that there is pnetcdf and also plain
netcdf 4 with parallel support. I know I have the latter built in that
netcdf/4_shared_libs module, but they might specifically need pnetcdf.

Very possible we are missing something else about the config_machines
setup.
---------

Thanks,

Xiang
 

Attachments

  • config_batch.xml.txt (27.5 KB)
  • config_machines.xml.txt (2.8 KB)
  • error.txt (2.5 KB)

jedwards

CSEG and Liaisons
Staff member
Did not find an alias or longname compset match for 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_SROF_SGLC_SWAV

I'm not sure what the problem is here, but I found that
./create_newcase --case foo --compset I2000Clm50BgcCrop --res CLM_USRDAT --run-unsupported
Compset longname is 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_MOSART_SGLC_SWAV
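
To see exactly where the two longnames differ, they can be compared field by field. This is a minimal Python sketch, assuming the fields of a compset longname are separated by underscores and contain no underscores themselves (true for the two names in this thread); the helper `diff_compsets` is hypothetical and not part of CIME.

```python
def diff_compsets(first, second):
    """Return (position, field_in_first, field_in_second) for each differing field."""
    fields_a = first.split("_")
    fields_b = second.split("_")
    if len(fields_a) != len(fields_b):
        raise ValueError("compset longnames have different numbers of fields")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(fields_a, fields_b))
        if a != b
    ]


# Longname requested in the failing create_newcase call.
requested = "2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_SROF_SGLC_SWAV"
# Longname reported for the I2000Clm50BgcCrop alias.
from_alias = "2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_MOSART_SGLC_SWAV"

differences = diff_compsets(requested, from_alias)
for index, a, b in differences:
    print(f"field {index}: {a} vs {b}")
```

The only differing field is the river component: SROF in the requested longname and MOSART in the one the alias expands to.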

I'm not sure why you are getting the error you are seeing. Have you put your machine name in the REGEX list in ccs_config/machines/config_machines.xml? You shouldn't need to, but I suppose it could be the problem.
 

xgao304

Member
@jedwards:

Thanks for the reply.

1). The compset works for "cesm2.1.3", as shown below (the case ran successfully). Do cesm2.1.3 and ctsm5.3 have slightly different compsets? Also, I am running the simulations over a region (not the globe). My impression is that river routing should be off for a regional case. Could that be the reason for the difference between "SROF" and "MOSART"?

------------
2024-11-01 11:29:27: /net/fs05/d1/xgao/cesm2.1.3/cesm/cime/scripts/create_newcase --case /net/fs05/d1/xgao/cesm2.1.3/cases/BgcCrop2000_BANG_HisD --compset 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_SROF_SGLC_SWAV --res CLM_USRDAT --user-mods-dir /net/fs05/d1/xgao/cesm2.1.3/cases/sim_setup/Bangladesh/domain_HisMRCM --machine svante --compiler intel --run-unsupported

2024-11-01 11:29:27: Compset longname is 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_SROF_SGLC_SWAV
--------

2). I tried using your compset longname "2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_MOSART_SGLC_SWAV", but I still get the error message (see the attachment).

3). I did put my machine name in ccs_config/machines/config_machines.xml, as below:

<value MACH="svante">*.ib</value>
Is that format correct?

Thanks.

Xiang
 

Attachments

  • error1.txt (3.1 KB)

jedwards

CSEG and Liaisons
Staff member
I think that you need
<value MACH="svante">.*.ib</value>
or
<value MACH="svante">.*ib</value>

And you might want to make that more specific so that you can submit it back as a PR to the ccs_config repository.
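
The difference between these patterns can be checked directly. This is a minimal Python sketch, assuming the value is treated as a Python regular expression and matched from the start of the host name; the host names `node042.ib` and `calib-login` are made up for illustration, and `.*\.ib$` is just one possible more specific pattern.

```python
import re


def matches_host(pattern, hostname):
    """Return True or False for a valid pattern, or None if the pattern is not a valid regex."""
    try:
        return re.match(pattern, hostname) is not None
    except re.error:
        return None


# Hypothetical host names: one on the cluster, one unrelated.
hosts = ["node042.ib", "calib-login"]

# The original value, the two suggested fixes, and a stricter alternative.
patterns = ["*.ib", ".*.ib", ".*ib", r".*\.ib$"]

results = {
    pattern: [matches_host(pattern, host) for host in hosts]
    for pattern in patterns
}
for pattern, outcome in results.items():
    print(f"{pattern!r}: {dict(zip(hosts, outcome))}")
```

`*.ib` is a shell-style glob rather than a regular expression, so it is rejected because the leading `*` has nothing to repeat. `.*.ib` and `.*ib` are valid but also match the unrelated `calib-login`, which is why a more specific pattern is preferable for a shared repository.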
 

xgao304

Member
@jedwards,

I made some progress in porting ctsm to our machine, but still have some issues. I tested two cases:

1. create_newcase --case testctsm --compset I2000Clm50BgcCrop \
--res CLM_USRDAT --user-mods-dir $MYDATA_DIR --machine svante --compiler gnu \
--run-unsupported

I got the following output (I did change "xmlchange_cmnds" in $MYDATA_DIR to "shell_commands"):
[Attachment 6097]
The ATM_DOMAIN_FILE is stored in the specified directory, as shown below:
[Screenshot 2024-11-20 at 8.01.47 PM.png]

I am not sure what is going on.


2. I tested a supported compset with a supported resolution, using the command given in the user guide (1.1.8. Quickstart — ctsm CTSM master documentation):

create_newcase --case testctsm --compset I2000Clm50BgcCrop --res f09_g16_gl4 --machine svante --compiler gnu \
--run-unsupported

It reports "no alias f09_g16_gl4" (output below). I also tried "hcru_hcru", which is supported by cesm2, and got the same error message. How can I find the supported resolutions in ctsm?

/net/fs12/d2/xgao/ctsm5.3/cases/sim_setup/Bangladesh
Compset longname is 2000_DATM%GSWP3v1_CLM50%BGC-CROP_SICE_SOCN_MOSART_SGLC_SWAV
Compset specification file is /net/fs12/d2/xgao/ctsm5.3/CTSM/cime_config/config_compsets.xml
Automatically adding SESP to compset
ERROR: no alias f09_g16_gl4 defined
-----

Thanks,

Xiang
 

xgao304

Member
@jedwards:

We (our system administrator and I) are a bit confused about the correct procedure for porting and building ctsm on our system. We found two different documents online:

1. 3.2.1. Obtaining and building CTSM and LILAC — ctsm CTSM master documentation
This doc indicates the involvement of ./lilac/build_ctsm to build CTSM and its dependencies.

build_ctsm runs ./create_newcase, case.setup, and then just a SHAREDLIB_BUILD (a partial case.build, I think), which completes without error on our system. However, the doc seems to indicate that this is not a stand-alone model and needs an atmosphere, which conflicts with my understanding. Could you confirm?

2. 1.1.8. Quickstart — ctsm CTSM master documentation

This doc follows the same procedure I used for porting and running cesm, but I am stuck on the error message in my previous post, and I would appreciate your suggestions on how to solve that as well.

We don't know which document we should follow. If you could give us a full set of directions for building, compiling, and running the model (or tell us which document to follow) before we debug further, that would be a great help. Otherwise, we are at a halt.

Thanks,

Xiang
 