porting error for CTSM 5.2

Isaactian

Isaac Tian
New Member
What version of the code are you using?
CTSM5.2.005

Have you made any changes to files in the source tree?
Yes

Describe every step you took leading up to the problem:
I am trying to port CTSM to the Niagara supercomputer (University of Toronto).
1. I first modified the config_batch file (please see the attached file).

2. I created a folder named "niagara" in the machine directory and placed the corresponding config_machines file there.

3. I tried to test a global simulation by creating a new case (./create_newcase --case $SCRATCH/CTSM_test/cases/case_3 --mach niagara --res f19_g17 --compset IHistClm50Bgc --compiler intel --run-unsupported), but case.build failed with the error: undefined reference to symbol 'nc_set_var_chunk_cache'.
--> I resolved this by adding the following line to the intel.cmake file in the cmake_macros directory: string(APPEND SLIBS " -lnetcdf -lnetcdff").

4. Both case.setup and case.build then completed successfully. However, after case.submit the tasks are interrupted almost immediately, and I can't find a specific error message in the CESM log file, only a series of "application called MPI_Abort(comm=0x84000002, 1)" lines (see the attached .log file).
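
When a log consists mostly of repeated abort notices, as described in step 4, it can help to filter them out and see whether anything else was written. The following is a minimal sketch, not part of CIME; the function name log_without_aborts and the default of 40 lines are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a CIME tool): count the repeated MPI_Abort notices
# in a log file, then print whatever other lines the log contains.
log_without_aborts() {
    local logfile="$1"
    local max_lines="${2:-40}"   # assumed default, adjust as needed
    # grep -c prints the number of matching lines (0 if there are none).
    echo "abort notices: $(grep -c 'application called MPI_Abort' "$logfile")"
    # grep -v keeps only the lines that are NOT abort notices.
    grep -v 'application called MPI_Abort' "$logfile" | head -n "$max_lines"
}
```

For the log attached here this would be called as `log_without_aborts cesm.log.13755243.241018-175013`. If nothing useful remains after filtering, the actual error was probably written to a different file.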

If this is a port to a new machine: Please attach any files you added or changed for the machine port (e.g., config_compilers.xml, config_machines.xml, and config_batch.xml) and tell us the compiler version you are using on this machine.
Please attach any log files showing error messages or other useful information.

1. Compiler: intel/2022u2

2. preview_run output:
CASE INFO:
nodes: 2
total tasks: 80
tasks per node: 40
thread count: 1
ngpus per node: 0

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment ESMFMKFILE=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2/esmf/8.6.0/lib/libO/Linux.intel.64.intelmpi.default/esmf.mk
Setting Environment NETCDF_PATH=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2-hdf5-1.10.9/netcdf/4.9.0
Setting Environment OMP_NUM_THREADS=1
Setting Environment OMP_STACKSIZE=512M
Setting Environment PNETCDF_PATH=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2/pnetcdf/1.8.1
Setting Environment SCINET_NETCDF_MPI_ROOT=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2-hdf5-1.10.9/netcdf/4.9.0

SUBMIT CMD:
sbatch --time 5:00:00 --partition compute --account rrg-cgf .case.run --resubmit

MPIRUN (job=case.run):
mpirun -np 80 /scratch/c/cgf/cytian/CTSM_test/output/CIME_output/case_3/bld/cesm.exe >> cesm.log.$LID 2>&1

FOR JOB: case.st_archive
ENV:
Setting Environment ESMFMKFILE=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2/esmf/8.6.0/lib/libO/Linux.intel.64.intelmpi.default/esmf.mk
Setting Environment NETCDF_PATH=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2-hdf5-1.10.9/netcdf/4.9.0
Setting Environment OMP_NUM_THREADS=1
Setting Environment OMP_STACKSIZE=512M
Setting Environment PNETCDF_PATH=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2/pnetcdf/1.8.1
Setting Environment SCINET_NETCDF_MPI_ROOT=/scinet/niagara/software/2022a/opt/intel-2022u2-intelmpi-2022u2+ucx-1.11.2-hdf5-1.10.9/netcdf/4.9.0

SUBMIT CMD:
sbatch --time 5:00:00 --partition compute --account rrg-cgf --dependency=afterok:0 case.st_archive --resubmit

3. intel.cmake, config_machines.xml, config_batch.xml, cesm.log.13755243.241018-175013 have been attached.

Describe your problem or question:
case.submit does not run successfully for the test case created with --case $SCRATCH/CTSM_test/cases/case_3 --mach niagara --res f19_g17 --compset IHistClm50Bgc --compiler intel --run-unsupported. Please help me figure out what went wrong in the porting process.
 

Isaactian

Isaac Tian
New Member
I don't know why the uploaded files were not shown. I have re-attached the relevant files here. I would really appreciate it if someone could help me work through this.
 

jedwards

CSEG and Liaisons
Staff member
Note that you need to add a .txt extension to the log file names in order to upload them. Also check whether there are any PET* files in the run directory.
Have you run any of the simple tests provided before trying this more complicated case?
 

Isaactian

Isaac Tian
New Member
Thanks for your message! I have uploaded them as .txt files now. I did find a series of PET*.ESMF_LogFile files in the CIME_output/case_3/run directory (probably written when the submitted job failed).
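
Since there is one PET*.ESMF_LogFile per MPI task, a run with 80 tasks can leave many of them. The following is a minimal sketch for pulling the error lines out of all of them at once. It assumes that ESMF marks error entries with the word ERROR, which should be verified against the actual files; the function name scan_pet_logs and the 20-line default are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ERROR entries from the ESMF per-task (PET) logs
# found in a run directory.
scan_pet_logs() {
    local rundir="${1:-.}"
    local max_lines="${2:-20}"   # assumed default, adjust as needed
    # -H prefixes each match with its file name, so the failing task is visible.
    # 2>/dev/null hides grep's complaint when no PET files exist.
    grep -H "ERROR" "$rundir"/PET*.ESMF_LogFile 2>/dev/null | head -n "$max_lines"
}
```

For this case the call would be `scan_pet_logs /scratch/c/cgf/cytian/CTSM_test/output/CIME_output/case_3/run`. Empty output means either that no PET files were found or that none of them contains an ERROR entry.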

Since I have successfully run the same case with CESM 2.1.5, I thought it would be a good way to check whether the ported CTSM5.2 works properly. So after setting up all these config_XX files, I ran this case directly.
 

Attachments

  • cesm.log.13755243.241018-175013.txt
    8.8 KB · Views: 1
  • config_batch.txt
    2.9 KB · Views: 0
  • config_machines.txt
    3.7 KB · Views: 0
  • config_machines_machine_niagara.txt
    3.2 KB · Views: 0
  • intel.txt
    1.8 KB · Views: 0

Isaactian

Isaac Tian
New Member
The error message is in the ESMF PET files.
Thanks! I then found the error: ...ESMCI_mesh_create_from_file() Library needed by ESMF not present - This functionality requires ESMF to be built with the PIO library enabled. I think the esmf module on Niagara was not built with the PIO library enabled.
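
One possible way to check this guess before rebuilding anything is to look for PIO-related lines in the esmf.mk file that ESMFMKFILE points to. This is only a sketch: whether the esmf.mk of a given installation records an ESMF_PIO setting at all is an assumption, and the function name esmf_pio_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any PIO-related lines from an esmf.mk file.
# Falls back to $ESMFMKFILE when no path is given.
esmf_pio_lines() {
    local mkfile="${1:-${ESMFMKFILE:-}}"
    if [ ! -r "$mkfile" ]; then
        echo "cannot read esmf.mk: $mkfile" >&2
        return 2
    fi
    # Case-insensitive search; report explicitly when nothing matches.
    grep -i "pio" "$mkfile" || echo "no PIO-related lines found in $mkfile"
}
```

With the module environment shown in the preview_run output, running `esmf_pio_lines` with no argument would inspect the esmf.mk under .../esmf/8.6.0/lib/libO/Linux.intel.64.intelmpi.default/. Finding no PIO lines there would be consistent with the error message, but it is not proof on its own.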

If I build ESMF myself with PIO enabled, should I set ESMF_PIO to "external" or "internal"? Thanks again for your help!
 