
Several errors when running scripts_regression_tests.py

cdevaneprugh

Cooper
New Member
Hello, I am porting CESM 2.1.5 to my university's HPC. I am using Slurm, Lmod, and the GNU compilers (although I could switch to the Intel compilers if necessary). Things seem to fail when the tests try to build a case and create a shared library. I'm not sure if this is an issue with my compiler flags, a NetCDF issue, or something else entirely. Any advice or guidance would be greatly appreciated. I am also in touch with my university's HPC support staff and can get system information from them.

Thanks!
 

Attachments

  • config_batch_xml.txt (1 KB)
  • config_compilers_xml.txt (631 bytes)
  • config_machines_xml.txt (2.8 KB)
  • terminal_output.txt (101.2 KB)

jedwards

CSEG and Liaisons
Staff member
The PIO library is failing to build, but you have not provided the PIO build log, so I cannot determine the problem. Before running scripts_regression_tests.py again, try running a single test from the cime/scripts directory:
./create_test SMS.f19_g17.A
 

cdevaneprugh

Cooper
New Member
Here is the terminal output from running that test, as well as the PIO build log.

Thanks!
 

Attachments

  • f19_g17_output.txt (4.3 KB)
  • pio.bldlog_240313-124307.txt (23.4 KB)

jedwards

CSEG and Liaisons
Staff member
> Fatal Error: Cannot read module file ‘/apps/netcdf/4.7.2/include/netcdf.mod’ opened at (1), because it was created by a different version of GNU Fortran

The netcdf.mod file must be produced by the same compiler version that is used for the CESM build; GNU Fortran is particularly picky about this.
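One way to check for this kind of mismatch is to read the header that gfortran writes at the top of every module file. The sketch below is a hypothetical helper (modversion.py and gfortran_module_version are illustrative names, not part of CIME or NetCDF), and it assumes the file was written by gfortran 4.9 or later, which gzip-compresses .mod files and starts them with a line like "GFORTRAN module version '15' created from ...". If the number reported for netcdf.mod differs from the number reported for a module compiled with the gfortran you load for the CESM build, you would expect exactly the error quoted above.

```python
import gzip
import re
import sys

# Header line assumed to be written by gfortran, e.g.:
#   GFORTRAN module version '15' created from netcdf.f90
HEADER_RE = re.compile(r"GFORTRAN module version '(\d+)'")


def gfortran_module_version(path):
    """Return the module-format version recorded in a gfortran .mod file,
    or None if no gfortran header is found (e.g. an Intel .mod file)."""
    try:
        # gfortran 4.9 and later gzip-compress module files.
        with gzip.open(path, "rt", errors="replace") as f:
            first_line = f.readline()
    except (OSError, EOFError):
        # Not gzip data: older gfortran wrote plain-text module files.
        with open(path, "r", errors="replace") as f:
            first_line = f.readline()
    match = HEADER_RE.search(first_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Usage: python modversion.py /apps/netcdf/4.7.2/include/netcdf.mod
    for mod_path in sys.argv[1:]:
        print(mod_path, "->", gfortran_module_version(mod_path))
```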
 

cdevaneprugh

Cooper
New Member
For some reason I thought the "-I" flag I included in the LDFLAGS of config_compilers.xml would ensure NetCDF was built with my current compiler. We have several versions of netcdf, netcdf-c, and netcdf-f on our HPC. I'll reach out to our support staff and see what they recommend.

In the meantime, I will load a different environment, be more specific about the compiler and NetCDF versions, and see how far that gets me.

Thanks!
 

cdevaneprugh

Cooper
New Member
Additionally, I know that with newer versions of GCC, "-fallow-argument-mismatch -fallow-invalid-boz" should be appended to FFLAGS. Is that necessary if I go back to GCC 8.2?
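For reference, both flags were introduced in GCC 10, which turned argument-type mismatches and certain BOZ literal uses into hard errors; GCC 9 and earlier do not accept them. The hypothetical helper below (extra_fflags is an illustrative name, not part of CIME) is a sketch of choosing what to append to FFLAGS from the gfortran version string.

```python
import re
import subprocess


def gfortran_major_version(version_string):
    """Extract the major version from strings such as '8.2.0' or '11'."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1))


def extra_fflags(version_string):
    """Return the flags GCC 10+ needs to relax checks that older Fortran
    code trips over; GCC 9 and earlier neither need nor accept them."""
    if gfortran_major_version(version_string) >= 10:
        return ["-fallow-argument-mismatch", "-fallow-invalid-boz"]
    return []


if __name__ == "__main__":
    # `gfortran -dumpversion` prints the version number ("8" or "8.2.0",
    # depending on how GCC was configured).
    out = subprocess.run(["gfortran", "-dumpversion"], capture_output=True,
                         text=True, check=True).stdout
    print(" ".join(extra_fflags(out)) or "(no extra FFLAGS needed)")
```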
 

cdevaneprugh

Cooper
New Member
I had to skip the older GCC compiler because of some current issues on our system. I ended up configuring the Intel compilers and got what appears to be an almost complete port. I'm only getting three failures now: Q_TestBlessTestResults, T_TestRunRestart, and Z_FullSystemTest. After searching the forums a bit, it looks like at least one of these could be caused by cprnc not being built or found.

If I look at the contents of TESTRUNDIFF_P1.f19_g16_rx1.A.hipergator_intel.C.20240315_082158.cpl.hi.0.nc.cprnc.out, it says that no cprnc tool was found. Do I need to clone the ESMCI/cprnc repository on GitHub (a Fortran utility to compare netCDF files) and build cprnc myself, or should CIME be taking care of that?

Here is the terminal output from the regression tests along with my current config files. Any other input to resolve the last few errors is appreciated.

Thanks!
 

Attachments

  • term_output.txt (47.9 KB)
  • config_machines.txt (2.6 KB)
  • config_compilers.txt (2.2 KB)
  • config_batch.txt (1.1 KB)

jedwards

CSEG and Liaisons
Staff member
You just need to go to the cprnc directory in CIME, build it following the instructions there, install it, and add the install location to your config_machines.xml file. This could explain all of the remaining failures.
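After installing cprnc, a quick sanity check is to confirm that config_machines.xml actually points at an executable. The sketch below assumes the CIME 5.x layout used by CESM 2.1, where the path is stored in a CCSM_CPRNC element inside the machine's <machine MACH="..."> entry; if your CIME version uses a different element name, adjust it. The function names are hypothetical, and values written with $ENV{...} are not expanded.

```python
import os
import sys
import xml.etree.ElementTree as ET


def find_cprnc(config_path, machine):
    """Return the CCSM_CPRNC path set for `machine`, or None if unset.

    Assumed layout: <config_machines><machine MACH="...">
    ... <CCSM_CPRNC>/path/to/cprnc</CCSM_CPRNC> ... </machine>
    """
    root = ET.parse(config_path).getroot()
    for mach in root.iter("machine"):
        if mach.get("MACH") == machine:
            node = mach.find("CCSM_CPRNC")
            return node.text.strip() if node is not None and node.text else None
    raise KeyError(f"machine {machine!r} not found in {config_path}")


def check_cprnc(config_path, machine):
    """Report whether the configured cprnc exists and is executable."""
    path = find_cprnc(config_path, machine)
    if path is None:
        return "CCSM_CPRNC is not set for this machine"
    # $ENV{...} references are not expanded by this sketch.
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return f"OK: {path}"
    return f"not an executable file: {path}"


if __name__ == "__main__":
    # Usage: python check_cprnc.py path/to/config_machines.xml hipergator
    print(check_cprnc(sys.argv[1], sys.argv[2]))
```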
 

cdevaneprugh

Cooper
New Member
Strangely enough, I don't see a cprnc directory. I checked $CIMEROOT/tools and it's not there. I also used grep to search $CIMEROOT recursively, and nothing turns up except mentions in config files and the Externals_cime.cfg file.

Should I try re-cloning the CESM repo and checking out the externals again?
 

jedwards

CSEG and Liaisons
Staff member
Hmm, I don't see it either. Yes, you can download it from the repo you linked above and install it from there.
 

cdevaneprugh

Cooper
New Member
I got cprnc installed, and that cleared up almost everything. The only remaining failure is in the full system test, which shows:

Test 'ERI.f09_g16.X.hipergator_intel' finished with status 'FAIL'
 

Attachments

  • term_out.txt (4.3 KB)

cdevaneprugh

Cooper
New Member
I should add that I'm not sure where to look for the error logs for this. There is a directory for ERI.f09_g17.B1850 in my CIME output root. Within scripts_regression_test.20240315_162229, I only see TESTBUILDFAIL and TESTRUNFAIL for f19_g16_rx1, and I'm not sure if that's what I need.
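Each test case directory that CIME creates contains a TestStatus file with one line per phase, in the form STATUS TESTNAME PHASE followed by optional comments. Assuming that layout, the hypothetical script below (failing_phases is an illustrative name) walks a test output directory and prints every phase that did not PASS, which narrows down which case directory to look in for logs.

```python
import os
import sys


def failing_phases(test_root):
    """Yield (teststatus_path, line) for every phase not marked PASS.

    Assumes CIME's TestStatus format: one line per phase, written as
    '<STATUS> <TESTNAME> <PHASE> [comments]'.
    """
    for dirpath, _dirnames, filenames in os.walk(test_root):
        if "TestStatus" not in filenames:
            continue
        path = os.path.join(dirpath, "TestStatus")
        with open(path) as f:
            for line in f:
                fields = line.split()
                # Skip blank lines; report anything whose status is not PASS.
                if fields and fields[0] != "PASS":
                    yield path, line.rstrip()


if __name__ == "__main__":
    # Usage: python find_failures.py <CIME output root>/scripts_regression_test.20240315_162229
    for path, line in failing_phases(sys.argv[1]):
        print(f"{path}: {line}")
```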

Thanks
 

jedwards

CSEG and Liaisons
Staff member
You can run this test standalone:
cd cesm/cime/scripts
./create_test ERI.f09_g16.X.hipergator_intel
 

cdevaneprugh

Cooper
New Member
Weird. It looks like the case was built, passed to the scheduler, and ran successfully. Every phase in the TestStatus file shows PASS, and every other log I looked at is clean.

Any idea what could make the test fail in one circumstance but not another?

The only thing I can think of is that our research group shares a pool of resources on the university's HPC, whereas I think most machines CESM gets ported to use dedicated nodes. Is it possible that someone else using resources from our QOS could cause a test to fail?
 

jedwards

CSEG and Liaisons
Staff member
If it passes in the standalone run, I think you can move on to the next step, which should be to run the ECT test and confirm your port.
 

cdevaneprugh

Cooper
New Member
Is it worth running the Cheyenne "prealpha" tests described in the CIME documentation first, or is that somewhat outdated?
 

cdevaneprugh

Cooper
New Member
I am following the instructions here. I tried to create three runs for the UF-CAM-ECT test with the following command:

$ python ensemble.py --case /blue/gerber/cdevaneprugh/earth_model_output/cime_output_root/ect_runs/case.cesm_tag.uf.000 --ect cam --uf --mach hipergator --compset F2000climo --res f19_f19_mg17 --compiler intel

It fails while creating the first case.

This leads me to a few questions about the ensemble tests. Is there something wrong with my input? Do I need to update the ensemble test script? Should I follow the instructions in the README in $CIMEROOT/tools/statistical_ensemble_test instead of those on the CESM website?
 

Attachments

  • Screenshot_20240321_152544.png (148.7 KB)

jedwards

CSEG and Liaisons
Staff member
Can you create a conda environment or otherwise use an older version of Python? I think 3.11 is too new for that script.
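A small guard like the one below could be run before ensemble.py to catch this up front. It is only a sketch: the 3.11 cutoff comes from the comment above rather than from any documented requirement of ensemble.py, and python_ok and FIRST_UNSUPPORTED are hypothetical names.

```python
import sys

# Assumption: per the advice in this thread, treat Python 3.11 and newer as
# too new for ensemble.py. Adjust if the script's README states otherwise.
FIRST_UNSUPPORTED = (3, 11)


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is older than the assumed cutoff."""
    return tuple(version_info[:2]) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    if not python_ok():
        sys.exit(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is likely "
            "too new for ensemble.py; activate an environment with an older "
            "Python (for example, a conda environment) and rerun."
        )
    print("Python version looks OK for ensemble.py")
```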
 