Several errors when running scripts_regression_test.py

cdevaneprugh · Mar 13, 2024

Hello, I am porting CESM 2.1.5 to my university's HPC system. I am using Slurm, Lmod, and the GNU compiler (although I could switch to the Intel compiler if necessary). Things seem to fail when it tries to build a case and create a shared library. I'm not sure if this is an issue with my compiler flags, a netcdf issue, or something else entirely. Any advice or guidance would be greatly appreciated. I am also in touch with my university's HPC support staff, and can get system info from them.

Thanks!

jedwards · Mar 13, 2024

You are not able to build the pio library, but you have not provided a pio bld log, so I cannot determine the problem. Before running scripts_regression_tests again, try running a single test from the cime/scripts directory:
./create_test SMS.f19_g17.A

cdevaneprugh · Mar 13, 2024

Here is the terminal output from running that test as well as the pio bld log.

Thanks!

jedwards · Mar 13, 2024

> Fatal Error: Cannot read module file ‘/apps/netcdf/4.7.2/include/netcdf.mod’ opened at (1), because it was created by a different version of GNU Fortran

You need to use the same version of the compiler to produce the netcdf.mod as is used for the cesm build - gnu is particularly picky about this.
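One way to see which compiler produced an installed module file is to read its header. The sketch below assumes that gfortran writes .mod files gzip-compressed (as it has done since GCC 4.9) and that the first line names the module format version. The function name `read_mod_header` is made up for this illustration.

```python
import gzip
from pathlib import Path


def read_mod_header(mod_path):
    """Return the first line of a gfortran .mod file.

    Assumes the file is gzip-compressed; falls back to treating it as
    plain text when the gzip magic number is absent.
    """
    raw = Path(mod_path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    first_line = raw.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace")
```

Calling `read_mod_header("/apps/netcdf/4.7.2/include/netcdf.mod")` (the path from the error message above) and doing the same for a .mod file freshly compiled with the currently loaded gfortran should show whether the two module format versions differ.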

cdevaneprugh · Mar 13, 2024

For some reason I thought the "-I" flag I included in the LDFLAGS of config_compilers would ensure netcdf is getting built with my current compiler. We have several versions of netcdf, netcdf-c, and netcdf-f on our HPC. I'll reach out to my support staff and see what they recommend.

In the meantime I will load a different environment and be more specific with the compiler and netcdf version and see how far that gets me.

Thanks!

cdevaneprugh · Mar 13, 2024

Additionally, I know that on newer versions of the gcc compiler, "-fallow-argument-mismatch -fallow-invalid-boz" should be appended to the FFLAGS. Is that necessary if I go back to gcc 8.2?

jedwards · Mar 13, 2024

No, you do not need these flags on that version of gcc.
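The version dependence can be sketched as a small helper. Both options first appeared in GCC 10, so older gfortran releases do not recognize them. The function name `extra_gfortran_flags` is hypothetical, and the input is assumed to be a version string such as the one printed by `gfortran -dumpversion`.

```python
import re


def extra_gfortran_flags(version_string):
    """Return the extra FFLAGS suggested for a given gfortran version."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError(f"cannot parse gfortran version: {version_string!r}")
    major = int(match.group(1))
    # Both options were introduced in GCC 10.
    if major >= 10:
        return ["-fallow-argument-mismatch", "-fallow-invalid-boz"]
    return []


print(extra_gfortran_flags("8.2.0"))   # no extra flags for gcc 8.2
print(extra_gfortran_flags("12.2.0"))  # both flags for gcc 10 and later
```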

cdevaneprugh · Mar 15, 2024

I had to skip using an older gcc compiler due to some current issues with our system. I ended up configuring the Intel compilers and got what appears to be an almost complete port. I'm only getting 3 errors now: one each in Q_TestBlessTestResults, T_TestRunRestart, and Z_FullSystemTest. After searching around on the forums a bit, it looks like one of these could be an issue with cprnc not being built/found.

If I look at the contents of TESTRUNDIFF_P1.f19_g16_rx1.A.hipergator_intel.C.20240315_082158.cpl.hi.0.nc.cprnc.out it says that there is no cprnc tool found. Do I need to clone the ESMCI/cprnc repo on GitHub (a Fortran utility to compare netcdf files) and build cprnc myself, or should CIME be taking care of that?

Here is the terminal output from the regression tests along with my current config files. Any other input to resolve the last few errors is appreciated.

Thanks!

jedwards · Mar 15, 2024

You just need to go to the cprnc directory in cime and build it following the instructions there, install it and add the install location to your config_machines.xml file. This could explain all of the remaining fails.
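A quick check that the configured cprnc location is usable might look like the sketch below. It assumes the location is stored in a `CCSM_CPRNC` element inside the machine's entry, as in the CIME version shipped with CESM 2.1; verify the element name against your own config_machines.xml. The install path in the example fragment is a placeholder, not a path from this thread.

```python
import os
import xml.etree.ElementTree as ET


def cprnc_path(config_xml_text, machine):
    """Return the CCSM_CPRNC entry for `machine`, or None if absent."""
    root = ET.fromstring(config_xml_text)
    for mach in root.iter("machine"):
        if mach.get("MACH") == machine:
            node = mach.find("CCSM_CPRNC")
            if node is not None and node.text:
                return node.text.strip()
    return None


def is_usable(path):
    """True if path names an existing, executable file."""
    return path is not None and os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical fragment; the install path is a placeholder.
EXAMPLE = """
<config_machines>
  <machine MACH="hipergator">
    <CCSM_CPRNC>/path/to/install/bin/cprnc</CCSM_CPRNC>
  </machine>
</config_machines>
"""

path = cprnc_path(EXAMPLE, "hipergator")
print(path, "usable" if is_usable(path) else "missing or not executable")
```

To check a real setup, pass the text of your own config_machines.xml instead of `EXAMPLE`.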

cdevaneprugh · Mar 15, 2024

Strangely enough I don't see a cprnc directory. I was checking in $CIMEROOT/tools and it's not there. I also used grep to search recursively throughout the $CIMEROOT directory and nothing turns up except for mentions in config files and the Externals_cime.cfg file.

Should I try re-cloning the cesm repo and check out the externals again?

jedwards · Mar 15, 2024

Hmm, I don't see it either. Yes, you can download it from the repo you linked above and install it from there.

cdevaneprugh · Mar 15, 2024

Got cprnc installed and that cleared up almost everything. The only thing left is in the full system test, where it's showing:

Test 'ERI.f09_g16.X.hipergator_intel' finished with status 'FAIL'

cdevaneprugh · Mar 15, 2024

I'll add that I'm not sure where to look for error logs for this. There is a directory for ERI.f09_g17.B1850 in my CIME output root. Within scripts_regression_test.20240315_162229 I only see TESTBUILDFAIL and TESTRUNFAIL for f19+g16_rx1 and I'm not sure if that's what I need.

Thanks
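A first place to look is usually the TestStatus file in the test's case directory. The sketch below lists every phase that is not marked PASS, assuming each line has the form `STATUS test_name PHASE [comment]`. The sample contents are invented for illustration and are not output from this thread.

```python
def failing_phases(teststatus_text):
    """Return (phase, comment) pairs for every line not marked PASS."""
    failures = []
    for line in teststatus_text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        status, _test_name, phase = parts[:3]
        comment = parts[3] if len(parts) > 3 else ""
        if status != "PASS":
            failures.append((phase, comment))
    return failures


# Invented sample contents, for illustration only.
SAMPLE = """\
PASS ERI.f09_g16.X.hipergator_intel CREATE_NEWCASE
PASS ERI.f09_g16.X.hipergator_intel SHAREDLIB_BUILD time=42
FAIL ERI.f09_g16.X.hipergator_intel RUN
"""

print(failing_phases(SAMPLE))
```

Once the failing phase is known, the matching build or run log in the same case directory is the next thing to read.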

jedwards · Mar 15, 2024

You can run this test on its own:
cd cesm/cime/scripts
./create_test ERI.f09_g16.X.hipergator_intel

cdevaneprugh · Mar 15, 2024

Weird. It looks like the case was built, passed to the scheduler, and ran successfully. It passed everything in the TestStatus file as well as every other log I looked at.

Any idea what could make the test fail in one circumstance but not another?

The only thing I can think of is that our research group shares a pool of resources on the university's HPC. I think most machines that CESM gets ported to use dedicated nodes. Is it possible that someone else using resources from our QOS could cause a test to fail?

jedwards · Mar 16, 2024

If it passes on the standalone run I think that you can move on to the next step, which should be to run the ECT test and confirm your port.

cdevaneprugh · Mar 16, 2024

Is it worth running the cheyenne "prealpha" tests as described in the CIME documentation first, or is that somewhat outdated?

jedwards · Mar 16, 2024

Sure, those are worth doing. Don't skip the ECT though.

cdevaneprugh · Mar 21, 2024

I am following the instructions here. I tried to create three runs for the UF-CAM-ECT test with the following command:

$ python ensemble.py --case /blue/gerber/cdevaneprugh/earth_model_output/cime_output_root/ect_runs/case.cesm_tag.uf.000 --ect cam --uf --mach hipergator --compiler intel --compset F2000climo --res f19_f19_mg17

But it fails while creating the first case.

This leads me to a few questions regarding the ensemble tests. Is there something wrong with my input? Do I need to update the ensemble test script? Should I follow the instructions in the README in /$CIMEROOT/tools/statistical_ensemble_test instead of the CESM website?

jedwards · Mar 21, 2024

Can you create a conda environment or otherwise use an older version of python? I think 3.11 is too new for that script.
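A guard like the following could be run before the ensemble script to flag an interpreter that is likely too new. The cutoff of 3.11 comes from the reply above, which is itself tentative, so treat it as an assumption and not a documented requirement. The function name `too_new` is made up for this sketch.

```python
import sys


def too_new(version_info, first_bad=(3, 11)):
    """True if the interpreter is at or above the assumed cutoff version."""
    return tuple(version_info[:2]) >= first_bad


if too_new(sys.version_info):
    print("Python %d.%d may be too new; try an older interpreter."
          % sys.version_info[:2])
else:
    print("Python %d.%d is below the assumed cutoff." % sys.version_info[:2])
```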
