
CESM2 Ensemble Verification fail

QINKONG

QINQIN KONG
Member
Hi! I ported cesm2.1.3 to my HPC cluster and it seems to work fine, but the ensemble verification fails for both POP-ECT and UF-ECT.
The UF-ECT output says that all 50 PCs fail; the POP-ECT output is shown below. By the way, the POP-ECT case writes plenty of history files. I only uploaded the first time step of the ocn component.
Given this verification failure, what should I check to find out what's wrong? Thanks!

CESM Version Tested: CESM 2.1.3
Metadata retrieved from: intel.popcase.cesm_tag.000.pop.h.0001-01.nc

**********Run 1 (file=/cesmwebverify/ensembles/20210221133156201808/intel.popcase.cesm_tag.000.pop.h.0001-01.nc):
UVEL: 99.32%
VVEL: 99.28%
TEMP: 98.84%
SALT: 1.50%
SSH: 99.73%
**********1 of 5 variables failed, resulting in an overall FAIL**********

Testing complete.
 

jedwards

CSEG and Liaisons
Staff member
This indicates a serious problem with your port... do the scripts regression tests pass? Can you try other compiler versions?
 

QINKONG

QINQIN KONG
Member
jedwards said: "This indicates a serious problem with your port... do the scripts regression tests pass? Can you try other compiler versions?"
Hi Jim. I just changed the version of netcdf and the UF-ECT test passes! Maybe there was some incompatibility in my previous combination of compilers and libraries.
 

jedwards

CSEG and Liaisons
Staff member
This may have been coincidental - our systems team found an issue in the test server and have been working on it this morning.
 

QINKONG

QINQIN KONG
Member
I think I only changed from netcdf-fortran/4.5.2 and netcdf/4.7.0 to netcdf-fortran/4.4.4 and netcdf/4.5.0 (the default combination in our cluster's module system). MPI remained openmpi/3.1.4 and the Intel compiler remained intel/19.0.3.199.
 

QINKONG

QINQIN KONG
Member
jedwards said: "This may have been coincidental - our systems team found an issue in the test server and have been working on it this morning."
I'm very happy it finally seems to work (although there may still be a lot to do to optimize; for example, I want to try impi with the Intel compiler instead of openmpi). I can change back to the original netcdf version to see whether I can reproduce the error.

Thanks a lot for your help!
 

nick

Herold
Member
jedwards said: "This may have been coincidental - our systems team found an issue in the test server and have been working on it this morning."
Hi @jedwards, could you please confirm that the issue with the test server has been fixed? I'm trying to get my CESM2.1.3 install to pass the POP ECT and have had no success with multiple configs, even though it passes the CAM ECT. Also, I see there are only POP files up to CESM2.1.2 on the server. Were the changes in 2.1.3 irrelevant for this test?

Thanks very much.
 

nick

Herold
Member
OK, my CESM2.1.2 POP ECT failed as well, so this could be my setup. But I also note there are two CESM2.1.2 POP files on the server. Would it be worth checking that the correct one is being pointed to? I know the CAM test has previously pointed to the wrong file on the server.
 

nick

Herold
Member
Thanks very much @jedwards. I've just uploaded a bunch of my test files and they all still fail. These test different compilers, compiler versions, and both CESM2.1.2 and 2.1.3. It's surprising that CAM ECT passes so easily but POP doesn't. While this could very well still be an issue with my setup, could you confirm that CESM2.1.3 should be passing on the web form? This would at least eliminate one unknown. Also, I'm running on AMD processors. I'm not sure how important this is, but if anyone knows of a port to an AMD machine it would be great to see the compiler setup.

A sample of my POP test stats below (they are all similar to this, with salinity being the culprit):

CESM Version Tested: CESM 2.1.3
Metadata retrieved from: popcase.TEST11-intel17-ncdf444.cesm_tag.000.pop.h.0001-12.nc

**********Run 1 (file=/cesmwebverify/ensembles/2021022614025931105/popcase.TEST11-intel17-ncdf444.cesm_tag.000.pop.h.0001-12.nc):
UVEL: 99.41%
VVEL: 99.41%
TEMP: 99.15%
SALT: 0.25%
SSH: 99.50%
**********1 of 5 variables failed, resulting in an overall FAIL**********

Testing complete.
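
For readers comparing several uploads, the per-variable lines in a report like the one above can be pulled out with a short script. This is only a sketch: the function names are hypothetical, and the 90% pass threshold is an assumption taken from the "ZPR = 90.00%" line of the command-line run posted later in this thread, not from the web form itself.

```python
import re

# Assumed pass threshold in percent (from the "ZPR = 90.00%" line of the
# command-line run in this thread); the web form may use a different value.
ZPR_THRESHOLD = 90.0

# Matches report lines such as "SALT: 0.25%".
LINE_RE = re.compile(r"^\s*([A-Z]+):\s*([0-9.]+)%\s*$")


def parse_ect_report(text):
    """Return {variable: percent} for every 'NAME: xx.xx%' line in the report."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


def failing_variables(scores, threshold=ZPR_THRESHOLD):
    """Return the sorted names of variables whose percentage is below threshold."""
    return sorted(name for name, pct in scores.items() if pct < threshold)


# Values copied from the report above.
report = """\
UVEL: 99.41%
VVEL: 99.41%
TEMP: 99.15%
SALT: 0.25%
SSH: 99.50%
"""

scores = parse_ect_report(report)
print(failing_variables(scores))
```

Running the parser over every uploaded report makes it easy to see whether the same variable is the culprit each time.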
 

jedwards

CSEG and Liaisons
Staff member
Hi Nick - I think that the AMD system could be an issue. I don't think we have access to any at the moment, although our next NCAR HPC system will use them. When I run a 2.1.2 pop test against the 2.1.1 ensemble the results look similar to this.
 

jedwards

CSEG and Liaisons
Staff member
Nick - It turns out that we found another issue on our website, we are working on it now but it may take some time to address.
 

fischer

CSEG and Liaisons
Staff member
Nick, I was able to test your pop file from the command line, and it passed.


--------pyCECT--------

Friday, 26. February 2021 04:18PM

Ensemble summary file = /glade/p/cesmdata/inputdata/validation/pop_ensembles/cesm2.1.2/pop_ens.T62_g17.G.cesm2.1.2.cheyenne_intel.summary_c2020221.nc

Testcase file directory = /glade/scratch/fischer/


Found 1 matching files in specified --indir

Z-score tolerance = 3.00
ZPR = 90.00%
Checkpoint month(s) = [12]

**********Run 1 (file=/glade/scratch/fischer/popcase.TEST11-intel17-ncdf444.cesm_tag.000.pop.h.0001-12.nc):
UVEL: 99.41%
VVEL: 99.41%
TEMP: 99.15%
SALT: 99.24%
SSH: 99.50%
**********0 of 5 variables failed, resulting in an overall PASS**********

Testing complete.
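
The two parameters in the output above ("Z-score tolerance = 3.00" and "ZPR = 90.00%") suggest how each percentage is judged. The following is a minimal sketch of that idea, not the pyCECT implementation: it assumes the percentage is the share of grid points whose standardized difference from the ensemble mean is within the tolerance, and that a variable passes when that share is at least the ZPR value. The function names, the handling of zero-spread points, and the toy data are all hypothetical.

```python
def percent_within_tolerance(values, means, stds, z_tol=3.0):
    """Percent of points whose |z-score| against the ensemble is <= z_tol."""
    within = 0
    counted = 0
    for x, mu, sigma in zip(values, means, stds):
        if sigma == 0.0:
            # Assumption of this sketch: skip points with no ensemble spread.
            continue
        counted += 1
        if abs((x - mu) / sigma) <= z_tol:
            within += 1
    return 100.0 * within / counted if counted else 0.0


def variable_passes(values, means, stds, z_tol=3.0, zpr=90.0):
    """A variable passes if enough points fall within the Z-score tolerance."""
    return percent_within_tolerance(values, means, stds, z_tol) >= zpr


# Toy data: 20 points, ensemble mean 0 and standard deviation 1 everywhere.
means = [0.0] * 20
stds = [1.0] * 20

good = [0.5] * 19 + [5.0]   # one outlier beyond 3 standard deviations
bad = [10.0] * 20           # every point far outside the ensemble spread

print(percent_within_tolerance(good, means, stds), variable_passes(good, means, stds))
print(percent_within_tolerance(bad, means, stds), variable_passes(bad, means, stds))
```

Under these assumptions, a result such as SALT at 0.25% would mean nearly every point lies outside the ensemble spread, which is more consistent with a comparison against the wrong ensemble or a server-side problem than with a small numerical difference.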
 

nick

Herold
Member
Great. I did start down the path of running the command line test but was getting python module errors with Nio. Your test is helpful. I'll wait until the verification site is back up to test my main configuration. Thanks to both for your responsiveness.
 

Liu W

liuwei
Member
jedwards said: "Nick - It turns out that we found another issue on our website, we are working on it now but it may take some time to address."
Hi jedwards, has the issue been fixed? The UF_CAM_ECT succeeds, but the POP_ECT fails because the test for SALT does not pass.
 