
Regression tests and create_test fail

This post is similar to a previous one and continues a thread in the CESM General forum called "too cold to be true". In that thread I showed the result of a simulation made with cesm2_1_3 in which, during the first year of simulation, the temperature near the Equator drops to 260 K (see the figure).

Following the advice of Erik Kluzek, I have run some tests. I began with the regression test suite. All tests in the suite finished OK except test_run_restart, which failed with the following message:


FAIL: test_run_restart (__main__.T_TestRunRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./scripts_regression_tests.py", line 1307, in test_run_restart
self._create_test(["NODEFAIL_P1.f09_g16.X"], test_id=self._baseline_name)
File "./scripts_regression_tests.py", line 954, in _create_test
self._wait_for_tests(test_id, expect_works=(not pre_run_errors and not run_errors))
File "./scripts_regression_tests.py", line 963, in _wait_for_tests
from_dir=self._testroot, expected_stat=expected_stat)
File "./scripts_regression_tests.py", line 67, in run_cmd_assert_result
test_obj.assertEqual(stat, expected_stat, msg=msg)

AssertionError:
COMMAND: /lustre/lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/Documents/CESM/cesm2_1_3/cime/scripts/Tools/wait_for_tests *fake_testing_only_20220209_200621/TestStatus
FROM_DIR: /lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/Documents/CESM/prueba2/scripts_regression_test.20220209_194502
SHOULD HAVE WORKED, INSTEAD GOT STAT 100
OUTPUT: Test 'NODEFAIL_P1.f09_g16.X.login1_intel' finished with status 'FAIL'
Path: NODEFAIL_P1.f09_g16.X.login1_intel.fake_testing_only_20220209_200621/TestStatus

Looking at the file TestStatus.log, I find the following message:

' failed
See log file for details: /lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/Documents/CESM/prueba2/scripts_regression_test.20220209_194502/NODEFAIL_P1.f09_g16.X.login1_intel.fake_testing_only_20220209_200621/run/cesm.log.479971.220209-200953

The content of cesm.log.479971.220209-200953 is:

JGF FAKE NODE FAIL
JGF FAKE NODE FAIL
JGF FAKE NODE FAIL
JGF FAKE NODE FAIL


I have also run create_test with --xml-category prealpha and --xml-machine cheyenne, which produced a lot of FAILs. The output log is attached.

I don't understand how the model can go so wrong even in the first year of simulation. I have ported CESM1.2 to my computer in Spain with very good results.

Thanks for your help and patience.

Agustin
 

Attachments

  • trefht_1980.pdf
    7.3 KB · Views: 2
  • test_out_2_1_3-v2.txt
    95.7 KB · Views: 3

jedwards

CSEG and Liaisons
Staff member
Agustin,

Tell us about your system - what is the compiler and version, the mpi library and version? You should try the
cime/tools/statistical_ensemble_test: open the README and follow the instructions for UF-CAM-ECT.
 
Find attached my config_compilers and config_machines files (only the lines corresponding to our configuration). I will try to run the statistical ensemble test.

thanks,

Agustin
 
After a big effort (mainly with the Python installation) I have managed to run the statistical_ensemble_test.

1.) To get the UF-CAM-ECT ensemble I ran the script:

nohup python ensemble.py --case ${HOME}/ENSEMBLE/prueba1_tag.000 --mach login1 --ensemble 350 --uf --ect cam --project P99999999 > prueba1_log 2>&1 &

After that, I got a 350-member ensemble without any problem.

2.) Next I ran the script to get the summary file:

nohup python pyEnsSum.py --indir ${HOME}/ENSEMBLE/zzjuntos/ --verbose --mach login1 --jsonfile excluded_empty.json > salida4.log 2>&1 &

I had to exclude several variables because there were some errors:

{
"ExcludedVar": ["CONCLD","RELVAR","VD01","bc_c4","pom_c4","AEROD_v","AODDUST","AODDUST1","AODDUST3","AODVIS","PHIS","PRECC","PRECSC","SFH2O2","SFH2SO4","SFSOAG","SFbc_a1","SFbc_a4","SFpom_a1","SFpom_a4","SFso4_a3","SFsoa_a1","SFsoa_a2","SNOWHICE","bc_a4_CLXF","bc_a4_CMXF","bc_c4SFWET","num_a4_CLXF","num_a4_CMXF","pom_a4_CLXF","pom_a4_CMXF","pom_c4SFWET","so4_a1_CLXF","so4_a2_CMXF","SO2_CLXF","SO2_CMXF","FSNT","num_a2_CLXF","num_a1_CLXF","num_a2_CMXF","FSNTC","FLNTC","FLUT","FSNTOA","FSUTOA","LWCF","ncl_a3SF","SFnum_a1","ncl_a2SF","SFnum_a3","Q","num_a1SF","dst_a2SF","dst_a1SF","num_a2SF","num_a3SF","TGCLDCWP","SOLIN","TSMX","SFdst_a1","SFncl_a3","dst_a3SF","SFdst_a2","ncl_a1SF","SFncl_a2","SFso4_a1","TSMN","SFSO2","SFso4_a2","SFnum_a4","SFDMS","LANDFRAC","OCNFRAC","ICEFRAC"]
}

3.) Finally I ran the script:

python pyCECT.py --verbose --sumfile ens.summary.nc --indir ${HOME}/ENSEMBLE/zzjuntos/ --tslice 1


I got the following message:

--------pyCECT--------

Thursday, 24. February 2022 05:42PM

Ensemble summary file = ens.summary.nc

Testcase file directory = /lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/ENSEMBLE/zzjuntos/


Found 350 matching files in specified --indir
Randomly pick input files:
/lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/ENSEMBLE/zzjuntos/prueba1_tag.306.cam.h0.0001-01-01-00000.nc
/lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/ENSEMBLE/zzjuntos/prueba1_tag.197.cam.h0.0001-01-01-00000.nc
/lusitania_homes/UniversidadDeExtremadura/joseagustin.garcia/ENSEMBLE/zzjuntos/prueba1_tag.252.cam.h0.0001-01-01-00000.nc
***********************************************
PCA Test Results
***********************************************

Summary: 0 PC scores failed at least 2 runs: []

These runs PASSED according to our testing criterion.
PC 11: failed 1 runs [3]
PC 47: failed 1 runs [1]

Run 1: 1 PC scores failed [47]
Run 2: 0 PC scores failed []
Run 3: 1 PC scores failed [11]

Testing complete.

Because the test picks three simulations at random, the results depend on the simulations used.

What do the above results mean?


Thanks

Agustin
 

jedwards

CSEG and Liaisons
Staff member
Hi Agustin, I'm sorry the instructions were not clearer. You do not need to generate the entire ensemble, only the three
members to be compared. Then you upload those files to CESM2 | Verification
and they are verified against our ensemble. Your results are good, except that you are comparing against an ensemble from your own machine.
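
For readers wondering how to read a pyCECT report like the one above, here is a minimal sketch in Python. It only parses the "PC N: failed K runs" lines of the text report and counts how many PCs failed in at least two runs, which is the quantity printed in the "Summary" line. The two thresholds are assumptions: the "at least 2 runs" value comes from the report wording, while the number of failing PCs needed for an overall FAILED verdict (3 here) is assumed and should be checked against the pyCECT README. The function name summarize is hypothetical and not part of pyCECT.

```python
import re

# Assumed thresholds (not confirmed by this thread; check the pyCECT README).
MIN_RUN_FAIL = 2  # a PC counts as failing if it failed in at least this many runs
MIN_PC_FAIL = 3   # assumed: overall FAILED needs at least this many failing PCs

# Matches report lines such as "PC 11: failed 1 runs [3]".
PC_LINE = re.compile(r"^PC\s+(\d+): failed (\d+) runs")


def summarize(report, min_run_fail=MIN_RUN_FAIL, min_pc_fail=MIN_PC_FAIL):
    """Return (failing_pcs, passed) for the text of a pyCECT report."""
    failing = []
    for line in report.splitlines():
        match = PC_LINE.match(line.strip())
        if match and int(match.group(2)) >= min_run_fail:
            failing.append(int(match.group(1)))
    return failing, len(failing) < min_pc_fail


# The two PC lines from the local comparison quoted earlier in the thread.
local_report = """
PC 11: failed 1 runs [3]
PC 47: failed 1 runs [1]
"""
print(summarize(local_report))
```

In the local comparison each listed PC failed only one run, so no PC reaches the two-run threshold and the runs pass. A report in which many PCs fail all three runs would produce a long list of failing PCs and a failed verdict.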
 
I have uploaded three simulations and received the following message:


CESM Version Tested: CESM 2.1.3
Metadata retrieved from: prueba1_tag.197.cam.h0.0001-01-01-00000.nc

***********************************************
PCA Test Results
***********************************************

Summary: 50 PC scores failed at least 2 runs: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]

These runs FAILED according to our testing criterion.
PC 1: failed 3 runs [1, 2, 3]
PC 2: failed 3 runs [1, 2, 3]
PC 3: failed 3 runs [1, 2, 3]
PC 4: failed 3 runs [1, 2, 3]
PC 5: failed 3 runs [1, 2, 3]
PC 6: failed 3 runs [1, 2, 3]
PC 7: failed 3 runs [1, 2, 3]
PC 8: failed 3 runs [1, 2, 3]
PC 9: failed 3 runs [1, 2, 3]
PC 10: failed 3 runs [1, 2, 3]
PC 11: failed 3 runs [1, 2, 3]
PC 12: failed 3 runs [1, 2, 3]
PC 13: failed 3 runs [1, 2, 3]
PC 14: failed 3 runs [1, 2, 3]
PC 15: failed 3 runs [1, 2, 3]
PC 16: failed 3 runs [1, 2, 3]
PC 17: failed 3 runs [1, 2, 3]
PC 18: failed 3 runs [1, 2, 3]
PC 19: failed 3 runs [1, 2, 3]
PC 20: failed 3 runs [1, 2, 3]
PC 21: failed 3 runs [1, 2, 3]
PC 22: failed 3 runs [1, 2, 3]
PC 23: failed 3 runs [1, 2, 3]
PC 24: failed 3 runs [1, 2, 3]
PC 25: failed 3 runs [1, 2, 3]
PC 26: failed 3 runs [1, 2, 3]
PC 27: failed 3 runs [1, 2, 3]
PC 28: failed 3 runs [1, 2, 3]
PC 29: failed 3 runs [1, 2, 3]
PC 30: failed 3 runs [1, 2, 3]
PC 31: failed 3 runs [1, 2, 3]
PC 32: failed 3 runs [1, 2, 3]
PC 33: failed 3 runs [1, 2, 3]
PC 34: failed 3 runs [1, 2, 3]
PC 35: failed 3 runs [1, 2, 3]
PC 36: failed 3 runs [1, 2, 3]
PC 37: failed 3 runs [1, 2, 3]
PC 38: failed 3 runs [1, 2, 3]
PC 39: failed 3 runs [1, 2, 3]
PC 40: failed 3 runs [1, 2, 3]
PC 41: failed 3 runs [1, 2, 3]
PC 42: failed 3 runs [1, 2, 3]
PC 43: failed 3 runs [1, 2, 3]
PC 44: failed 3 runs [1, 2, 3]
PC 45: failed 3 runs [1, 2, 3]
PC 46: failed 3 runs [1, 2, 3]
PC 47: failed 3 runs [1, 2, 3]
PC 48: failed 3 runs [1, 2, 3]
PC 49: failed 3 runs [1, 2, 3]
PC 50: failed 3 runs [1, 2, 3]

Run 1: 50 PC scores failed [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
Run 2: 50 PC scores failed [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
Run 3: 50 PC scores failed [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]

Testing complete.

As you can see, a complete failure. What next?

thanks,

Agustin
 

jedwards

CSEG and Liaisons
Staff member
Definitely failing - you never provided the system information, what compiler and version, what mpi library and version?
 
I don't know why the attached files did not appear.
 

Attachments

  • config_compilers_xml.txt
    4.2 KB · Views: 1
  • config_machines_xml.txt
    7.7 KB · Views: 2

jedwards

CSEG and Liaisons
Staff member
Intel 16 is a very old compiler version, and I recall going directly from Intel 15 to 17 here.
Please update your compiler. The most recent intel compiler set is freely available for download.
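
A quick way to catch an outdated compiler before building is to compare the version in the compiler's banner against a minimum. The Python sketch below is only illustrative: the function names are hypothetical, the banner strings are made-up examples of what `ifort --version` might print (they are not taken from this thread), and the minimum of 17 is an assumption based on the remark above, not an official CESM requirement.

```python
import re


def parse_intel_version(banner):
    """Return (major, minor, patch) from the first dotted version in the banner."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number found in banner")
    return tuple(int(part) for part in match.groups())


def is_too_old(banner, minimum=(17, 0, 0)):
    """True if the banner's version is below the assumed minimum."""
    return parse_intel_version(banner) < minimum


# Illustrative banner strings (assumed format, not copied from this thread).
print(is_too_old("ifort (IFORT) 16.0.3 20160415"))
print(is_too_old("ifort (IFORT) 2021.5.0 20211109"))
```

Tuple comparison handles both the old numbering (16, 17, ...) and the year-based numbering (2021.x) without special cases.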
 
I finally recompiled CESM2.1.3 with the most recent version of the Intel compiler (2021.5.0) and ... it works. Now there is no Snowball Earth.

I would never have thought that the failure of the simulation was due to the compiler version.

Thanks, Edwards, for your help.

Sincerely

Agustin
 