Problems with porting tests

cemac-ccs

Chris Symonds
New Member
Hi

I am in the process of porting CESM 2.1.3, following the instructions in the porting guide ("6. Porting and validating CIME on a new platform" in the CIME cime5.6 documentation). When I run the scripts_regression_tests.py script to validate the initial port, the following tests fail:

K_TestCimeCase.test_cime_case_test_custom_project
O_TestTestScheduler.test_c_use_existing
O_TestTestScheduler.test_d_retry
Q_TestBlessTestResults.test_bless_test_results
T_TestRunRestart.test_run_restart
Z_FullSystemTest.test_full_system

Unfortunately, the remaining output is unenlightening for some of the tests (K_TestCimeCase.test_cime_case_test_custom_project, O_TestTestScheduler.test_c_use_existing and T_TestRunRestart.test_run_restart): the scripts_regression_test.$TIMESTAMP directory contains only the testreporter script. This makes any post-mortem analysis of why those tests failed impossible.

Is there anywhere I can find guidance on what these tests are testing so that I can attempt to find the source of the problem they are having?
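In the absence of written guidance, one option is to read the test source itself. The following is a minimal sketch, not part of CIME, that uses Python's standard `ast` module to list the test methods in a source file along with the first line of each docstring. The `SAMPLE` source and the `list_tests` helper are hypothetical; in practice the text of scripts_regression_tests.py would be passed in instead, assuming it parses under the Python version in use.

```python
import ast


def list_tests(source):
    """Return (class_name, method_name, summary) for each test method in source."""
    found = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            # Only methods whose names start with "test" are reported.
            if isinstance(item, ast.FunctionDef) and item.name.startswith("test"):
                doc = ast.get_docstring(item) or ""
                summary = doc.splitlines()[0] if doc else "(no docstring)"
                found.append((node.name, item.name, summary))
    return found


# Hypothetical stand-in for the contents of a test file.
SAMPLE = '''
class K_TestExample(object):
    def test_something(self):
        """Check that something works."""
        pass

    def helper(self):
        pass

    def test_other(self):
        pass
'''

for cls, method, summary in list_tests(SAMPLE):
    print("{}.{}: {}".format(cls, method, summary))
```

Methods without a docstring are still listed, so the method body can at least be located and read by name.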
 

cemac-ccs

Chris Symonds
New Member
Additional info: attached are my version_info.txt, config_machines.xml, config_batch.xml and config_compilers.xml files (with the xml files changed to txt files due to upload restrictions), as requested on the 'Information to include' page. The only changes made to the externals are an addition to the wget command in the config_inputdata.xml file, as detailed here, and the addition of a PES layout, as detailed here. The compiler version is gnu 7.5.0.

I should also mention that the output of other tests is difficult to decipher too. For example, for the O_TestTestScheduler.test_d_retry test, the cs.status script gives the following output:

20210922_180654
TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu (Overall: PASS) details:
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu CREATE_NEWCASE
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu XML
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu SETUP
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu SHAREDLIB_BUILD time=0
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu MODEL_BUILD time=1
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu SUBMIT
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu RUN time=1
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu MEMLEAK insuffiencient data for memleak test
PASS TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu SHORT_TERM_ARCHIVER
TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu (Overall: FAIL) details:
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu CREATE_NEWCASE
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu XML
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu SETUP
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu SHAREDLIB_BUILD time=0
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu MODEL_BUILD time=1
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu SUBMIT
FAIL TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu RUN time=1
TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu (Overall: PASS) details:
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu CREATE_NEWCASE
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu XML
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu SETUP
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu SHAREDLIB_BUILD time=0
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu MODEL_BUILD time=1
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu SUBMIT
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu RUN time=2
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu MEMLEAK insuffiencient data for memleak test
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu SHORT_TERM_ARCHIVER

In the TESTRUNFAIL dir the main output file contains the following:

Running test for TESTRUNFAIL
WARNING: Found difference in test CHECK_TIMING: case: False original value True
doing an 11 ndays startup test, with restarts every 11 ndays
File /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/LockedFiles/env_build.xml has been modified
Creating component namelists
Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildnml
Finished creating component namelists
-------------------------------------------------------------------------
- Prestage required restarts into /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/run
- Case input data directory (DIN_LOC_ROOT) is /work/n02/n02/csymonds/cesm/CESM2.1.3/cesm_inputdata
- Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
2021-09-22 18:07:18 MODEL EXECUTION BEGINS HERE
run command is srun --distribution=block:block --hint=nomultithread /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'srun --distribution=block:block --hint=nomultithread /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/run/cesm.log.523850.210922-180717

and the indicated log merely says:

Insta fail
srun: error: nid001341: task 0: Exited with exit code 255
srun: Terminating job step 523850.0

which is not very informative. Am I looking in the wrong place for information?
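Status listings like the cs.status output quoted earlier in this post can be condensed to just the phases that did not pass. The following is a rough sketch, assuming each phase line has the form `<STATUS> <test name> <PHASE> [extra info]` as in that listing. The `failed_phases` helper is hypothetical and not part of CIME, and it only recognises the PASS, FAIL and PEND statuses that appear in this thread.

```python
def failed_phases(status_text):
    """Map each test name to a list of (phase, status) pairs that are not PASS."""
    failures = {}
    for line in status_text.splitlines():
        parts = line.split()
        # Header lines ("<test> (Overall: ...) details:") do not start with a status.
        if len(parts) >= 3 and parts[0] in ("PASS", "FAIL", "PEND"):
            status, test, phase = parts[0], parts[1], parts[2]
            if status != "PASS":
                failures.setdefault(test, []).append((phase, status))
    return failures


# A few lines taken from the cs.status output in this thread.
SAMPLE = """\
TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu (Overall: FAIL) details:
PASS TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu SUBMIT
FAIL TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu RUN time=1
TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu (Overall: PASS) details:
PASS TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu RUN time=2
"""

for test, phases in failed_phases(SAMPLE).items():
    print(test, phases)
```

This only narrows down where to look; the actual cause still has to come from the log files in the corresponding test directory.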
 

Attachments

  • version_info.txt (5.4 KB)
  • config_machines.xml.txt (5.5 KB)
  • config_compilers.xml.txt (4.5 KB)
  • config_batch.xml.txt (3.1 KB)

fischer

CSEG and Liaisons
Staff member
Was that the entire contents of cesm.log.523850.210922-180717? In the test directory (/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654) you can also look at TestStatus.log, and there might be a test.TESTRUNFAIL_P1.... file.
 

cemac-ccs

Chris Symonds
New Member
Thanks for your fast response.

Yes, that was the entire contents of the log file. The TestStatus.log file contains:

2021-09-22 18:06:55: CREATE_NEWCASE PASSED for test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu'.
Command: /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/scripts/create_newcase --case /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654 --res f19_g16_rx1 --compset A --test --machine archer2 --compiler gnu --project ecseac06-guest --output-root /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654 --pecount 1 --mpilib mpich --walltime 0:05:00
Output: b"Compset longname is 2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_DROF%NYF_SGLC_SWAV\nCompset specification file is /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/cime_config/config_compsets.xml\nCompset forcing is 1972-2004\nATM component is Data driven ATM COREv2 normal year forcing\nLND component is Stub land component\nICE component is dice mode is ssmi\nOCN component is DOCN prescribed ocean mode\nROF component is Data runoff modelCOREv2 normal year forcing:\nGLC component is Stub glacier (land ice) component\nWAV component is Stub wave component\nESP component is \nPes specification file is /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/cime_config/config_pes.xml\nMachine is archer2\nPes setting: grid is a%1.9x2.5_l%null_oi%gx1v6_r%rx1_g%null_w%null_m%gx1v6 \nPes setting: compset is 2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_DROF%NYF_SGLC_SWAV \nPes setting: tasks is {'NTASKS_ATM': -1, 'NTASKS_ROF': -1, 'NTASKS_OCN': -1, 'NTASKS_ICE': -1, 'NTASKS_CPL': -1, 'NTASKS_LND': -1, 'NTASKS_GLC': -1, 'NTASKS_WAV': -1, 'NTASKS_ESP': -1} \nPes setting: threads is {'NTHRDS_ATM': 1, 'NTHRDS_LND': 1, 'NTHRDS_ROF': 1, 'NTHRDS_ICE': 1, 'NTHRDS_OCN': 1, 'NTHRDS_GLC': 1, 'NTHRDS_WAV': 1, 'NTHRDS_ESP': 1, 'NTHRDS_CPL': 1} \nPes setting: rootpe is {'ROOTPE_ATM': 0, 'ROOTPE_ROF': 0, 'ROOTPE_ICE': 0, 'ROOTPE_OCN': 0, 'ROOTPE_CPL': 0, 'ROOTPE_LND': 0, 'ROOTPE_GLC': 0, 'ROOTPE_WAV': 0, 'ROOTPE_ESP': 0} \nPes setting: pstrid is {} \nPes other settings: {}\nPes comments: none\n Compset is: 2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_DROF%NYF_SGLC_SWAV \n Grid is: a%1.9x2.5_l%null_oi%gx1v6_r%rx1_g%null_w%null_m%gx1v6 \n Components in compset are: ['datm', 'slnd', 'dice', 'docn', 'drof', 'sglc', 'swav', 'sesp', 'drv', 'dart'] \nNo charge_account info available, using value from PROJECT\nUsing project from config_machines.xml: ecseac06-guest\ncesm model version found: cesm2.1.3-rc.01\nBatch_system_type is slurm\njob 
is case.test USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S\njob is case.st_archive USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S\n Creating Case directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654\n"

---------------------------------------------------
2021-09-22 18:06:57: SETUP PASSED for test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu'.
Command: ./case.setup
Output: b"job is case.test USER_REQUESTED_WALLTIME 0:05:00 USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S\nCreating batch scripts\nWriting case.test script from input template /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/config/cesm/machines/template.case.test\nCreating file .case.test\nWriting case.st_archive script from input template /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/config/cesm/machines/template.st_archive\nCreating file case.st_archive\nCreating user_nl_xxx files for components and cpl\nIf an old case build already exists, might want to run 'case.build --clean' before building\nYou can now run './preview_run' to get more info on how your case will be run\n/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/env_mach_specific.xml already exists, delete to replace"

---------------------------------------------------
2021-09-22 18:06:58: SHAREDLIB_BUILD PASSED for test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu'.
Command: ./case.build --sharedlib-only
Output: b'WARNING: Found difference in test STOP_N: case: 11 original value 5\nBuilding test for TESTRUNFAIL in directory /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654'

---------------------------------------------------
2021-09-22 18:07:00: MODEL_BUILD PASSED for test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu'.
Command: ./case.build --model-only
Output: b'WARNING: Found difference in test STOP_N: case: 11 original value 5\nBuilding test for TESTRUNFAIL in directory /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654'
---------------------------------------------------
2021-09-22 18:07:14: SUBMIT PASSED for test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu'.
Command: ./case.submit --skip-preview-namelist
Output: b"Creating component namelists\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/data_comps/datm/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/stub_comps/slnd/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/data_comps/dice/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/data_comps/docn/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/data_comps/drof/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/stub_comps/sglc/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/stub_comps/swav/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/components/stub_comps/sesp/cime_config/buildnml\n Calling /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/src/drivers/mct/cime_config/buildnml\nFinished creating component namelists\nChecking that inputdata is available as part of case submission\nLoading input file list: 'Buildconf/cpl.input_data_list'\nLoading input file list: 'Buildconf/dice.input_data_list'\nLoading input file list: 'Buildconf/docn.input_data_list'\nLoading input file list: 'Buildconf/datm.input_data_list'\nLoading input file list: 'Buildconf/drof.input_data_list'\nCheck case OK\nSubmitting job script sbatch --time 0:05:00 -q standard --account ecseac06-guest .case.test --skip-preview-namelist\nSubmitted job id is 523850\nSubmitted job case.test with id 523850\nFile /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/LockedFiles/env_build.xml 
has been modified\nsubmit_jobs case.test\nSubmit job case.test"

---------------------------------------------------
2021-09-22 18:07:18: ERROR: RUN FAIL: Command 'srun --distribution=block:block --hint=nomultithread /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/run/cesm.log.523850.210922-180717
---------------------------------------------------


The test.TESTRUNFAIL_P1.... file you mention is the one I pasted into my previous post as the main output file in the TESTRUNFAIL dir.
 

cemac-ccs

Chris Symonds
New Member
Also, if it helps, the stdout from the test was:

Code:
Testing commit fd9fbf7687505a1c4c6c8b9a4dc52494faa07b6f
Using cime_model = cesm
Testing machine = archer2
Testing compiler = gnu
Testing mpilib = mpich
Test root: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654


test_d_retry (__main__.O_TestTestScheduler) ... FAIL


======================================================================
FAIL: test_d_retry (__main__.O_TestTestScheduler)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "scripts_regression_tests.py", line 1208, in test_d_retry
    self._create_test(args)
  File "scripts_regression_tests.py", line 951, in _create_test
    expected_stat=expected_stat)
  File "scripts_regression_tests.py", line 67, in run_cmd_assert_result
    test_obj.assertEqual(stat, expected_stat, msg=msg)
AssertionError: 100 != 0 :
COMMAND:  /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/scripts/create_test TESTBUILDFAIL_P1.f19_g16_rx1.A TESTRUNFAIL_P1.f19_g16_rx1.A TESTRUNPASS_P1.f19_g16_rx1.A --retry=1 -t 20210922_180654 --baseline-root /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/baselines --compiler=gnu --mpilib=mpich --test-root=/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654 --output-root=/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654
FROM_DIR: /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/my_cesm_sandbox/cime/scripts/tests
SHOULD HAVE WORKED, INSTEAD GOT STAT 100
OUTPUT: Testnames: ['TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu', 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu', 'TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu']
No project info available
Creating test directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
Creating test directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
Creating test directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
RUNNING TESTS:
  TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu
  TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu
  TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu
Starting CREATE_NEWCASE for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting CREATE_NEWCASE for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting CREATE_NEWCASE for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished CREATE_NEWCASE for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 0.741638 seconds (PASS)
Finished CREATE_NEWCASE for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.746009 seconds (PASS)
Finished CREATE_NEWCASE for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.746928 seconds (PASS)
Starting XML for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting XML for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting XML for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished XML for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.226157 seconds (PASS)
Finished XML for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 0.226878 seconds (PASS)
Finished XML for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.234606 seconds (PASS)
Starting SETUP for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting SETUP for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Starting SETUP for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished SETUP for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 1.452423 seconds (PASS)
Finished SETUP for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 1.451438 seconds (PASS)
Finished SETUP for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 1.459458 seconds (PASS)
Starting SHAREDLIB_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished SHAREDLIB_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.380120 seconds (PASS)
Starting MODEL_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 4 procs
Starting SHAREDLIB_BUILD for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished SHAREDLIB_BUILD for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.369329 seconds (PASS)
Finished MODEL_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 0.370318 seconds (FAIL). [COMPLETED 1 of 3]
    Case dir: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
    Errors were:
        b'Building test for TESTBUILDFAIL in directory /lus/cls01095/work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654\nERROR: BUILD FAIL: Intentional fail for testing infrastructure'


Starting MODEL_BUILD for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 4 procs
Starting SHAREDLIB_BUILD for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished SHAREDLIB_BUILD for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 0.379414 seconds (PASS)
Starting MODEL_BUILD for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 4 procs
Finished MODEL_BUILD for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 1.737711 seconds (PASS)
Starting RUN for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 proc on interactive node and 1 procs on compute nodes
Finished MODEL_BUILD for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 1.705718 seconds (PASS)
Starting RUN for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu with 1 proc on interactive node and 1 procs on compute nodes
Finished RUN for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 2.041492 seconds (PEND). [COMPLETED 2 of 3]
Finished RUN for test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu in 2.004533 seconds (PEND). [COMPLETED 3 of 3]
Waiting for tests to finish
Test 'TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'FAIL'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
Test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'FAIL'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
Test 'TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'PASS'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
test-scheduler took 16.96617889404297 seconds
No project info available
Using existing test directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
Test TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu passed and will not be re-run
Using existing test directory /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654
RUNNING TESTS:
  TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu
  TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu
  TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu
Starting MODEL_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 4 procs
Starting RUN for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 procs
Finished MODEL_BUILD for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 1.899008 seconds (PASS)
Starting RUN for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu with 1 proc on interactive node and 1 procs on compute nodes
Finished RUN for test TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu in 2.523037 seconds (PEND). [COMPLETED 1 of 3]
Finished RUN for test TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu in 2.010249 seconds (PEND). [COMPLETED 2 of 3]
Waiting for tests to finish
Test 'TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'PASS'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTBUILDFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
Test 'TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'FAIL'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNFAIL_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
Test 'TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu' finished with status 'PASS'
    Path: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654/TESTRUNPASS_P1.f19_g16_rx1.A.archer2_gnu.20210922_180654/TestStatus
test-scheduler took 10.191802501678467 seconds
ERRPUT:




----------------------------------------------------------------------
Ran 1 test in 32.490s


FAILED (failures=1)
pylint version 1.5 or newer not found, pylint tests skipped
Detected failures, leaving directory: /work/n02/n02/csymonds/cesm/CESM2.1.3/runs/scripts_regression_test.20210922_180654
 

fischer

CSEG and Liaisons
Staff member
I've been looking into this some more. The TESTRUNFAIL_P1 test does correctly write "Insta fail" to the cesm.log, but the extra srun error message makes me believe that the queued job is being killed because the run fails. I could be wrong. You could try the following to run just that one test:

./scripts_regression_tests.py O_TestTestScheduler.test_d_retry

If you run it on an interactive node, it might provide more information.

Is this error repeatable? It could also just be a system glitch.

Chris
 

cemac-ccs

Chris Symonds
New Member
Hi Chris
Apologies for the delay in replying and thank you again for helping.

I have run the test on its own a few times, all with the same result. Running interactively doesn't help much either, as the same results appear. Of course, if I use the --no-batch flag then a different error appears: the test reports that it was unable to run the 'srun' command (as you would expect).

It seems your belief that slurm is interpreting the insta fail as an error is correct: when I run the cesm.exe file on an interactive node, the output is 'Insta fail' as expected, and `echo $?` gives 255, the same exit code that slurm reports. So does this mean 'Task Failed Successfully'?
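The behaviour described above can be reproduced without CESM or slurm. The following is a small sketch in which a child Python process stands in for the TESTRUNFAIL executable (an assumption for illustration only; it is not the real cesm.exe). It prints the same message and exits with status 255, and the parent reads back that exit code in the way a launcher such as srun would.

```python
import subprocess
import sys

# Hypothetical stand-in for the TESTRUNFAIL executable:
# print a message, then exit with a non-zero status.
child_code = "import sys; print('Insta fail'); sys.exit(255)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)

print("output:", result.stdout.strip())
print("exit code:", result.returncode)
```

A launcher that treats any non-zero exit status as an error will report a failure here even though the program did exactly what it was written to do.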

Of the remaining 5 tests that fail:
  • O_TestTestScheduler.test_c_use_existing fails for the same reason,
  • K_TestCimeCase.test_cime_case_test_custom_project fails because it expects the machine to be melvin, and passes when the machine is set to `self._machine` in the Python function for that test,
  • T_TestRunRestart.test_run_restart passes when cime is changed from tag 5.6.32 to the maint-5.6 branch, but only if run on its own,
  • Z_FullSystemTest.test_full_system also passes if run on its own, and
  • Q_TestBlessTestResults.test_bless_test_results still fails on the `TESTRUNDIFF_P1.f19_g16_rx1.A.archer2_gnu BASELINE fake_testing_only_20210930_180448` task.
From this it seems that the bless-results test is the only one that needs worrying about. Does that seem right to you?

Thanks
Chris
 