
Porting CIME - tests passing except Z_FullSystemTest

jatkinson1000

Jack
New Member
What version of the code are you using?
CIME, based on the maint-5.6 tag, commit 01194c2cedc790e47d1a8d1a057fff49225925d1

Have you made any changes to files in the source tree?
Adapted the xml config files to add a new machine

Describe every step you took leading up to the problem:
From cime/scripts/tests I am running the regression tests one by one. All pass except ./scripts_regression_tests.py Z_FullSystemTest

Within Z_FullSystemTest, every test passes except the final one, SMS_D_Ln9.f19_g16_rx1.A (terminal output at the end of this message).

Describe your problem or question:
I am working to port CIME (and hopefully then CESM) to a new machine.
So far I have run most of the regression tests successfully, with the exception of SMS_D_Ln9.f19_g16_rx1.A in Z_FullSystemTest

I have attached the cesm.log and TestStatus.log files. The failure appears to occur within an MPI_Init call, but this is the only test in which I see it, so any insight would be greatly appreciated.

Relatedly, it was quite hard to get hold of the log files, as the case directory where output is sent seems to be deleted immediately upon failure.
Is this expected, or is there a setting I can change so that the case directories of failed tests are kept for investigation after the tests finish?



Code:
[user@login tests]$ ./scripts_regression_tests.py Z_FullSystemTest


Testing commit 2a1c09401fa9cc45c71cdb4f767a223e6a84a3e8

Using cime_model = cesm

Testing machine = csd3

Test root: /home/user/hpc-work/scripts_regression_test.20250331_211428


pylint version 1.5 or newer not found, pylint tests skipped

test_full_system (__main__.Z_FullSystemTest.test_full_system) ... FAIL


======================================================================

FAIL: test_full_system (__main__.Z_FullSystemTest.test_full_system)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/home/user/cime/scripts/tests/./scripts_regression_tests.py", line 1524, in test_full_system

    self._create_test(["--walltime=0:15:00", "cime_developer"], test_id=self._baseline_name)

  File "/home/user/cime/scripts/tests/./scripts_regression_tests.py", line 970, in _create_test

    self._wait_for_tests(test_id, expect_works=(not pre_run_errors and not run_errors))

  File "/home/user/cime/scripts/tests/./scripts_regression_tests.py", line 978, in _wait_for_tests

    run_cmd_assert_result(self, "{}/wait_for_tests {} *{}/TestStatus".format(TOOLS_DIR, timeout_arg, test_id),

  File "/home/user/cime/scripts/tests/./scripts_regression_tests.py", line 76, in run_cmd_assert_result

    test_obj.assertEqual(stat, expected_stat, msg=msg)

AssertionError: 100 != 0 :

COMMAND: /home/user/cime/scripts/Tools/wait_for_tests  *fake_testing_only_20250331_211428/TestStatus

FROM_DIR: /home/user/rds/hpc-work/scripts_regression_test.20250331_211428

SHOULD HAVE WORKED, INSTEAD GOT STAT 100

OUTPUT: Test 'DAE.f19_f19.A.csd3_intel' finished with status 'PASS'

    Path: DAE.f19_f19.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'ERI.f09_g16.X.csd3_intel' finished with status 'PASS'

    Path: ERI.f09_g16.X.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'ERIO.f09_g16.X.csd3_intel' finished with status 'PASS'

    Path: ERIO.f09_g16.X.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'ERP.f45_g37_rx1.A.csd3_intel' finished with status 'PASS'

    Path: ERP.f45_g37_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'ERR.f45_g37_rx1.A.csd3_intel' finished with status 'PASS'

    Path: ERR.f45_g37_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'ERS.ne30_g16_rx1.A.csd3_intel.drv-y100k' finished with status 'PASS'

    Path: ERS.ne30_g16_rx1.A.csd3_intel.drv-y100k.fake_testing_only_20250331_211428/TestStatus

Test 'IRT_N2.f19_g16_rx1.A.csd3_intel' finished with status 'PASS'

    Path: IRT_N2.f19_g16_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'LDSTA.f45_g37_rx1.A.csd3_intel' finished with status 'PASS'

    Path: LDSTA.f45_g37_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'MCC_P1.f19_g16_rx1.A.csd3_intel' finished with status 'PASS'

    Path: MCC_P1.f19_g16_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'NCK_Ld3.f45_g37_rx1.A.csd3_intel' finished with status 'PASS'

    Path: NCK_Ld3.f45_g37_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'PEM_P4.f19_f19.A.csd3_intel' finished with status 'PASS'

    Path: PEM_P4.f19_f19.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'PET_P4.f19_f19.A.csd3_intel' finished with status 'PASS'

    Path: PET_P4.f19_f19.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'PRE.f19_f19.ADESP.csd3_intel' finished with status 'PASS'

    Path: PRE.f19_f19.ADESP.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'PRE.f19_f19.ADESP_TEST.csd3_intel' finished with status 'PASS'

    Path: PRE.f19_f19.ADESP_TEST.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'SEQ_Ln9.f19_g16_rx1.A.csd3_intel' finished with status 'PASS'

    Path: SEQ_Ln9.f19_g16_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'SMS.T42_T42.S.csd3_intel' finished with status 'PASS'

    Path: SMS.T42_T42.S.csd3_intel.fake_testing_only_20250331_211428/TestStatus

Test 'SMS_D_Ln9.f19_g16_rx1.A.csd3_intel' finished with status 'FAIL'

    Path: SMS_D_Ln9.f19_g16_rx1.A.csd3_intel.fake_testing_only_20250331_211428/TestStatus

ERRPUT:


----------------------------------------------------------------------

Ran 1 test in 929.139s


FAILED (failures=1)

Detected failures, leaving directory: /home/user/rds/hpc-work/scripts_regression_test.20250331_211428
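
With this many status lines, it can help to pull out only the tests that did not pass. The following is a minimal sketch, not part of CIME: it assumes the status lines keep the exact wording shown in the output above, and the helper names `summarize` and `failed_tests` are hypothetical.

```python
import re

# Matches lines such as:
#   Test 'SMS.T42_T42.S.csd3_intel' finished with status 'PASS'
# The wording is taken from the output above and is an assumption about its stability.
STATUS_LINE = re.compile(
    r"Test '(?P<name>[^']+)' finished with status '(?P<status>[^']+)'"
)


def summarize(output):
    """Return {test_name: status} for every status line found in the text."""
    return {m.group("name"): m.group("status") for m in STATUS_LINE.finditer(output)}


def failed_tests(output):
    """Return the names of all tests whose status is anything other than PASS."""
    return [name for name, status in summarize(output).items() if status != "PASS"]


# Three status lines copied from the output above.
sample = """\
Test 'SEQ_Ln9.f19_g16_rx1.A.csd3_intel' finished with status 'PASS'
Test 'SMS.T42_T42.S.csd3_intel' finished with status 'PASS'
Test 'SMS_D_Ln9.f19_g16_rx1.A.csd3_intel' finished with status 'FAIL'
"""

print(failed_tests(sample))
```

Pasting the full terminal output into `sample` (or reading it from a saved file) would list every test that needs a closer look.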
 

Attachments

  • TestStatus.log.txt (11.5 KB)
  • cesm.log.txt (2.8 KB)

jedwards

CSEG and Liaisons
Staff member
I would recommend running that single test, SMS_D_Ln9.f19_g16_rx1.A.csd3_intel, outside of the framework using:
cd cime/scripts
./create_test SMS_D_Ln9.f19_g16_rx1.A.csd3_intel

However, the problem seems to be in your MPI library: it is failing in mpi_init. You may want to review this with your system support staff.
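
For readers unfamiliar with the test name passed to create_test, the sketch below splits it into its dot-separated parts. It is an illustration only, not CIME code: the function `split_test_name` is hypothetical, and it assumes the common layout TESTCASE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER[.TESTMODS]] with no dots inside any individual field.

```python
def split_test_name(full_name):
    """Split a CIME-style test name into its dot-separated fields.

    Assumes the layout TESTCASE[_MODS].GRID.COMPSET[.MACHINE_COMPILER[.TESTMODS]];
    names that do not follow this layout are not handled.
    """
    parts = full_name.split(".")
    if len(parts) < 3:
        raise ValueError("expected at least TESTCASE.GRID.COMPSET: %r" % full_name)
    # The first field carries the test type plus optional underscore-separated modifiers.
    testcase, *modifiers = parts[0].split("_")
    return {
        "testcase": testcase,
        "modifiers": modifiers,
        "grid": parts[1],
        "compset": parts[2],
        "machine_compiler": parts[3] if len(parts) > 3 else None,
        "testmods": parts[4] if len(parts) > 4 else None,
    }


info = split_test_name("SMS_D_Ln9.f19_g16_rx1.A.csd3_intel")
print(info)
```

In CIME's naming convention, as far as I understand it, SMS is a smoke test, the D modifier requests a debug build, and Ln9 sets the run length to 9 steps, so the failing test differs from the passing SEQ_Ln9.f19_g16_rx1.A.csd3_intel in test type and in being a debug build.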
 