
Error in scripts_regression_tests K_TestCimeCase

Hello, we recently switched from PBS to Slurm on our HPC system, and I am now running into issues with scripts_regression_tests.

I get the following error when running the K_TestCimeCase tests:

======================================================================
ERROR: test_cime_case_st_archive_resubmit (__main__.K_TestCimeCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./scripts_regression_tests.py", line 1672, in test_cime_case_st_archive_resubmit
case.case_st_archive(resubmit=True)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/case/case_st_archive.py", line 760, in case_st_archive
self.submit(resubmit=True)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/case/case_submit.py", line 157, in submit
custom_success_msg_functor=verbatim_success_msg)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/utils.py", line 1683, in run_and_log_case_status
rv = func()
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/case/case_submit.py", line 155, in <lambda>
batch_args=batch_args)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/case/case_submit.py", line 100, in _submit
mail_type=mail_type, batch_args=batch_args)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/case/case.py", line 1203, in submit_jobs
batch_args=batch_args, dry_run=dry_run)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/XML/env_batch.py", line 515, in submit_jobs
dry_run=dry_run)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/XML/env_batch.py", line 699, in _submit_single_job
output = run_cmd_no_fail(submitcmd, combine_output=True)
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/utils.py", line 516, in run_cmd_no_fail
expect(False, "Command: '{}' failed with error '{}' from dir '{}'".format(cmd, errput.encode('utf-8'), os.getcwd() if from_dir is None else from_dir))
File "/scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/scripts/tests/../lib/CIME/utils.py", line 130, in expect
raise exc_type(msg)
SystemExit: ERROR: Command: 'cd $CASEROOT ; sbatch .case.run --resubmit' failed with error 'sbatch: error: Unable to open file .case.run' from dir '/scratch/brown/ekarlsso/scripts_regression_test.20201027_153245/st_archive_resubmit_test'
----------------------------------------------------------------------
Ran 20 tests in 119.295s
FAILED (errors=1, skipped=6)
('Detected failures, leaving directory:', '/scratch/brown/ekarlsso/scripts_regression_test.20201027_153245')



In the test output directory, there is no cesm.log and there are no .nc files (i.e. the st_archive_resubmit_test/run directory is empty). The .case.run file is present in the case directory but is not executable. Any ideas on where it went wrong? Thank you!

Version info: release-clm5.0.34-3-gcdc544df

Processing externals description file : Externals.cfg
Processing externals description file : Externals_CISM.cfg
Processing externals description file : Externals_CLM.cfg
Checking status of externals: cism, source_cism, clm, fates, ptclm, mosart, cime, rtm,
M ./cime
modified sandbox, on cime5.6.33
./components/cism
clean sandbox, on cism-release-cesm2.1.2_02
./components/cism/source_cism
clean sandbox, on release-cism2.1.03
./components/mosart
clean sandbox, on release-cesm2.0.04
./components/rtm
clean sandbox, on release-cesm2.0.04
./src/fates
clean sandbox, on sci.1.30.0_api.8.0.0
./tools/PTCLM
clean sandbox, on PTCLM2_20200121
 

Attachments

  • config_machines.txt (2.9 KB)
  • config_compilers.txt (1.2 KB)
  • config_batch.txt (832 bytes)

jedwards

CSEG and Liaisons
Staff member
Are you able to submit a job from a login node successfully? If you set RESUBMIT=1, does the job successfully resubmit after the first run? It looks like one of two problems:
1. sbatch is not available on the compute nodes, so you cannot resubmit from there, or
2. the disk that your case directory is on is not available from the compute nodes.
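
The two possibilities above can be checked with a short script, run once on a login node and once from inside a batch job on a compute node. This is only a minimal sketch and not part of CIME: the function name check_submit_prereqs and its arguments are hypothetical, and it only tests whether the submit command is on PATH and whether the case directory and batch script are visible from the node it runs on.

```python
import os
import shutil


def check_submit_prereqs(caseroot, script=".case.run", submit_cmd="sbatch"):
    """Return yes/no checks for the two suspected causes (hypothetical helper)."""
    script_path = os.path.join(caseroot, script)
    return {
        # Cause 1: is the submit command on PATH on this node?
        "submit_cmd_found": shutil.which(submit_cmd) is not None,
        # Cause 2: is the case directory visible from this node?
        "caseroot_visible": os.path.isdir(caseroot),
        # The batch script only needs to be readable, not executable,
        # for the scheduler to open it.
        "script_readable": os.path.isfile(script_path)
        and os.access(script_path, os.R_OK),
    }
```

Calling check_submit_prereqs with the case directory as its argument on both node types and comparing the two dictionaries should show whether either check fails only on the compute nodes.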
 
Yes, resubmit is working, so I don't think that's the problem. Below is some more info in case it helps. Again, the st_archive_resubmit_test/run directory is empty, and .case.run and case.st_archive are present but not executable.


test_cime_case_st_archive_resubmit (__main__.K_TestCimeCase) ... Successfully cleaned batch script .case.run
job is case.run USER_REQUESTED_WALLTIME 0:05:00 USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S
Creating batch scripts
Writing case.run script from input template /scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/config/cesm/machines/template.case.run
Creating file .case.run
Writing case.st_archive script from input template /scratch/brown/ekarlsso/clm5_porting2020/clm5_porting/clm5.0/cime/config/cesm/machines/template.st_archive
Creating file case.st_archive
If an old case build already exists, might want to run 'case.build --clean' before building
You can now run './preview_run' to get more info on how your case will be run
st_archive starting
Cannot find a st_archive_resubmit_test.cpl*.r.*.nc file in directory /scratch/brown/ekarlsso/scripts_regression_test.20201027_153245/st_archive_resubmit_test/st_archive_resubmit_test/run
Archiving history files for drv (cpl)
Archiving history files for dart (esp)
st_archive completed
st_archive starting
Cannot find a st_archive_resubmit_test.cpl*.r.*.nc file in directory /scratch/brown/ekarlsso/scripts_regression_test.20201027_153245/st_archive_resubmit_test/st_archive_resubmit_test/run
Archiving history files for drv (cpl)
Archiving history files for dart (esp)
st_archive completed
resubmitting from st_archive, resubmit=2
Submitting job 'case.run', resubmit=2
submit_jobs case.run
Submit job case.run
Submitting job script cd $CASEROOT ; sbatch .case.run --resubmit
ERROR
Detected failed test or user request no teardown
Leaving files:
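
One thing that may be worth ruling out, although nothing in this thread confirms it as the cause: the submit command shown in the log is 'cd $CASEROOT ; sbatch .case.run --resubmit', so sbatch only finds .case.run if CASEROOT is set in the environment of the shell that runs it. In a POSIX shell, an unset or empty CASEROOT turns 'cd $CASEROOT' into a bare 'cd', which changes to $HOME. The sketch below mimics that resolution; the function names are hypothetical and it is not CIME code.

```python
import os


def resolve_cd_target(env):
    """Mimic where an unquoted `cd $CASEROOT` lands in a POSIX shell."""
    caseroot = env.get("CASEROOT", "")
    if caseroot:
        return caseroot
    # Unset or empty: the shell runs a bare `cd`, which goes to $HOME.
    # With HOME also unset the behaviour is implementation-defined,
    # so report None instead of guessing.
    return env.get("HOME") or None


def script_visible_after_cd(env, script=".case.run"):
    """True if the batch script exists in the directory the shell ends up in."""
    target = resolve_cd_target(env)
    return target is not None and os.path.isfile(os.path.join(target, script))
```

Passing os.environ to script_visible_after_cd from the process that launches the tests indicates whether the submit command would look for .case.run in the case directory or somewhere else.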
 