
slurmstepd: error: DUE TO TIME LIMIT ***

changmao

Yufei Wang
New Member
Hi all,

I am running some experiments with the CESM2 FHIST compset. The first year of the run completed without problems. However, when I set CONTINUE_RUN to TRUE to continue the run, it failed. The details are as follows:

Case setup: ./create_newcase --case $HOME/usryfwang/cases/f.e22.FHIST.f09_f09.control --res f09_f09_mg17 --compset FHIST

cesm.log:
Opened file f.e22.FHIST.f09_f09.control.cam.h0.1981-03.nc to write 3014656
NetCDF: Invalid dimension ID or name
NetCDF: Variable not found
NetCDF: Variable not found
NetCDF: Invalid dimension ID or name

srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 12587797.0 ON a3310n05 CANCELLED AT 2024-04-15T11:29:22 DUE TO TIME LIMIT ***


There are no obvious errors in the logs of the other components.
Does anyone know the cause of this error and how to fix it? Any help would be greatly appreciated!
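The log excerpt above contains two separate symptoms: NetCDF errors right after the model opens a CAM history file for writing, and Slurm cancelling the job step for exceeding its time limit. In a long cesm.log these can be far apart, so a minimal Python sketch for pulling them out is below. The helper names (classify_log, last_history_file) are hypothetical, and the patterns are assumptions based only on the lines quoted in this post, not a complete list of CESM or Slurm messages.

```python
import re
import sys
from typing import Dict, List, Optional

# Hypothetical patterns based only on the log lines quoted in this post;
# not an exhaustive list of CESM or Slurm messages.
PATTERNS = {
    "history_open": re.compile(r"Opened file (\S+) to write"),
    "netcdf_error": re.compile(r"NetCDF: "),
    "time_limit": re.compile(r"CANCELLED AT \S+ DUE TO TIME LIMIT"),
}


def classify_log(lines: List[str]) -> Dict[str, List[str]]:
    """Group log lines by the symptom pattern they match."""
    hits: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


def last_history_file(lines: List[str]) -> Optional[str]:
    """Return the name of the last file the log reports opening for writing."""
    name = None
    for line in lines:
        match = PATTERNS["history_open"].search(line)
        if match:
            name = match.group(1)
    return name


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/cesm.log.<jobid>
    with open(sys.argv[1], errors="replace") as fh:
        log_lines = fh.readlines()
    for symptom, found in classify_log(log_lines).items():
        print(f"{symptom}: {len(found)} line(s)")
    print("last file opened for writing:", last_history_file(log_lines))
```

Knowing which file was being written when the NetCDF errors appeared narrows the search to that output stream (here, a cam.h0 monthly file) rather than the whole model.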
 

peverley

Courtney Peverley
Moderator
Hi, I can't tell much from what you've provided, but my advice would be to make sure your restart files were generated correctly by the initial run.
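One way to act on this advice is to check the rpointer files in the run directory, which tell each component which restart file to read when CONTINUE_RUN is TRUE. The Python sketch below assumes each rpointer.* file names its restart file on its first non-empty line, either as an absolute path or relative to the run directory; the function name check_rpointers is hypothetical, and pointer file names and contents vary by component and CESM version, so compare against your own run directory.

```python
from pathlib import Path
from typing import Dict


def check_rpointers(rundir: str) -> Dict[str, bool]:
    """Map each rpointer.* file in rundir to whether its restart file looks usable.

    Assumption: the restart file name is on the first non-empty line of each
    rpointer file and is either absolute or relative to rundir.
    """
    run_path = Path(rundir)
    results: Dict[str, bool] = {}
    for pointer in sorted(run_path.glob("rpointer.*")):
        lines = [ln.strip() for ln in pointer.read_text().splitlines() if ln.strip()]
        if not lines:
            # Empty pointer file: nothing to restart from.
            results[pointer.name] = False
            continue
        target = Path(lines[0])
        if not target.is_absolute():
            target = run_path / target
        # A missing or zero-byte restart file is treated as a failed write here.
        results[pointer.name] = target.is_file() and target.stat().st_size > 0
    return results
```

A False entry points at the component whose restart file is missing or empty. A True entry only means the file exists and is non-empty, not that its contents are valid.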
 

dbailey

CSEG and Liaisons
Staff member
This looks like it aborted while writing the history file. Are you running out of disk space? Was there a temporary failure of the file system?
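To rule out the disk-space question before resubmitting, a quick check of the file system that holds the run directory can help. The Python sketch below uses the standard library's shutil.disk_usage. The function name has_free_space, the path, and the threshold are hypothetical placeholders. It only sees the file system's free space, so on machines that enforce per-user or per-project quotas you would also need the site's own quota tool.

```python
import shutil


def has_free_space(path: str, min_free_gib: float) -> bool:
    """Return True if the file system holding `path` has at least min_free_gib free.

    This reports free space on the file system only, not user or project quotas.
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    free_gib = usage.free / 1024**3
    total_gib = usage.total / 1024**3
    print(f"{path}: {free_gib:.1f} GiB free of {total_gib:.1f} GiB")
    return free_gib >= min_free_gib
```

For example, has_free_space("/path/to/rundir", 500) prints the current free space and returns whether at least 500 GiB are available.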
 
Top