
Error related to restart files

James King
Member
Hi all,

Happy New Year! I have run into an error when trying to re-run a case I had previously run. I set RUN_TYPE to startup, CONTINUE_RUN to FALSE, RESUBMIT to 14 [years], and RUN_STARTDATE to 0001-01-01. The model runs for 1 year then crashes. The error is in the coupler logs as follows:

(seq_infodata_Init) read rpointer file rpointer.drv
(seq_infodata_Init) restart file from rpointer= CAM_chem_SSP1-2.6_2050_maxforest.cpl.r.0002-01-01-00000.nc
(seq_io_read_openfile) ERROR: file invalid
CAM_chem_SSP1-2.6_2050_maxforest.cpl.r.0002-01-01-00000.nc
ERROR: Unknown error submitted to shr_abort_abort.

Now, the restart file 'CAM_chem_SSP1-2.6_2050_maxforest.cpl.r.0002-01-01-00000.nc' does not exist, as the model has for some reason started from the year 0005. This is the last year of a previous run I did with this case. How do I make sure the model starts from 0001-01-01? Do I need to do something with RUN_REFDATE?
Model version is CESM2.2.0, full version info and relevant log files are attached.

Thanks,

James
 

joneill

Joseph O'Neill
Administrator
Staff member
Just testing the upload feature here. Thanks for everyone's patience.
 

Attachments

  • test.txt
    28 bytes · Views: 7

James King
Member
Thanks Joseph. Here are the log files relating to this issue.
 

Attachments

  • version_info.txt
    6.2 KB · Views: 3
  • cesm.log.2256171.chadmin1.ib0.cheyenne.ucar.edu.211231-135729.txt.txt
    361 KB · Views: 6
  • cpl.log.2256171.chadmin1.ib0.cheyenne.ucar.edu.211231-135729.txt.txt
    3.7 KB · Views: 5

jedwards

CSEG and Liaisons
Staff member
So this is the log from the first resubmit, correct? I don't see any evidence for your statement "the model has for some reason started from the year 0005."
Is the file CAM_chem_SSP1-2.6_2050_maxforest.cpl.r.0002-01-01-00000.nc in your run directory? Is it in the archive directory? Is it possible there was an error at the end of the previous run that you missed?
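
The run-directory check asked about here can be scripted. The following is a minimal sketch, assuming each rpointer.* file in the run directory names its restart file(s) one per line, ending in .nc. The function name check_rpointers is hypothetical and is not part of CESM or CIME.

```shell
#!/bin/sh
# check_rpointers RUNDIR
# Hypothetical helper (not a CESM/CIME tool): for every rpointer.* file in
# RUNDIR, report each referenced *.nc file that is absent from RUNDIR.
# Returns 1 if anything is missing, 0 otherwise.
check_rpointers() {
    rundir=$1
    missing=0
    for ptr in "$rundir"/rpointer.*; do
        [ -f "$ptr" ] || continue
        # Read line by line; also handle a final line without a newline.
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                *.nc) ;;          # assumed to be a restart file name
                *) continue ;;    # skip anything else (e.g. format flags)
            esac
            name=$(basename "$line")
            if [ ! -f "$rundir/$name" ]; then
                echo "MISSING: $name (named in $(basename "$ptr"))"
                missing=1
            fi
        done < "$ptr"
    done
    return $missing
}
```

Called as `check_rpointers /path/to/run/directory`, it would be expected to print a MISSING line for the 0002-01-01 coupler restart in the situation described in this thread. The same call against the archive directory shows whether a copy survives there.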
 

James King
Member
Hi,

This is the log from the first resubmit, after which the model crashed (i.e. these are the only logs in the run directory). There is no file called 'CAM_chem_SSP1-2.6_2050_maxforest.cpl.r.0002-01-01-00000.nc' in the run directory, but there is one in the archive directory from the previous time I ran this case. Maybe the issue here is that I didn't clean out the archive directory before re-running the case (having made a few minor edits to its configuration)?

Best,

James
 

jedwards

CSEG and Liaisons
Staff member
The st_archive process should have left that file in the run directory after the first run completed along with copying it to the archive directory.
 

James King
Member
jedwards said: "The st_archive process should have left that file in the run directory after the first run completed along with copying it to the archive directory."
That's what I thought, but the file is missing. Re your previous point, the last time I ran this case it completed without any errors and archived the output. Do the logs give any hints as to what happened in this instance, or would it be best if I ran the model again having cleaned up all the remaining files generated during previous runs of this case (e.g. those in the run directory)?
 

jedwards

CSEG and Liaisons
Staff member
The logs from the initial run and the (queuing system) log from the st_archive process may shed some light on the problem. I guess we need to figure out why that file is missing.
 

James King
Member
Here's the st_archive log.
 

Attachments

  • st_archive.CAM_chem_SSP1-2.6_2050_maxforest.o2252613.txt
    32.6 KB · Views: 5

jedwards

CSEG and Liaisons
Staff member
Okay, it looks like some confusion occurred and the restart files for 0002-01-01 were deleted as intermediates. This is because restarts from year 6 were found in your run directory. I would suggest repeating from the beginning with a clean run directory. You might consider a test run of a few days or months to be sure before you commit to another long run; that is, change STOP_OPTION from nyears to ndays.
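
One way to confirm that a run directory is clean before restarting is to list the model dates of any restart files still in it. The following is a minimal sketch, assuming restart files follow the naming seen in this thread (for example `<case>.cpl.r.0002-01-01-00000.nc`, with `.r`, `.rs` or `.rh0`-style type tags). The function name list_restart_dates is hypothetical and is not part of CESM or CIME.

```shell
#!/bin/sh
# list_restart_dates RUNDIR
# Hypothetical helper (not a CESM/CIME tool): print the distinct model dates
# (YYYY-MM-DD-SSSSS) of restart-style files found in RUNDIR, one per line.
# Assumes names of the form <case>.<comp>.r[...].<date>.nc; history files
# such as *.h0.0001-01.nc do not match and are ignored.
list_restart_dates() {
    rundir=$1
    for f in "$rundir"/*.nc; do
        [ -f "$f" ] || continue
        basename "$f"
    done |
        sed -n 's/.*\.r[a-z0-9]*\.\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]\)\.nc$/\1/p' |
        sort -u
}
```

An empty result suggests no stale restarts remain, while dates from an earlier run (such as year 6 here) indicate leftovers to clear out first. The short test run suggested above could then be set with something like `./xmlchange STOP_OPTION=ndays,STOP_N=5`, where the value 5 is an arbitrary choice.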
 

James King
Member
jedwards said: "Okay, it looks like some confusion occurred and the restart files for 0002-01-01 were deleted as intermediates. This is because restarts from year 6 were found in your run directory. I would suggest repeating from the beginning with a clean run directory. You might consider a test run of a few days or months to be sure before you commit to another long run; that is, change STOP_OPTION from nyears to ndays."
Thanks, I will give this a try.
 