
Problem with restarting model

EmilyVanDeKoot

Emily Van de koot
New Member
Hi, I am trying to do a two-year test run which restarts at the end of the first year. I am using CESM2.1 with the land component replaced by SLIM (Home · marysa/SimpleLand Wiki), a slab ocean and CAM5. I am running the model on the ARCHER2 supercomputer. The model successfully completes the first year but then fails to restart (note I am able to run much longer simulations if I don't restart the model).

From the cesm.log, I can see the model fails to restart because it is unable to find the last monthly output file of the first year (I output both monthly and daily data): "ERROR: GETFIL: FAILED to get two_years_updated_model.cam.h0.0001-12.nc". I have had a look, and this file exists in the directory "work/n02/n02/emilyvdk/cesm_data/archive/two_years_updated_model/atm/hist/" along with all the other output data. The restart file is in ".../cesm_data/runs/two_years_updated_model/run/" and is called "two_years_updated_model.cam.r.0002-01-01-00000.nc".

I have attached my cesm.log, env_run.xml, user_nl_cam, user_nl_clm, user_docn.streams.txt.som, submission script and .xml files in case they are helpful. I would be very grateful for any advice.
 

Attachments

  • cesm.log.560146.211008-225820.txt (176 KB)
  • config_batch.txt (22.1 KB)
  • config_compilers.txt (40.2 KB)
  • config_machines.txt (97.7 KB)
  • env_run.txt (59.8 KB)
  • SLIM_submission_script.txt (1.1 KB)
  • user_nl_clm.txt (4.8 KB)
  • user_nl_cam.txt (416 bytes)
  • user_docn.streams.txt.som.txt (967 bytes)

erik

Erik Kluzek
CSEG and Liaisons
Staff member
The problem is that the file needs to be in the run directory. The short-term archiver should leave a copy in place, but something must not be working right in your port, so no copy is left behind. A simple workaround would be to copy it back in by hand.
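The manual copy described above could be scripted along these lines. This is only a minimal sketch: the function name `restore_history_file` is hypothetical, and the archive and run directories are assumed to be the ones quoted in the original post; substitute your own paths.

```python
import shutil
from pathlib import Path


def restore_history_file(archive_hist_dir, run_dir, filename):
    """Copy one archived history file back into the run directory.

    Hypothetical helper, not part of CESM or CIME.
    Returns the destination path; an existing copy is left untouched.
    """
    src = Path(archive_hist_dir) / filename
    dst = Path(run_dir) / filename
    if not src.is_file():
        raise FileNotFoundError(f"archived file not found: {src}")
    if not dst.exists():
        # copy2 also preserves the file's timestamps
        shutil.copy2(src, dst)
    return dst
```

For the case in this thread, the call would pass the `atm/hist` directory of the archive, the case's `run` directory, and `"two_years_updated_model.cam.h0.0001-12.nc"` as the file name.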
 

EmilyVanDeKoot

Emily Van de koot
New Member
Hi Erik,

Thanks for getting back to me! I copied the files by hand and this allowed the model to restart. However, I would like the model to restart automatically, as there is a limit on the run time for an individual job on the supercomputer I am using. Do you know what might be wrong with my port?

Since posting I have changed some lines in my config_machines.xml file (shown below), but this has not resolved the issue. I have also made sure that these changes are reflected in the env_run.xml file.

<CIME_OUTPUT_ROOT>/work/n02/n02/emilyvdk/cesm_data/scratch/$USER</CIME_OUTPUT_ROOT>

<DOUT_S_ROOT>/work/n02/n02/emilyvdk/cesm_data/scratch/$USER/archive/$CASE</DOUT_S_ROOT>

Many thanks,
Emily
 

slevis

Moderator
Staff member
In your case's env_run.xml, change DOUT_S_SAVE_INTERIM_RESTART_FILES to TRUE.

This will not hurt anything and may solve your problem. Let us know if it does.
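
The setting is normally changed from the case directory with `./xmlchange DOUT_S_SAVE_INTERIM_RESTART_FILES=TRUE`. To confirm what env_run.xml currently holds, something like the following could be used. It is a sketch only: the helper name `get_entry_value` is hypothetical, and it assumes each setting is stored as an `<entry id="..." value="...">` element, which may not match every CIME version.

```python
import xml.etree.ElementTree as ET


def get_entry_value(xml_path, entry_id):
    """Return the value attribute of the <entry> with the given id, or None.

    Hypothetical helper; assumes the <entry id=... value=...> layout.
    """
    root = ET.parse(xml_path).getroot()
    for entry in root.iter("entry"):
        if entry.get("id") == entry_id:
            return entry.get("value")
    return None
```

Calling `get_entry_value("env_run.xml", "DOUT_S_SAVE_INTERIM_RESTART_FILES")` from the case directory should return `"TRUE"` once the change has been applied, provided the file follows the assumed layout.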
 

cpatrizio

Casey Patrizio
New Member
Hello, I am also experiencing this problem. @zhangmiexin, how did you fix it exactly? I have already loaded netcdf before running the job.
 