How to restart CESM2.1.3 (fhist) from a backup

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi,

I am running CESM2.1.3 with the compset "fhist". My run stopped (due to maintenance on the machine), and the last restart was written for 1997-01-01. I now want to restart the model with RUN_STARTDATE=1997-01-01. These are the settings in env_run.xml:

CONTINUE_RUN: TRUE
RUN_REFCASE: f.e20.FHIST.f09_f09.cesm2_1.001_v2
RUN_REFDATE: 1997-01-01
RUN_REFTOD: 00000
RUN_STARTDATE: 1997-01-01
RUN_TYPE: hybrid

And my rpointer files (which are in my $RUNDIR) say:
CTRL213skl.cam.r.1997-01-01-00000.nc
CTRL213skl.cpl.r.1997-01-01-00000.nc
CTRL213skl.cism.r.1997-01-01-00000.nc
./CTRL213skl.cice.r.1997-01-01-00000.nc
./CTRL213skl.clm2.r.1997-01-01-00000.nc
CTRL213skl.docn.r.1997-01-01-00000.nc
CTRL213skl.docn.rs1.1997-01-01-00000.bin
./CTRL213skl.mosart.r.1997-01-01-00000.nc

But my run is failing with error:

(glc_run_mct) ERROR overshot coupling time 20150101 0
19970102 0
ERROR: glc error overshot time

I am not sure if my modifications in the env_run.xml file are correct. Can anyone please point out my mistake in setting the values in my env_run.xml file?

Thank you.
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Bidyut,

If you are trying to simply continue a stopped run, then all you should set is CONTINUE_RUN to TRUE, while leaving everything else as it was originally.

If, however, you are trying to create a new hybrid run, as your RUN_TYPE indicates, then you will want to modify all of the date variables like you have done, but set CONTINUE_RUN to FALSE, as a hybrid (or branch) run will be considered a new simulation.
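The two cases above can be sketched as a small Python helper that only builds the ./xmlchange command lines and does not run them. This is a rough sketch: the function name xmlchange_commands, its arguments, and the value MY_REFCASE are hypothetical, while the XML variable names are the ones discussed in this thread.

```python
def xmlchange_commands(mode, ref_case=None, ref_date=None, start_date=None):
    """Build xmlchange command lines for the two cases described above.

    mode="continue": resume a stopped run, changing only CONTINUE_RUN.
    mode="hybrid":   start a new hybrid run from existing restart files.
    """
    if mode == "continue":
        # Everything else stays exactly as it was when the run was first set up.
        return ["./xmlchange CONTINUE_RUN=TRUE"]

    if mode == "hybrid":
        if not ref_case or not ref_date:
            raise ValueError("a hybrid run needs ref_case and ref_date")
        settings = {
            "RUN_TYPE": "hybrid",
            "RUN_REFCASE": ref_case,
            "RUN_REFDATE": ref_date,
            "RUN_REFTOD": "00000",
            # Assumption: start on the reference date unless told otherwise.
            "RUN_STARTDATE": start_date or ref_date,
            # A hybrid run is a new simulation, so this must be FALSE.
            "CONTINUE_RUN": "FALSE",
        }
        return [f"./xmlchange {name}={value}" for name, value in settings.items()]

    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for line in xmlchange_commands("continue"):
        print(line)
    for line in xmlchange_commands("hybrid", ref_case="MY_REFCASE", ref_date="1997-01-01"):
        print(line)
```

The point of keeping the two modes separate is that CONTINUE_RUN=TRUE should never be combined with edited date variables.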

Finally, the actual error you are getting is from the land-ice/CISM model, so I have moved your post to the land ice forum in case there is anything they want to add about your particular "overshot" issue.

Hope that helps, and have a great day!

Jesse
 

katec

CSEG and Liaisons
Staff member
Hi there,

Typically a "glc error overshot time" error occurs because the model is restarting at a time of day other than 00000 while CISM is running. Your restart pointers and XML fields above do not obviously indicate a restart from a different time, but there's always the possibility that something is happening that you (we) don't expect. My suggestions are: make sure there is only one set of restart files in your run directory (the restart pointers should point to the correct restarts, but things can get confused when there are multiple restarts in a run directory), make sure you are restarting at 00000, and change your RUN_TYPE back to what it was originally.
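The first two checks can be sketched in Python, assuming restart file names follow the pattern seen in the rpointer listing earlier in this thread (case.component.r.YYYY-MM-DD-SSSSS.nc). The regular expression and the function name find_problems are illustrative assumptions, not part of CESM.

```python
import re
from collections import defaultdict

# Matches the tail of names such as "CTRL213skl.cam.r.1997-01-01-00000.nc"
# or "CTRL213skl.docn.rs1.1997-01-01-00000.bin" (pattern assumed from this thread).
RESTART_RE = re.compile(
    r"\.(?P<comp>[a-z0-9]+)\.(?P<kind>rs?\d*)\."
    r"(?P<date>\d{4}-\d{2}-\d{2})-(?P<tod>\d{5})\.(?:nc|bin)$"
)


def find_problems(names):
    """Return a list of human-readable problems found in restart file names."""
    stamps = defaultdict(set)
    for name in names:
        match = RESTART_RE.search(name.strip())
        if match:
            stamps[match["comp"]].add((match["date"], match["tod"]))

    all_stamps = set().union(*stamps.values())
    problems = []
    if len(all_stamps) > 1:
        problems.append(f"more than one restart time present: {sorted(all_stamps)}")
    for date, tod in sorted(all_stamps):
        if tod != "00000":
            problems.append(f"restart at {date} has time of day {tod}, not 00000")
    return problems


if __name__ == "__main__":
    pointers = [
        "CTRL213skl.cam.r.1997-01-01-00000.nc",
        "CTRL213skl.cpl.r.1997-01-01-00000.nc",
        "CTRL213skl.cism.r.1997-01-01-00000.nc",
        "./CTRL213skl.cice.r.1997-01-01-00000.nc",
        "./CTRL213skl.clm2.r.1997-01-01-00000.nc",
        "CTRL213skl.docn.r.1997-01-01-00000.nc",
        "CTRL213skl.docn.rs1.1997-01-01-00000.bin",
        "./CTRL213skl.mosart.r.1997-01-01-00000.nc",
    ]
    print(find_problems(pointers))
```

The same function could be fed the output of os.listdir() on the run directory to spot a second set of restarts sitting next to the one the pointers name.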

If you are still having issues, you could create a new case as a branch run with a stub glacier component instead of the active glacier component used in the FHIST run. This will change answers slightly over Greenland. The new compset long name should be the same as the FHIST long name, but with "SGLC" where "CISM%NOEVOLVE" was.
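As an illustration of the substitution, here is a short Python sketch. The long name used in the example is an assumption about what FHIST expands to in CESM2.1 and should be checked against your own case (for instance with ./xmlquery COMPSET); the helper name use_stub_glc is hypothetical.

```python
import re


def use_stub_glc(long_name):
    """Replace the active glacier component in a compset long name with SGLC."""
    # Accept "CISM%NOEVOLVE" as written above, or a versioned form such as
    # "CISM2%NOEVOLVE" (assumed spelling; verify against your own case).
    new_name, count = re.subn(r"CISM\d*%NOEVOLVE", "SGLC", long_name)
    if count != 1:
        raise ValueError("expected exactly one CISM...%NOEVOLVE component")
    return new_name


if __name__ == "__main__":
    # Assumed FHIST long name, for illustration only.
    fhist = "HIST_CAM60_CLM50%SP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV"
    print(use_stub_glc(fhist))
```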
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Thanks, Jesse, for your reply. Yes, I am just trying to "simply continue a stopped run". When I tried to simply continue my stopped run, it started from 1979 instead of the latest restart date. Let me try what you said once again, more carefully. Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Thanks Katec for your reply. As I had mentioned in my reply to Jesse, all I am trying to do is "simply continue a stopped run" from the latest restart date. I shall keep in mind what you said and try more carefully to restart the model before trying it as a new branch run. Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
By the way, when you said "while leaving everything else as it was originally", can you please confirm if this is what you meant:

GET_REFCASE: TRUE
RUN_REFCASE: f.e20.FHIST.f09_f09.cesm2_1.001_v2
RUN_REFDATE: 1979-01-01
RUN_REFDIR: cesm2_init
RUN_STARTDATE: 1979-01-01
RUN_REFTOD: 00000
CONTINUE_RUN: TRUE
RUN_TYPE: hybrid

Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi All, thank you for your suggestions. I was able to restart successfully. But a couple of simulated years after the restart, I encountered a different issue. The model crashed with this error:

ERROR:
component_mod:check_fields NaN found in ATM instance: 1 field Faxa_dstwet3 1
d global index: 28103

I guess it sees a NaN value in an atmospheric input file. Can you tell me which input file it might be? I found a similar thread, https://bb.cgd.ucar.edu/cesm/thread...-check_fields-nan-found-in-atm-instance.4773/, but I am not sure whether the error is coming from the same file. As I mentioned before, I am running CESM2.1.3 with the compset "fhist".
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Bidyut,

When I said:

...leaving everything else as it was originally.

What I meant is that all of the variables should have the same value as they did when you first started the run, except for CONTINUE_RUN, which needs to be set to TRUE. So you would need to undo all of the other changes that you listed.

It sounds like you have already done that? If not then I would retry your restart with all of your other modifications removed and see if you still get this error (which is explicitly stating that the wet deposition of dust has a NaN somewhere). If so then I will move this thread back to the CAM forums.
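For reading that message out of a log, a rough Python sketch follows. It assumes the error text looks like the one pasted earlier in this thread (including the line break inside "1d global index"); the regular expression and the function name parse_nan_errors are hypothetical helpers, not CESM tools.

```python
import re

# Pattern assumed from the message quoted in this thread; whitespace is
# normalized first so a line break inside the message does not matter.
NAN_RE = re.compile(
    r"NaN found in (?P<comp>\w+) instance:\s*(?P<inst>\d+)\s+"
    r"field\s+(?P<field>\w+)\s+1\s*d global index:\s*(?P<index>\d+)"
)


def parse_nan_errors(log_text):
    """Return one dict per check_fields NaN report found in the log text."""
    flat = " ".join(log_text.split())
    return [
        {
            "comp": m["comp"],
            "inst": int(m["inst"]),
            "field": m["field"],
            "index": int(m["index"]),
        }
        for m in NAN_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "ERROR:\n"
        "component_mod:check_fields NaN found in ATM instance: 1 field Faxa_dstwet3 1\n"
        "d global index: 28103\n"
    )
    for report in parse_nan_errors(sample):
        print(report["comp"], report["field"], report["index"])
```

The field name is the useful part here: Faxa_dstwet3 is the dust wet deposition field named in the error, which is a quantity the atmosphere model passes to the coupler.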

Also, if you receive this NaN error again, can you attach your atm log and cesm log, along with any XML files you modified (or SourceMods you added)? That could also help us debug the issue.

Thanks, and have a great day!

Jesse
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi Jesse, thank you for your reply. I think there was a memory issue on the machine. I re-submitted the job and the simulation completed successfully.

Thanks for your help.
Bidyut
 