
How to restart CESM2.1.3 (fhist) from a backup

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi,

I am running CESM2.1.3 with the "fhist" compset. My run stopped (due to maintenance on the machine), and the last restart files were written for 1997-01-01. I now want to restart the model with RUN_STARTDATE=1997-01-01. These are my settings in env_run.xml:

CONTINUE_RUN: TRUE
RUN_REFCASE: f.e20.FHIST.f09_f09.cesm2_1.001_v2
RUN_REFDATE: 1997-01-01
RUN_REFTOD: 00000
RUN_STARTDATE: 1997-01-01
RUN_TYPE: hybrid

And my rpointer files (which are in my $RUNDIR) say:
CTRL213skl.cam.r.1997-01-01-00000.nc
CTRL213skl.cpl.r.1997-01-01-00000.nc
CTRL213skl.cism.r.1997-01-01-00000.nc
./CTRL213skl.cice.r.1997-01-01-00000.nc
./CTRL213skl.clm2.r.1997-01-01-00000.nc
CTRL213skl.docn.r.1997-01-01-00000.nc
CTRL213skl.docn.rs1.1997-01-01-00000.bin
./CTRL213skl.mosart.r.1997-01-01-00000.nc

But my run is failing with error:

(glc_run_mct) ERROR overshot coupling time 20150101 0
19970102 0
ERROR: glc error overshot time

I am not sure if my modifications in the env_run.xml file are correct. Can anyone please point out my mistake in setting the values in my env_run.xml file?

Thank you.
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Bidyut,

If you are trying to simply continue a stopped run, then all you should set is CONTINUE_RUN to TRUE, while leaving everything else as it was originally.

If, however, you are trying to create a new hybrid run, as your RUN_TYPE indicates, then you will want to modify all of the date variables like you have done, but set CONTINUE_RUN to FALSE, as a hybrid (or branch) run will be considered a new simulation.
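The distinction between the two cases can be sketched as a small consistency check. This is only an illustration under stated assumptions: `continuation_problems` is a hypothetical helper, not part of CESM or CIME, it works on plain dictionaries rather than a real case directory, and the rule it encodes is just the advice given here. The example values are the ones quoted in this thread; the original value of CONTINUE_RUN (FALSE) is assumed.

```python
def continuation_problems(original, current):
    """List settings that were changed even though CONTINUE_RUN is TRUE.

    Both arguments are plain dicts mapping env_run.xml variable names to
    string values (an assumption of this sketch; no real case is read).
    """
    if current.get("CONTINUE_RUN") != "TRUE":
        # A new hybrid or branch run may change dates and the reference case.
        return []
    problems = []
    for name in sorted(original):
        if name == "CONTINUE_RUN":
            continue
        if current.get(name) != original[name]:
            problems.append(f"{name}: was {original[name]}, now {current.get(name)}")
    return problems


# Values taken from this thread; the original CONTINUE_RUN=FALSE is assumed.
original = {
    "RUN_TYPE": "hybrid",
    "RUN_REFDATE": "1979-01-01",
    "RUN_STARTDATE": "1979-01-01",
    "RUN_REFTOD": "00000",
    "CONTINUE_RUN": "FALSE",
}
current = dict(
    original,
    CONTINUE_RUN="TRUE",
    RUN_REFDATE="1997-01-01",
    RUN_STARTDATE="1997-01-01",
)

for line in continuation_problems(original, current):
    print(line)
```

An empty result means the settings look like a plain continuation; any reported variable is one that would need to be restored to its original value before resubmitting.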

Finally, the actual error you are getting is from the land-ice/CISM model, so I have moved your post to the land ice forum in case there is anything they want to add about your particular "overshot" issue.

Hope that helps, and have a great day!

Jesse
 

katec

CSEG and Liaisons
Staff member
Hi there,

Typically a "glc error overshot time" error occurs because the model is restarting at a time of day other than 00000 while CISM is running. Your restart pointers and xml fields above do not obviously indicate a restart from a different time, but there's always the possibility that something is happening that you (we) don't expect. My suggestions are to make sure that there is only one set of restart files in your run directory (the restart pointers should point to the correct restarts, but things can get confused when there are multiple restarts in a run directory), that you are restarting at 00000, and that you change your run_type back to what it was originally.
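The first two suggestions can be checked mechanically. The following is a minimal sketch, assuming restart file names end in `.YYYY-MM-DD-SSSSS.nc` or `.bin` as in the rpointer listing earlier in this thread; `check_restart_set` and `restart_stamps` are hypothetical helpers, not CESM tools, and they only inspect file names, not file contents.

```python
import re
from collections import defaultdict

# Date and time-of-day stamp at the end of a restart file name,
# e.g. ".1997-01-01-00000.nc" (pattern assumed from the names in this thread).
RESTART_NAME = re.compile(r"\.(\d{4}-\d{2}-\d{2})-(\d{5})\.(?:nc|bin)$")


def restart_stamps(filenames):
    """Group restart file names by their (date, time-of-day) stamp."""
    stamps = defaultdict(list)
    for name in filenames:
        match = RESTART_NAME.search(name.strip())
        if match:
            stamps[match.groups()].append(name.strip())
    return dict(stamps)


def check_restart_set(filenames):
    """Return a list of problems: mixed stamps or a nonzero time of day."""
    stamps = restart_stamps(filenames)
    problems = []
    if len(stamps) > 1:
        found = ", ".join(f"{date}-{tod}" for date, tod in sorted(stamps))
        problems.append(f"more than one restart stamp present: {found}")
    for date, tod in sorted(stamps):
        if tod != "00000":
            problems.append(f"restart at {date} has time of day {tod}, not 00000")
    return problems


names = [
    "CTRL213skl.cam.r.1997-01-01-00000.nc",
    "CTRL213skl.cpl.r.1997-01-01-00000.nc",
    "CTRL213skl.cism.r.1997-01-01-00000.nc",
    "./CTRL213skl.cice.r.1997-01-01-00000.nc",
    "./CTRL213skl.clm2.r.1997-01-01-00000.nc",
    "CTRL213skl.docn.r.1997-01-01-00000.nc",
    "CTRL213skl.docn.rs1.1997-01-01-00000.bin",
    "./CTRL213skl.mosart.r.1997-01-01-00000.nc",
]
print(check_restart_set(names))
```

Passing the names read from the rpointer files, or a listing of the run directory, would show whether a stray older restart set is present alongside the intended one.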

If you are still having issues, you could create a new case as a branch run with a stub glacier component instead of the active glacier component in the FHIST run. This will change answers slightly over Greenland. The new compset should be the same as the FHIST long name, but with "SGLC" where "CISM%NOEVOLVE" was.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Thanks Jesse for your reply. Yes I am just trying to "simply continue a stopped run". When I had tried to simply continue my stopped run, it had started from 1979 instead of the latest restart date. Let me try what you said, once again more carefully. Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Thanks Katec for your reply. As I had mentioned in my reply to Jesse, all I am trying to do is "simply continue a stopped run" from the latest restart date. I shall keep in mind what you said and try more carefully to restart the model before trying it as a new branch run. Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
By the way, when you said "while leaving everything else as it was originally", can you please confirm if this is what you meant:

GET_REFCASE: TRUE
RUN_REFCASE: f.e20.FHIST.f09_f09.cesm2_1.001_v2
RUN_REFDATE: 1979-01-01
RUN_REFDIR: cesm2_init
RUN_STARTDATE: 1979-01-01
RUN_REFTOD: 00000
CONTINUE_RUN: TRUE
RUN_TYPE: hybrid

Thanks.
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi All, thank you for your suggestions. I was able to restart successfully. However, a couple of simulated years after the restart, I encountered a different issue. The model crashed with this error:

ERROR:
component_mod:check_fields NaN found in ATM instance: 1 field Faxa_dstwet3 1
d global index: 28103

I guess it sees a NaN value in an atmospheric input file, but can you tell me which input file that might be? I found a similar thread, https://bb.cgd.ucar.edu/cesm/thread...-check_fields-nan-found-in-atm-instance.4773/, but I am not sure whether the error is coming from the same file. As I mentioned before, I am running CESM2.1.3 with the "fhist" compset.
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Bidyut,

When I said:

...leaving everything else as it was originally.

What I meant is that all of the variables should have the same value as they did when you first started the run, except for CONTINUE_RUN, which needs to be set to TRUE. So you would need to undo all of the other changes that you listed.

It sounds like you have already done that? If not then I would retry your restart with all of your other modifications removed and see if you still get this error (which is explicitly stating that the wet deposition of dust has a NaN somewhere). If so then I will move this thread back to the CAM forums.
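When scanning logs for this kind of failure, the component and field name can be pulled out of the message with a short script. This is a minimal sketch, assuming the message has the same wording as the error quoted in this thread; `find_nan_fields` is a hypothetical helper, and the pattern has not been checked against other CESM versions.

```python
import re

# Pattern based only on the check_fields message quoted in this thread.
NAN_ERROR = re.compile(r"NaN found in (\w+) instance:\s*(\d+)\s+field\s+(\S+)")


def find_nan_fields(log_text):
    """Return (component, instance, field) for each NaN report in the text."""
    return [
        (component, int(instance), field)
        for component, instance, field in NAN_ERROR.findall(log_text)
    ]


log_text = (
    "ERROR:\n"
    "component_mod:check_fields NaN found in ATM instance: 1 field Faxa_dstwet3 1\n"
    "d global index: 28103\n"
)
for component, instance, field in find_nan_fields(log_text):
    print(component, instance, field)
```

The field name identifies which coupler field carried the NaN (here the dust wet deposition field mentioned above), which narrows down where to look in the component logs.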

Also, if you receive this NaN error again, can you attach your atm log and cesm log, along with any XML files you modified (or SourceMods you added)? That could also help us debug the issue.

Thanks, and have a great day!

Jesse
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi Jesse, thank you for your reply. I think there was some memory issue on the machine. I resubmitted the job and the simulation completed successfully.

Thanks for your help.
Bidyut
 

ucas_qs

qiushi Zhang
Member
Hello, I am currently experiencing the same problem as you. I want to use the restart files from one experiment as the initial conditions for another experiment. My settings are the same as yours, and the same error occurs. How did you solve it?
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi Qiushi, thanks for your message. As I mentioned above, "I think there was some memory issue in the machine". So the run completed successfully when I resubmitted it.

Nonetheless, I did do a branch run in which I used the restart files of one experiment (CTRL in my case) as the initial files for another.

This is what I used:

./xmlchange RUN_TYPE=branch
./xmlchange RUN_REFCASE=CTRL            # case name of the parent simulation
./xmlchange RUN_REFDATE=0036-01-01      # end date of CTRL
./xmlchange GET_REFCASE=FALSE
./xmlchange RUN_STARTDATE=0036-01-01    # end date of CTRL
Hope this helps.
Best,
Bidyut
 

ucas_qs

qiushi Zhang
Member
Hi Bidyut. Thanks for your helpful reply.
I have now run into a new problem. I have run two experiments, EXP1 and EXP2. EXP1 ran for 50 years and then a further 10 years, for a total of 60 years. I used the EXP1 restart files from 0051-01-01 as the initial conditions for EXP2 (RUN_TYPE=hybrid). Apart from this, EXP2 was set up exactly like EXP1, and it ran successfully for 10 years. I expected EXP1.0051-01.nc and EXP2.0001-01.nc (RUN_STARTDATE=0001-01-01) to give the same result, but their SSTs (sea surface temperatures) are different. In addition, an El Nino event appeared in EXP1 0051-0051 (maximum SSTA in January and approximately 0 SSTA in September), but the event developed differently in EXP2 (maximum SSTA in January and approximately 0 SSTA in July). Why is there such a difference between the results of these two experiments?
 

bidyut

BIDYUT BIKASH GOSWAMI
Member
Hi Qiushi,
Good to know that you could branch your EXP2 from EXP1.

Regarding the differences in your results, this is beyond what I know, since it is your experiment. Please double-check exactly what you are comparing. To my knowledge, EXP1.0051-01.nc and EXP2.0001-01.nc should ideally be similar, so you will need to investigate carefully why they are not. Sorry I could not help you understand the difference between EXP1.0051-01.nc and EXP2.0001-01.nc.
Good luck,
Bidyut
 