
Reducing the timestep to avoid numerical instability

I am experiencing a segmentation fault similar to the one discussed in the previous thread. I am running waccm_16_cam3_5_48 in the CCMVal REFB1 configuration on one node of an IBM Power6 system at DKRZ in Hamburg.

The model ran without problems for 5 years and then suddenly stopped on January 15th with a numerical instability. To get the model past this point I would like to try reducing the timestep, so I have set dtime to 900 instead of 1800 in the "use-case" namelist.

Apparently the model does not accept the change, because the new timestep differs from the timestep stored in the last restart files, and it produces the following message:

0: Successfully initialized the land model
0: begin continuation run at:
0: nstep= 105121 year= 1964 month= 1 day= 1 seconds= 1800
0:
0:************************************************************
0:
3: clm dtime 1800 and Eclock dtime 900 never align
3: ENDRUN:lnd_init_mct ERROR: time out of sync
6: clm dtime 1800 and Eclock dtime 900 never align
6: ENDRUN:lnd_init_mct ERROR: time out of sync
0: dtime_sync= 900 dtime_clm= 1800 mod = 900
0: clm dtime 1800 and Eclock dtime 900 never align
0: ENDRUN:lnd_init_mct ERROR: time out of sync
4: clm dtime 1800 and Eclock dtime 900 never align
4: ENDRUN:lnd_init_mct ERROR: time out of sync
5: clm dtime 1800 and Eclock dtime 900 never align
5: ENDRUN:lnd_init_mct ERROR: time out of sync
7: clm dtime 1800 and Eclock dtime 900 never align
7: ENDRUN:lnd_init_mct ERROR: time out of sync
2: clm dtime 1800 and Eclock dtime 900 never align
2: ENDRUN:lnd_init_mct ERROR: time out of sync
7:
7: Traceback:
7: Offset 0x00000010 in procedure xl__trbk_
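
The values in the line "dtime_sync= 900 dtime_clm= 1800 mod = 900" hint at what the failing check does: the land model's timestep (1800 s, taken from the restart file) apparently has to divide the driver's synchronization interval (900 s, from the new namelist) evenly. The sketch below reproduces that arithmetic. It assumes the test is simply mod(dtime_sync, dtime_clm) == 0, which is inferred from the printed numbers and is not taken from the CLM source; the function name is hypothetical.

```python
def clock_aligns(dtime_sync: int, dtime_clm: int) -> bool:
    """Hypothetical re-creation of the land/driver clock alignment test.

    Assumption: the check is mod(dtime_sync, dtime_clm) == 0, inferred from
    the log line "dtime_sync= 900 dtime_clm= 1800 mod = 900".
    """
    remainder = dtime_sync % dtime_clm
    print(f"dtime_sync= {dtime_sync} dtime_clm= {dtime_clm} mod = {remainder}")
    return remainder == 0


# Driver clock from the new namelist (900 s) vs. CLM timestep from the restart file (1800 s)
print(clock_aligns(900, 1800))
# Both components at 900 s
print(clock_aligns(900, 900))
```

The first call reproduces the values printed in the log (mod = 900) and returns False; the second returns True. Under this reading, changing dtime in the namelist alone cannot help as long as CLM keeps taking its 1800 s timestep from the restart file.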


I tried to do a branch run instead of a continuation run, but I cannot find the right place to override the namelist setting.

I included the following line in the use-case namelist for my specific run:
branch

but the start_type in drv_in is still "continue" instead of "branch".


Could somebody please help?

1. How do I do a branch run for waccm16_cam3_5_48?
2. How do I reduce the timestep for the fv WACCM version?

I only want to reduce the timestep for a month or so and then increase it again, hoping that the instability will not recur too soon.
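
Halving dtime doubles the number of model steps, and roughly the cost, for the period in which the shorter timestep is used. A quick check of the numbers for a 31-day month such as January; this is plain arithmetic that assumes nothing about the model beyond a fixed timestep in seconds, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def steps_in_period(days: int, dtime: int) -> int:
    """Number of fixed-length timesteps of dtime seconds in `days` days."""
    total_seconds = days * SECONDS_PER_DAY
    # A timestep that does not divide the period would leave a partial step.
    if total_seconds % dtime != 0:
        raise ValueError(f"{dtime} s does not divide {days} days evenly")
    return total_seconds // dtime


for dtime in (1800, 900):
    print(f"dtime={dtime:4d} s -> {steps_in_period(31, dtime)} steps in January")
```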
 
There is more than one way to do a branch run. If you have a directory (untarred) with the restart files you want to use, the simplest may be to modify your run script. In the script, first set a variable containing the path to your restart files directory, e.g.:

====

set restart_files_dir = /ptmp/username/refb1.4_restart_files

====

Then in the section of the script just before building the namelist, set the runtype to branch and copy your restart files to your rundir:

====

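## GENERAL ARCHIVING SETTINGS
setenv ARCH_CASE $case #casename - required

if (-e $cnt) then
  @ N = `cat $cnt` || echo "cat $cnt failed" && exit 1
  if ( $N < $limit ) then
    if ( $N == 0 ) then
      # first segment: branch from the saved restart files
      set runtype = branch
      cp $restart_files_dir/* $rundir
    else
      # later segments: continue from the run's own restart files
      set runtype = continue
    endif
    # note: this excerpt leaves the two outer if blocks open;
    # their matching endif lines are not shown here

## Create the namelist files
cd $blddir || echo "cd $blddir failed" && exit 1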

====

If this doesn't work for you, there is another method I can post.
 
There are three namelist files in which the timestep is set. dtime is the name used in atm_in and lnd_in. The third setting is atm_cpl_dt in drv_init_in, which is the timestep for the coupler. You will probably want to change all three.
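
Because the timestep is set in three places, it is easy to change one and miss another. The sketch below collects dtime from atm_in and lnd_in and atm_cpl_dt from the driver namelist and reports whether they agree, which is the intent in this thread (all set to the same value). It assumes each setting appears on a simple "name = value" line; this is a simplification of Fortran namelist syntax, not a real namelist parser, and the fragments and helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches simple "name = value" entries; real Fortran namelists allow more
# (commas, repeat counts, continuation lines), which this sketch ignores.
ENTRY = re.compile(r"^\s*(\w+)\s*=\s*([^,\s/]+)", re.MULTILINE)


def read_entries(text: str) -> dict[str, str]:
    """Return lower-cased variable names mapped to their raw string values."""
    return {name.lower(): value for name, value in ENTRY.findall(text)}


def timestep_settings(atm_in: str, lnd_in: str, drv_in: str) -> dict[str, int]:
    """Collect the three timestep settings mentioned in this thread."""
    return {
        "atm_in:dtime": int(read_entries(atm_in)["dtime"]),
        "lnd_in:dtime": int(read_entries(lnd_in)["dtime"]),
        "drv:atm_cpl_dt": int(read_entries(drv_in)["atm_cpl_dt"]),
    }


def consistent(settings: dict[str, int]) -> bool:
    """True when every collected timestep has the same value."""
    return len(set(settings.values())) == 1


# Hypothetical namelist fragments: the land timestep was left at 1800 s.
atm = " dtime = 900\n"
lnd = " dtime = 1800\n"
drv = " atm_cpl_dt = 900\n"

settings = timestep_settings(atm, lnd, drv)
print(settings, "consistent" if consistent(settings) else "MISMATCH")
```

Checking for equality is stricter than the model may require, since a coupling interval can in general be a multiple of dtime; it is used here only because the goal in this thread is to set all of them to 900 s.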
 
No, unfortunately this file is not in the same root directory. However, I set atm_cpl_dt in the use-case namelist, and it now appears in the drv_in namelist. All timesteps are now set to 900 s (dtime, atm_cpl_dt, lnd_cpl_dt, ocn_cpl_dt, ice_cpl_dt), but when I do a branch run I still get the error message that the new timestep does not align with the timestep of the restart files:

0: hist_htapes_build Successfully initialized clm2 history files
0:------------------------------------------------------------
0: Successfully initialized the land model
0: begin continuation run at:
0: nstep= 105121 year= 1964 month= 1 day= 1 seconds= 1800
0:
0:************************************************************
0:
0: dtime_sync= 900 dtime_clm= 1800 mod = 900
0: clm dtime 1800 and Eclock dtime 900 never align
0: ENDRUN:lnd_init_mct ERROR: time out of sync
2: clm dtime 1800 and Eclock dtime 900 never align
2: ENDRUN:lnd_init_mct ERROR: time out of sync
3: clm dtime 1800 and Eclock dtime 900 never align
3: ENDRUN:lnd_init_mct ERROR: time out of sync
4: clm dtime 1800 and Eclock dtime 900 never align
4: ENDRUN:lnd_init_mct ERROR: time out of sync
5: clm dtime 1800 and Eclock dtime 900 never align
5: ENDRUN:lnd_init_mct ERROR: time out of sync
6: clm dtime 1800 and Eclock dtime 900 never align
6: ENDRUN:lnd_init_mct ERROR: time out of sync
7: clm dtime 1800 and Eclock dtime 900 never align
7: ENDRUN:lnd_init_mct ERROR: time out of sync
0:
0: Traceback:


I wonder what is going on. Do I need this other namelist (drv_init_in), and if so, how do I get it?

At least the method you described for doing a branch run works for me :)
However, one also has to specify the clm2 restart file with "nrevsn", as well as the restart pointer file rpointer.drv.

Thanks a lot in advance!
 

eaton

CSEG and Liaisons
I'm afraid you can't do a branch run and change the timestep. CAM (and possibly other components) will read dtime from the restart file and ignore the namelist setting. From the CAM perspective, a branch run allows you to change the output streams, but nothing that changes the characteristics of the simulation itself. To change the simulation, CAM requires an initial run. In the CCSM context you need to do a "hybrid" run, which implies an initial run for both the CAM and CLM components.
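
The rule above can be summarized as a lookup from run type to where CAM takes its timestep from. This is a sketch of the rule as stated in this reply, not code from CAM or CCSM; the run-type names are the ones used in this thread, and the enum and function names are hypothetical.

```python
from enum import Enum


class RunType(Enum):
    CONTINUE = "continue"  # resume from the run's own restart files
    BRANCH = "branch"      # start from another run's restart files; only output may change
    HYBRID = "hybrid"      # CCSM run implying an initial run for CAM and CLM


def dtime_source(run_type: RunType) -> str:
    """Where CAM's timestep comes from, per the explanation in this thread."""
    if run_type in (RunType.CONTINUE, RunType.BRANCH):
        # dtime is read from the restart file; the namelist value is ignored.
        return "restart file"
    # An initial run takes dtime from the namelist.
    return "namelist"


def can_change_timestep(run_type: RunType) -> bool:
    """True only for run types whose timestep is taken from the namelist."""
    return dtime_source(run_type) == "namelist"


for rt in RunType:
    print(f"{rt.value:8s} -> dtime from {dtime_source(rt)}")
```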
 