
Problem branching off of CESM2 CMIP6 PI using newer code base: landunit size inconsistency

Status
Not open for further replies.

aherring

Adam
Member
I've been trying to debug an issue for Eleanor, a postdoc in Jen Kay's group. She is trying to branch off of the CESM2 CMIP6 PI control at year 501. The restarts are here:

/glade/p/cesmdata/inputdata/cesm2_init/b.e21.B1850.f09_g17.CMIP6-piControl.001/0501-01-01/

The source code she is using is here: /glade/work/eleanorm/models/cesm2_1_3/ ... the externals point to the release-clm5.0.30 code base.

When she runs a B1850 compset with the same grid alias used in the CMIP6 run (f09_g17), it fails:

/glade/scratch/eleanorm/B1850_clockoutput_v2.1.3_y501-503/run/lnd.log.1804910.chadmin1.ib0.cheyenne.ucar.edu.200417-094831
Reading restart file
b.e21.B1850.f09_g17.CMIP6-piControl.001.clm2.r.0501-01-01-00000.nc
Reading restart dataset
check_dim ERROR: mismatch of input dimension 50827 with expected value
50591 for variable landunit

The "landunit" dimension size is 50827 in the CMIP6 CLM restart file, which apparently conflicts with the size her run expects. I noticed that the fsurdat file used in the CMIP6 runs is different from the one her code base defaults to.

default: /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr1850_c190214.nc
CMIP6: /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/surfdata_0.9x1.25_78pfts_CMIP6_simyr1850_c170824.nc

An ncdiff shows that most fields differ by zero, but there were non-zero differences for PCT_WETLAND. I was *guessing* that the landunit dimension is the total number of subgrid landunits in the domain, so I figured that a different fsurdat file might give a different landunit size. So she ran her code base with the CMIP6 fsurdat file instead (from her lnd_in: fsurdat = '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/surfdata_0.9x1.25_78pfts_CMIP6_simyr1850_c170824.nc'):

/glade/scratch/eleanorm/B1850_clockoutput_y501-503/run/lnd.log.1781780.chadmin1.ib0.cheyenne.ucar.edu.200416-130250
Reading restart file
b.e21.B1850.f09_g17.CMIP6-piControl.001.clm2.r.0501-01-01-00000.nc
Reading restart dataset
check_dim ERROR: mismatch of input dimension 50827 with expected value
50525 for variable landunit

While this did change the landunit size, it is still not equal to the size in the CMIP6 restart file. At this point, I'm at the limit of my knowledge of CTSM, and am hoping the real experts can jump in and help debug this issue. Thanks!
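For comparing the two failed attempts above, here is a minimal Python sketch that pulls the dimension sizes out of a `check_dim ERROR` message, including the case where the log wraps the message across two lines. The function name `find_dim_mismatches` and the regular expression are illustrative assumptions based only on the log excerpts quoted in this post, not part of CTSM.

```python
import re

# Pattern written against the message text quoted in this thread.
# \s+ also matches the line break where the log wraps the message.
CHECK_DIM_RE = re.compile(
    r"check_dim ERROR: mismatch of input dimension\s+(\d+)\s+"
    r"with expected value\s+(\d+)\s+for variable\s+(\w+)"
)


def find_dim_mismatches(log_text):
    """Return (variable, size_in_restart_file, size_expected_by_run) tuples."""
    return [
        (m.group(3), int(m.group(1)), int(m.group(2)))
        for m in CHECK_DIM_RE.finditer(log_text)
    ]


# Log excerpt copied from the first failed run in this post.
sample = """ Reading restart dataset
 check_dim ERROR: mismatch of input dimension 50827 with expected value
 50591 for variable landunit
"""

for name, in_file, expected in find_dim_mismatches(sample):
    print(f"{name}: restart file has {in_file}, run expects {expected}")
```

Running the same function over both lnd.log files makes it easy to see that the expected size moved (50591 with the default fsurdat, 50525 with the CMIP6 fsurdat) while the restart file's size stayed at 50827.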
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
It's possible that this is due to the virtual glacier landunits being on in the piControl, but not in Eleanor's simulation.
Try putting this into the user_nl_clm:

glacier_region_behavior = 'single_at_atm_topo','virtual','virtual','virtual'
 

elmiddlemas

Eleanor
New Member
Hi Keith and Adam,

Sorry for the delayed response. Just as a recap, I've been trying to branch from year 501 (arbitrarily chosen) from the CMIP6 preindustrial control simulation. I found the restarts here (/glade/p/cesmdata/inputdata/cesm2_init/b.e21.B1850.f09_g17.CMIP6-piControl.001/0501-01-01/). I'm using version 2.1.3 with a B1850 compset and resolution f09_g17.

How to resolve this landunit error from branched CESM2 simulations remains a mystery.

Unfortunately, the solution Keith suggested yielded the same error (the error log is here: /glade/scratch/eleanorm/B1850_clockoutput_y501-503/run/lnd.log.2002267.chadmin1.ib0.cheyenne.ucar.edu.200429-160252).

I am collaborating with graduate student Anne Sledd on this issue as well, and she has tried numerous fixes, including:
(1) following suggestions here: https://bb.cgd.ucar.edu/cesm/threads/issues-setting-up-hybrid-run.5074/, she set CLM_NAMELIST_OPTS in env_build.xml to init_interp_method='general'
(2) setting use_init_interp in the user_nl_clm to 'true'
(3) setting use_init_interp to 'false'.
(4) using CESM2.1.1 instead of v2.1.3.
All of these resulted in the same landunit dimension error.

Simply changing the runtype to hybrid and leaving the user_nl_clm file blank results in the model crashing during land model initialization (/glade/scratch/eleanorm/B1850_clockoutput_v2.1.3_y501-503_hybrid/run/lnd.log.2062946.chadmin1.ib0.cheyenne.ucar.edu.200504-143112).

The problem is only resolved when I use a hybrid runtype and additionally change the user_nl_clm file following the suggestions posed above by Keith & Adam (i.e., specifying fsurdat & glacier_region_behavior). I haven't tried either fix separately with a hybrid runtype.
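As a record of the combination that worked, the following Python sketch formats the two user_nl_clm settings mentioned above. The helper `user_nl_clm_lines` is hypothetical and only builds text; it does not touch a case directory, and the run type still has to be switched to hybrid separately in the case configuration. The fsurdat path and the glacier_region_behavior values are the ones quoted earlier in this thread.

```python
def user_nl_clm_lines(fsurdat, glacier_region_behavior):
    """Format the two CLM namelist settings as user_nl_clm lines.

    Hypothetical helper for illustration; it only returns strings.
    """
    behaviors = ",".join(f"'{b}'" for b in glacier_region_behavior)
    return [
        f"fsurdat = '{fsurdat}'",
        f"glacier_region_behavior = {behaviors}",
    ]


lines = user_nl_clm_lines(
    # CMIP6 surface dataset path as quoted in Adam's post.
    fsurdat="/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/"
            "surfdata_0.9x1.25_78pfts_CMIP6_simyr1850_c170824.nc",
    # Virtual glacier landunits, as suggested by Keith.
    glacier_region_behavior=["single_at_atm_topo", "virtual", "virtual", "virtual"],
)
print("\n".join(lines))
```

The printed lines can be pasted into user_nl_clm. Since neither setting has been tried on its own with a hybrid run type, this sketch says nothing about which of the two is actually required.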


Am I missing something? Should we be using CESM2.1.0? It would be really nice to be able to branch from the long preindustrial simulation, as it is a great spun-up, unforced experimental baseline, and there are so many restarts publicly available.


Thanks,
Eleanor M.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I tried a hybrid run with cesm2.1.3 out of the box with your configuration. The only change I made was to remove the setting of init_interp_method = 'use_finidat_areas' in env_run.xml, and that ran fine.
I'm not quite sure why you can't run a branch using the old surface dataset, but you probably wouldn't want to do that anyway, as the newer surface dataset fixes a bug whereby Antarctica's ice shelves were being treated as wetlands rather than glaciers (a wetland beneath the snowpack). See this issue:


I don't think it's a big deal to run a hybrid instead of a branch since CLM will be initialized in nearly the same way and all the other components should be initialized the same way in both cases.
I'll look into the branch issue in more detail when I get a chance. I can't think of any code in the newer version of the model that would cause problems.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I tried a branch with the old surface dataset and got the same landunit error you did. I then added in the virtual landunits in user_nl_clm and it got past the landunit error. But I now get this error:

Reading restart file
b.e21.B1850.f09_g17.CMIP6-piControl.001.clm2.r.0501-01-01-00000.nc
Reading restart dataset
can't find rootfr in restart (or initial) file...
Initialize rootfr to default
ncd_inqvid: variable xsmrpool_loss is not on dataset
ncd_inqvid: variable xsmrpool_loss is not on dataset
ERROR: Field missing from restart file: xsmrpool_loss
Missing fields are not allowed in branch or continue (restart) runs.

The variable xsmrpool_loss was added in release-clm5.0.16 to fix a different bug.

So I think your best options are to either run with older code, or run a hybrid with the newer code.
 

elmiddlemas

Eleanor
New Member
Good to know. By the way, what older code would work? v2.1.0?

Also, just to clarify, which is the "old surface dataset" and the "new surface dataset"?

Thanks for helping us debug, Keith. This is very informative!
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I think 2.1.0 would work because it uses release-clm5.0.14, which is before both of the bug fixes, but I haven't tried it.

This is the old surface dataset:

fsurdat = '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/surfdata_0.9x1.25_78pfts_CMIP6_simyr1850_c170824.nc'

This is the new surface dataset:

fsurdat = '/glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_78pfts_CMIP6_simyr1850_c190214.nc'
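The version reasoning in this thread can be summarized with a small Python sketch that compares CLM release tags. The tag-to-change mapping is taken from the posts above, with one assumption flagged loudly: that the new surface dataset became the default at release-clm5.0.18, which is inferred only from the directory name in its path. The names `clm_release`, `BREAKING_CHANGES`, and `branch_blockers` are hypothetical.

```python
import re


def clm_release(tag):
    """Turn a tag such as 'release-clm5.0.14' into a comparable tuple (5, 0, 14)."""
    match = re.fullmatch(r"release-clm(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized CLM release tag: {tag}")
    return tuple(int(part) for part in match.groups())


# Changes named in this thread that get in the way of a branch run.
BREAKING_CHANGES = {
    # Stated by Keith: xsmrpool_loss was added in release-clm5.0.16.
    "release-clm5.0.16": "xsmrpool_loss required in restart files",
    # ASSUMPTION: tag inferred from the surfdata_map/release-clm5.0.18/ path.
    "release-clm5.0.18": "new default surface dataset",
}


def branch_blockers(tag):
    """List the changes above that are already present in the given tag."""
    version = clm_release(tag)
    return [
        reason
        for change_tag, reason in BREAKING_CHANGES.items()
        if clm_release(change_tag) <= version
    ]


print(branch_blockers("release-clm5.0.14"))  # CLM tag quoted for cesm2.1.0
print(branch_blockers("release-clm5.0.30"))  # CLM tag quoted for cesm2.1.3
```

Comparing parsed tuples rather than raw strings avoids the pitfall that "5.0.9" sorts after "5.0.14" as text. As noted above, branching with cesm2.1.0 has not actually been tried, so an empty list here only means that neither of the two known changes applies.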
 