
restart file error: error reading variable, more than max-value-size

kmcmonigal

Kay
New Member
I am attempting to branch a hybrid BSSP370smbb-forced run of cesm2.1.4-rc.08 from restart files of a historically forced run that included modifications to the ocean-atmosphere coupling. The run will restart from the CMIP6 restart files, but not from the restart files of my modified BHISTsmbb run. I swapped out each restart file individually and determined that my cpl.r.2015-01-01-00000.nc restart file is the problematic one.

I altered the user_nl_clm file to include:
use_init_interp = .true.
use_c13 = .false.
use_c14 = .false.
glacier_region_behavior = 'single_at_atm_topo', 'virtual', 'virtual', 'multiple'

Those changes fixed other problems that had occurred, but I am unsure what to do now. The error message suggests that some variables in the restart file are too large to be read. The model initializes and produces some POP output, but fails during a 5-day run.

Section of cesm.log error message:
271:MPT: #5 <signal handler called>
271:MPT: #6 0x0000000006dec40f in soiltemperaturemod::soilthermprop (bounds=...,
271:MPT: num_nolakec=311, filter_nolakec=...,
271:MPT: tk=<error reading variable: value requires 327968 bytes, which is more than max-value-size>,
271:MPT: cv=<error reading variable: value requires 327968 bytes, which is more than max-value-size>, tk_h2osfc=..., urbanparams_inst=..., temperature_inst=...,
271:MPT: waterstate_inst=..., soilstate_inst=...)
 

Attachments

  • version_info.txt (4.5 KB)
  • user_nl_clm.txt (1.8 KB)
  • atm.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (373 KB)
  • glc.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (18.9 KB)
  • ice.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (37.2 KB)
  • lnd.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (191.9 KB)
  • ocn.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (684.7 KB)
  • rof.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (13.4 KB)
  • wav.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.txt (2.3 KB)
  • cesm.log.5905895.chadmin1.ib0.cheyenne.ucar.edu.220825-135021.tail.txt (92.8 KB)

oleson

Keith Oleson
CSEG and Liaisons
Staff member
One of our software engineers points out that:

"The portion of the log file that the user posted is not the relevant part: I think that just indicates a problem printing all of the variable information in the traceback. If you look at the full attached cesm log file, you'll see a floating point exception at /glade/work/kmcmonigal/tmp_aug2022/cesm2.1.4-rc.08/components/clm/src/biogeophys/SoilTemperatureMod.F90:718

which is

bw(c,j) = (h2osoi_ice(c,j)+h2osoi_liq(c,j))/(frac_sno(c)*dz(c,j))"

Presumably either frac_sno or dz is zero here.
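If it would help to confirm which column is involved, you could drop a temporary diagnostic just above that line so the offending indices are printed before the floating point exception. This is only a rough sketch using the names as they appear in the expression above; the exact associate names and kind parameter may differ slightly in your checkout:

! temporary diagnostic just above SoilTemperatureMod.F90:718 (sketch)
if (frac_sno(c)*dz(c,j) <= 0._r8) then
   write(iulog,*)'soilthermprop: zero denominator at c = ',c,' j = ',j
   write(iulog,*)'  frac_sno = ',frac_sno(c),' dz = ',dz(c,j)
   write(iulog,*)'  h2osoi_ice = ',h2osoi_ice(c,j),' h2osoi_liq = ',h2osoi_liq(c,j)
end if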
I also see a very large snow balance error at nstep=0 in the lnd log file:

WARNING: snow balance error
nstep= 0 local indexc= 1092 col%itype= 401
lun%itype= 4 errh2osno= 9894.09545224555

These balance errors don't stop the model in the first few time steps, but we don't usually see such a large balance error in the log file even at the beginning of the model run.
col%itype = 401 is a landice multiple elevation class. The size of the snow balance error might indicate that there is some inconsistency between the snow variables in the restart file (e.g., frac_sno is zero but h2osno is some large value) or in the interpolated file.
I guess you could start by looking at some of the variables that go into the balance check (in BalanceCheckMod.F90) to see if there is anything unusual there:

write(iulog,*)'errh2osno = ',errh2osno(indexc)
write(iulog,*)'snl = ',col%snl(indexc)
write(iulog,*)'snow_depth = ',snow_depth(indexc)
write(iulog,*)'frac_sno_eff = ',frac_sno_eff(indexc)
write(iulog,*)'h2osno = ',h2osno(indexc)
write(iulog,*)'h2osno_old = ',h2osno_old(indexc)
write(iulog,*)'snow_sources = ',snow_sources(indexc)*dtime
write(iulog,*)'snow_sinks = ',snow_sinks(indexc)*dtime
write(iulog,*)'qflx_prec_grnd = ',qflx_prec_grnd(indexc)*dtime
write(iulog,*)'qflx_snow_grnd_col = ',qflx_snow_grnd_col(indexc)*dtime
write(iulog,*)'qflx_rain_grnd_col = ',qflx_rain_grnd_col(indexc)*dtime
write(iulog,*)'qflx_sub_snow = ',qflx_sub_snow(indexc)*dtime
write(iulog,*)'qflx_snow_drain = ',qflx_snow_drain(indexc)*dtime
write(iulog,*)'qflx_evap_grnd = ',qflx_evap_grnd(indexc)*dtime
write(iulog,*)'qflx_top_soil = ',qflx_top_soil(indexc)*dtime
write(iulog,*)'qflx_dew_snow = ',qflx_dew_snow(indexc)*dtime
write(iulog,*)'qflx_dew_grnd = ',qflx_dew_grnd(indexc)*dtime
write(iulog,*)'qflx_snwcp_ice = ',qflx_snwcp_ice(indexc)*dtime
write(iulog,*)'qflx_snwcp_liq = ',qflx_snwcp_liq(indexc)*dtime
write(iulog,*)'qflx_snwcp_discarded_ice = ',qflx_snwcp_discarded_ice(indexc)*dtime
write(iulog,*)'qflx_snwcp_discarded_liq = ',qflx_snwcp_discarded_liq(indexc)*dtime
write(iulog,*)'qflx_sl_top_soil = ',qflx_sl_top_soil(indexc)*dtime
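Another option is to look at the snow fields directly in the interpolated initial condition file (finidat_interp_dest.nc in the run directory, if use_init_interp produced one) and check for columns where, for example, frac_sno is zero but H2OSNO is large. Here is a rough standalone sketch using netCDF-Fortran; the file name, dimension name, variable names, and the threshold are my guesses, so confirm them against ncdump -h first:

program check_snow
  use netcdf
  implicit none
  integer :: ncid, dimid, varid, ncol, c, ierr
  real(8), allocatable :: frac_sno(:), h2osno(:)
  ! open the interpolated initial condition file (name/path assumed)
  ierr = nf90_open('finidat_interp_dest.nc', nf90_nowrite, ncid)
  ierr = nf90_inq_dimid(ncid, 'column', dimid)
  ierr = nf90_inquire_dimension(ncid, dimid, len=ncol)
  allocate(frac_sno(ncol), h2osno(ncol))
  ! variable names assumed; check with ncdump -h
  ierr = nf90_inq_varid(ncid, 'frac_sno', varid)
  ierr = nf90_get_var(ncid, varid, frac_sno)
  ierr = nf90_inq_varid(ncid, 'H2OSNO', varid)
  ierr = nf90_get_var(ncid, varid, h2osno)
  ! flag columns with a large snow mass but no snow-covered fraction
  do c = 1, ncol
     if (h2osno(c) > 1000.d0 .and. frac_sno(c) <= 0.d0) then
        write(*,*)'suspicious column ',c,' h2osno = ',h2osno(c),' frac_sno = ',frac_sno(c)
     end if
  end do
  ierr = nf90_close(ncid)
end program check_snow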

I'm not sure why the coupler restart file would be involved here. Maybe try a different restart file for the clm initial conditions.
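If you want to test a different land initial condition quickly, it can be pointed at in user_nl_clm without changing the rest of the hybrid setup; something like the following, where the path is just a placeholder for one of your other ensemble members' restart files:

finidat = '/path/to/other/BHISTsmbb.clm2.r.2015-01-01-00000.nc'
use_init_interp = .true.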
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I guess you've tried a different initial condition, from the CMIP6 historical, which worked.
I see that the initial file you are using appears to have been generated from a simulation which used a later version of CLM than the version you are using for the current simulation, so maybe there is some backward incompatibility. Otherwise, your setup looks ok to me.
I'll take a closer look later this week.
 

kmcmonigal

Kay
New Member
Thanks Keith. I will try a different initial condition from our altered historical runs (this is for an ensemble, so we have several). Agreed, backward incompatibility could be an issue.
 