CESM simulation stops at the start of the second month

jinmuluo · Nov 12, 2025

My CESM code version: CESM3_beta06

Hi,

I ran a CESM simulation with CAM-Chem and CLM active. The case successfully wrote the first month's output but then stopped at the start of the second month, which is very odd.

Here is my case location on Derecho: /glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run

The CaseStatus file says:

2025-11-11 17:38:54: case.run starting 3589111.desched1
---------------------------------------------------
2025-11-11 17:38:58: model execution starting 3589111.desched1
---------------------------------------------------
2025-11-11 22:30:54: model execution success 3589111.desched1
---------------------------------------------------
2025-11-11 22:30:54: case.run error
ERROR: Model did not complete - see /glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run/med.log.3589111.desched1.251111-173854

But nothing looks wrong in med.log.3589111.desched1.251111-173854, and I can't find any obvious error in the cesm.log.

Best,

Jinmu

oleson · Nov 12, 2025

I see this in your cesm log:

dec1952.hsn.de.hpc.ucar.edu 1016: forrtl: severe (408): fort: (2): Subscript #1 of the array HISTO has value 21 which is greater than the upper bound of 20
dec1952.hsn.de.hpc.ucar.edu 1016:
dec1952.hsn.de.hpc.ucar.edu 1016: Image PC Routine Line Source
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 00000000092BA9DD histfilemod_mp_hf 3734 histFileMod.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 00000000092D4E18 histfilemod_mp_hi 4259 histFileMod.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 0000000008F349DB clm_driver_mp_clm 1449 clm_driver.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 0000000008E4AF39 lnd_comp_nuopc_mp 899 lnd_comp_nuopc.F90

The lines near 3734 in histFileMod.F90 are:

    if (numdims == 1) then
       allocate(hist1do(beg1d_out:end1d_out), stat=ier)
       if (ier /= 0) then
          write(iulog,*) trim(subname),' ERROR: allocation'
          call endrun(msg=errMsg(sourcefile, __LINE__))
       end if
       hist1do(beg1d_out:end1d_out) = histo(beg1d_out:end1d_out,1)
    end if

It looks like the model is dying trying to write out the h2 CLM history file. One of your requested history fields must be causing problems. I see from your lnd_in that you are requesting a bunch of history fields at the column level for the h2 file. Sometimes requesting a variable at a given level that doesn't exist can cause unexpected problems. For example, requesting a variable at the column level when it doesn't exist at the column level, e.g., it is a grid level variable only, is a problem.
To troubleshoot, I'd look at the variables you've requested for the h2 file, particularly any new variables you've added.
It looks like the model was successful at writing out the h0 and h1 files...
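
Errors like the one quoted above can be easy to miss because the cesm.log interleaves output from many ranks. As a rough sketch (not part of CESM or CIME; the function name and the patterns are assumptions based only on the Intel Fortran runtime message quoted in this post), a small Python script could pull out the likely fatal lines:

```python
import re
import sys

# Assumed patterns, taken from the Intel Fortran runtime message quoted in
# this thread. This is not an exhaustive list of fatal messages.
FATAL_PATTERNS = [
    re.compile(r"forrtl: severe \(\d+\)"),
    re.compile(r"Subscript #\d+ of the array \w+ has value \d+"),
]


def find_fatal_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in FATAL_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path to cesm.log>
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_fatal_lines(handle):
            print(f"{number}: {text}")
```

Given the path to a cesm.log, it prints the line number and text of each match. The pattern list would likely need extending for other compilers or other kinds of failure.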

jinmuluo · Nov 13, 2025

Hi Keith,

/glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run

I removed some variables from the h2 file that might not exist at the column level, but I still get the same error. I have a land-only case that successfully outputs these variables. Could this be a compset issue?

Successful case with the same variables in column-level output: /glade/derecho/scratch/jinmuluo/soil_nox_hist_f09_mg17/run

oleson · Nov 13, 2025

Ok, that was a good idea to try a land-only with the same output request.
The subscript error is referring to beg1d_out:end1d_out which is the per-processor 1d beginning and ending indices (I think it is the number of gridcells per processor). Somehow the histo variable is expecting a dimension of size 20, but the indices indicate there should be a dimension of size 21. I'm not sure why that would be. Pinging @erik and @slevis in case they have any ideas.
I guess I would first try simplifying your output to see if you can just get a default monthly history file (h0) and then build back in your other output requests piece by piece. Otherwise, you may have to add some write statements to histFileMod.F90 to debug.

oleson · Nov 13, 2025

Actually, I think this variable, CROPPROD1N_LOSS, is a gridcell-level variable and you are requesting it at the column level:

    this%cropprod1_loss_grc(begg:endg) = spval
    call hist_addfld1d( &
         fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS'), &
         units = 'g' // this%species%get_species() // '/m^2/s', &
         avgflag = 'A', &
         long_name = 'loss from 1-yr crop product pool', &
         ptr_gcell = this%cropprod1_loss_grc, default=active_if_non_isotope)

I'm not sure why you wouldn't get the same or similar error in land-only mode...

oleson · Nov 13, 2025

Well, I guess that is CROPPROD1_LOSS, not CROPPROD1N_LOSS. Are you defining CROPPROD1N_LOSS somewhere in the code? I don't see it...

oleson · Nov 13, 2025

Ok, I guess this: fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS') must be adding an "N" (or a "C") after CROPPROD1.
So I do think that this variable could be a problem when you ask for it at column level.
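
To make that guess concrete, here is a toy Python stand-in for the naming step. It is only an illustration of the assumed behavior (a species letter inserted between the base name and the suffix), not the actual Fortran routine, and the Python function below is hypothetical:

```python
def hist_fname(basename, species, suffix=""):
    """Toy stand-in for the naming step: base name + species letter + suffix.

    This mirrors the guess in the post above and is NOT the real CTSM code.
    """
    return f"{basename}{species}{suffix}"


# Under this assumption, one hist_addfld1d call yields one name per species.
for species in ("C", "N"):
    print(hist_fname("CROPPROD1", species, suffix="_LOSS"))
```

If the real routine behaves this way, the field registered with ptr_gcell would appear as CROPPROD1C_LOSS and CROPPROD1N_LOSS, which would explain why a search for the literal string CROPPROD1N_LOSS finds nothing in the source.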

jinmuluo · Nov 13, 2025

oleson said:
Ok, I guess this: fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS') must be adding an "N" (or a "C") after CROPPROD1.
So I do think that this variable could be a problem when you ask for it at column level.

Hi Keith,

Thank you for your careful look at the code and for the advice! I don't know why, but in land-only mode I did output 'CROPPROD1N_LOSS' at the column level. Anyway, I will try your suggestion first to see if it's indeed this variable that caused the trouble.

/glade/derecho/scratch/jinmuluo/archive/soil_nox_hist_f09_mg17/lnd/hist/soil_nox_hist_f09_mg17.clm2.h4.2020-02.nc

    float CROPPROD1N_LOSS(time, column) ;
        CROPPROD1N_LOSS:long_name = "loss from 1-yr crop product pool" ;
        CROPPROD1N_LOSS:units = "gN/m^2/s" ;
        CROPPROD1N_LOSS:cell_methods = "time: mean" ;
        CROPPROD1N_LOSS:_FillValue = 1.e+36f ;
        CROPPROD1N_LOSS:missing_value = 1.e+36f ;

jinmuluo · Nov 14, 2025

oleson said:
Ok, I guess this: fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS') must be adding an "N" (or a "C") after CROPPROD1.
So I do think that this variable could be a problem when you ask for it at column level.

Hi Keith,

Now the model stops at the end of the second month. Does this make sense to you?

20251114 171834.134 ERROR PET1020 ESMCI_Calendar.C:1059 ESMCI::Calendar::convertToTime() Input argument out of range - ; Gregorian: for February 2010, dd=29 > 28 days in the month.
20251114 171834.135 ERROR PET1020 ESMCI_Time.C:333 ESMCI::Time::set() Input argument out of range - Internal subroutine call returned Error
20251114 171834.135 ERROR PET1020 ESMF_Time.F90:1385 ESMF_TimeSetDefault() Input argument out of range - Internal subroutine call returned Error
20251114 171834.135 ERROR PET1020 CHKRC
20251114 171834.590 INFO PET1020 Finalizing ESMF with endflag==ESMF_END_ABORT
20251114 171834.590 ERROR PET1020 /glade/derecho/scratch/csgteam/temp/spack/derecho/24.12/builds/spack-stage-esmf-8.8.0-ypx5ao4unezqxatt7vhrq5cyvrcn67xv/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1816 ESMCI::TraceEventRegionExit() Wrong argument specified - Trace regions not properly nested exiting from region: [ESMF] Expected exit from: physpkg_st1
20251114 171834.590 ERROR PET1020 /glade/derecho/scratch/csgteam/temp/spack/derecho/24.12/builds/spack-stage-esmf-8.8.0-ypx5ao4unezqxatt7vhrq5cyvrcn67xv/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1258 ESMCI::TraceClose() Wrong argument specified - Internal subroutine call returned Error
20251114 171834.590 ERROR PET1020 ESMF_Trace.F90:102 ESMF_TraceClose() Wrong argument specified - Internal subroutine call returned Error

oleson · Nov 15, 2025

You are running year 2016, which is a leap year. Model is crashing trying to run the leap day. The error indicates some data from 2010 is being used, which is not a leap year. Maybe some CAM data?
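
The leap-year mismatch is easy to confirm with Python's standard calendar module. This is only a sanity check of the Gregorian arithmetic behind the ESMF message ("for February 2010, dd=29 > 28 days in the month"); it says nothing about which input stream supplies the 2010 date, and the helper name is made up for the example:

```python
import calendar


def days_in_february(year):
    """Number of days in February of `year` in the Gregorian calendar."""
    return calendar.monthrange(year, 2)[1]


# 2016 is the simulated year; 2010 is the year named in the ESMF error.
for year in (2016, 2010):
    print(year, calendar.isleap(year), days_in_february(year))
```

A date of February 29 is valid in 2016 but not in 2010, so mapping the model's leap day onto 2010 would fail in the way the ESMF log shows.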

jinmuluo · Nov 15, 2025

oleson said:
You are running year 2016, which is a leap year. Model is crashing trying to run the leap day. The error indicates some data from 2010 is being used, which is not a leap year. Maybe some CAM data?

I guess that's because I cycle the emission inventory at year 2010. Is there any way to avoid this issue?

nl cam srf_emis_type "'CYCLICAL'"
nl cam srf_emis_cycle_yr 2010
nl cam flbc_type "'CYCLICAL'"
nl cam flbc_cycle_yr 2010

oleson · Nov 15, 2025

I think you'll have to ask the CAM people what options are available for that stream.

katec · Dec 3, 2025

Looking at this error, it doesn't seem like an issue with the cyclical forcing. It just looks like a problem with the calendars between your components. You could try running without a leap year, as that is the more standard method for CESM.
In your case directory, type

> ./xmlchange CALENDAR=NO_LEAP

And see if that helps.
