
CESM simulation stops at the start of the second month

jinmuluo

Jinmu Luo
Member
My CESM code version: CESM3_beta06


Hi,

I ran a CESM simulation with CAM-Chem and CLM active. The case successfully wrote the first month's output, but it stops at the start of the second month, which is very odd.

Here is my case location on Derecho: /glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run

And in the CaseStatus, it says

2025-11-11 17:38:54: case.run starting 3589111.desched1
---------------------------------------------------
2025-11-11 17:38:58: model execution starting 3589111.desched1
---------------------------------------------------
2025-11-11 22:30:54: model execution success 3589111.desched1
---------------------------------------------------
2025-11-11 22:30:54: case.run error
ERROR: Model did not complete - see /glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run/med.log.3589111.desched1.251111-173854

But nothing looks wrong in med.log.3589111.desched1.251111-173854, and I cannot find any obvious error in the cesm.log.

Best,

Jinmu
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
I see this in your cesm log:

dec1952.hsn.de.hpc.ucar.edu 1016: forrtl: severe (408): fort: (2): Subscript #1 of the array HISTO has value 21 which is greater than the upper bound of 20
dec1952.hsn.de.hpc.ucar.edu 1016:
dec1952.hsn.de.hpc.ucar.edu 1016: Image PC Routine Line Source
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 00000000092BA9DD histfilemod_mp_hf 3734 histFileMod.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 00000000092D4E18 histfilemod_mp_hi 4259 histFileMod.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 0000000008F349DB clm_driver_mp_clm 1449 clm_driver.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 0000000008E4AF39 lnd_comp_nuopc_mp 899 lnd_comp_nuopc.F90

The lines near 3734 in histFileMod.F90 are:

if (numdims == 1) then
   allocate(hist1do(beg1d_out:end1d_out), stat=ier)
   if (ier /= 0) then
      write(iulog,*) trim(subname),' ERROR: allocation'
      call endrun(msg=errMsg(sourcefile, __LINE__))
   end if
   hist1do(beg1d_out:end1d_out) = histo(beg1d_out:end1d_out,1)
end if

It looks like the model is dying trying to write out the h2 CLM history file. One of your requested history fields must be causing problems. I see from your lnd_in that you are requesting a bunch of history fields at the column level for the h2 file. Sometimes requesting a variable at a given level that doesn't exist can cause unexpected problems. For example, requesting a variable at the column level when it doesn't exist at the column level, e.g., it is a grid level variable only, is a problem.
To troubleshoot, I'd look at the variables you've requested for the h2 file, particularly any new variables you've added.
It looks like the model was successful at writing out the h0 and h1 files...
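
Since the runtime error was easy to miss in a long cesm.log, here is a minimal Python sketch for pulling out the Intel Fortran "forrtl: severe" message and the traceback frames. The sample text is copied from the log excerpt above; the function name `summarize` and the regular expressions are my own assumptions about the line layout, not part of any CESM tool.

```python
import re

# Sample lines copied from the cesm.log excerpt quoted in this thread.
SAMPLE_LOG = """\
dec1952.hsn.de.hpc.ucar.edu 1016: forrtl: severe (408): fort: (2): Subscript #1 of the array HISTO has value 21 which is greater than the upper bound of 20
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 00000000092BA9DD histfilemod_mp_hf 3734 histFileMod.F90
dec1952.hsn.de.hpc.ucar.edu 1016: cesm.exe 0000000008F349DB clm_driver_mp_clm 1449 clm_driver.F90
"""

# Assumed layouts: "forrtl: severe (<code>): <message>" and
# "cesm.exe <hex address> <routine> <line> <source file>".
SEVERE = re.compile(r"forrtl: severe \((\d+)\): (.*)")
FRAME = re.compile(r"cesm\.exe\s+[0-9A-F]+\s+(\S+)\s+(\d+)\s+(\S+\.F90)")


def summarize(log_text):
    """Return (error messages, [(source file, line number), ...])."""
    errors, frames = [], []
    for line in log_text.splitlines():
        match = SEVERE.search(line)
        if match:
            errors.append(match.group(2))
            continue
        match = FRAME.search(line)
        if match:
            frames.append((match.group(3), int(match.group(2))))
    return errors, frames


errors, frames = summarize(SAMPLE_LOG)
print(errors[0])
print(frames)
```

To use it on a real run, read the cesm.log file into a string and pass that to `summarize` instead of `SAMPLE_LOG`.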
 

jinmuluo

Jinmu Luo
Member
Hi Keith,

/glade/derecho/scratch/jinmuluo/O3_Crop_soil_nox_on/run

I removed some variables from the h2 file that might not exist at the column level, but I still get the same error. I have a land-only case that can successfully output these variables. Could this be a compset issue?

successful case with the same variables in column-level output: /glade/derecho/scratch/jinmuluo/soil_nox_hist_f09_mg17/run
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Ok, that was a good idea to try a land-only with the same output request.
The subscript error is referring to beg1d_out:end1d_out which is the per-processor 1d beginning and ending indices (I think it is the number of gridcells per processor). Somehow the histo variable is expecting a dimension of size 20, but the indices indicate there should be a dimension of size 21. I'm not sure why that would be. Pinging @erik and @slevis in case they have any ideas.
I guess I would first try simplifying your output to see if you can just get a default monthly history file (h0) and then build back in your other output requests piece by piece. Otherwise, you may have to add some write statements to histFileMod.F90 to debug.
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Actually, I think this variable, CROPPROD1N_LOSS, is a gridcell level variable and you are requesting it at the column level:

this%cropprod1_loss_grc(begg:endg) = spval
call hist_addfld1d( &
     fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS'), &
     units = 'g' // this%species%get_species() // '/m^2/s', &
     avgflag = 'A', &
     long_name = 'loss from 1-yr crop product pool', &
     ptr_gcell = this%cropprod1_loss_grc, default=active_if_non_isotope)

I'm not sure why you wouldn't get the same or similar error in land-only mode...
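
As a toy illustration only (this is not CLM code, and I am not certain it is the exact mechanism here), the following Python sketch shows how a buffer sized for one subgrid level can overflow when it is indexed with the range of another. The sizes 20 and 21 are taken from the error message; `checked_slice` is a hypothetical helper that mimics Fortran bounds checking.

```python
def checked_slice(array, lower, beg, end):
    """Copy array(beg:end), inclusive and Fortran-style, with bounds checking.

    `lower` is the index of the first element, as in a Fortran
    declaration array(lower:upper).
    """
    upper = lower + len(array) - 1
    for idx in (beg, end):
        if idx < lower or idx > upper:
            raise IndexError(
                f"subscript {idx} is outside the bounds {lower}:{upper}"
            )
    return array[beg - lower : end - lower + 1]


# Hypothetical sizes chosen to match the numbers in the error message:
# the buffer holds 20 gridcells, but the output range covers 21 columns.
gridcell_field = [0.0] * 20   # valid indices 1..20
beg_col, end_col = 1, 21      # index range requested on output

try:
    checked_slice(gridcell_field, 1, beg_col, end_col)
except IndexError as err:
    message = str(err)
    print(message)
```

The same request with `end_col = 20` succeeds, which is the sense in which the failure depends on the decomposition: the error only shows up on a processor whose column count exceeds its gridcell count.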
 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
Ok, I guess this: fname = this%species%hist_fname('CROPPROD1', suffix='_LOSS') must be adding an "N" (or a "C") after CROPPROD1.
So I do think that this variable could be a problem when you ask for it at column level.
 

jinmuluo

Jinmu Luo
Member
Hi Keith,

Thank you for your careful look at the code and for the advice! I don't know why, but in land-only mode I did output 'CROPPROD1N_LOSS' at the column level (see the file header below). Anyway, I will try your suggestions first to see if it is indeed this variable that caused the trouble.

/glade/derecho/scratch/jinmuluo/archive/soil_nox_hist_f09_mg17/lnd/hist/soil_nox_hist_f09_mg17.clm2.h4.2020-02.nc


float CROPPROD1N_LOSS(time, column) ;
    CROPPROD1N_LOSS:long_name = "loss from 1-yr crop product pool" ;
    CROPPROD1N_LOSS:units = "gN/m^2/s" ;
    CROPPROD1N_LOSS:cell_methods = "time: mean" ;
    CROPPROD1N_LOSS:_FillValue = 1.e+36f ;
    CROPPROD1N_LOSS:missing_value = 1.e+36f ;
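
For checking many variables at once, a small Python sketch like the one below can read the dimensions of each variable out of an ncdump-style header. The sample text is taken from the header above; the function name `variable_dims` and the regular expression are my own assumptions about the CDL declaration layout.

```python
import re

# Header lines copied from the ncdump output quoted above.
CDL_HEADER = '''\
float CROPPROD1N_LOSS(time, column) ;
CROPPROD1N_LOSS:long_name = "loss from 1-yr crop product pool" ;
CROPPROD1N_LOSS:units = "gN/m^2/s" ;
'''

# Assumed declaration layout: "<type> <name>(<dim>, <dim>, ...) ;"
DECL = re.compile(r"^\s*\w+\s+(\w+)\(([^)]*)\)\s*;", re.MULTILINE)


def variable_dims(cdl_text):
    """Map each declared variable name to its list of dimension names."""
    return {
        name: [dim.strip() for dim in dims.split(",")]
        for name, dims in DECL.findall(cdl_text)
    }


dims = variable_dims(CDL_HEADER)
print(dims["CROPPROD1N_LOSS"])
```

Here the second dimension is `column`, which confirms that the land-only case did write this field at the column level.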
 

jinmuluo

Jinmu Luo
Member
Hi Keith,

Now the model stops at the end of the second month. Does this error make sense to you?


20251114 171834.134 ERROR PET1020 ESMCI_Calendar.C:1059 ESMCI::Calendar::convertToTime() Input argument out of range - ; Gregorian: for February 2010, dd=29 > 28 days in the month.
20251114 171834.135 ERROR PET1020 ESMCI_Time.C:333 ESMCI::Time::set() Input argument out of range - Internal subroutine call returned Error
20251114 171834.135 ERROR PET1020 ESMF_Time.F90:1385 ESMF_TimeSetDefault() Input argument out of range - Internal subroutine call returned Error
20251114 171834.135 ERROR PET1020 CHKRC
20251114 171834.590 INFO PET1020 Finalizing ESMF with endflag==ESMF_END_ABORT
20251114 171834.590 ERROR PET1020 /glade/derecho/scratch/csgteam/temp/spack/derecho/24.12/builds/spack-stage-esmf-8.8.0-ypx5ao4unezqxatt7vhrq5cyvrcn67xv/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1816 ESMCI::TraceEventRegionExit() Wrong argument specified - Trace regions not properly nested exiting from region: [ESMF] Expected exit from: physpkg_st1
20251114 171834.590 ERROR PET1020 /glade/derecho/scratch/csgteam/temp/spack/derecho/24.12/builds/spack-stage-esmf-8.8.0-ypx5ao4unezqxatt7vhrq5cyvrcn67xv/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1258 ESMCI::TraceClose() Wrong argument specified - Internal subroutine call returned Error
20251114 171834.590 ERROR PET1020 ESMF_Trace.F90:102 ESMF_TraceClose() Wrong argument specified - Internal subroutine call returned Error

oleson

Keith Oleson
CSEG and Liaisons
Staff member
You are running year 2016, which is a leap year. Model is crashing trying to run the leap day. The error indicates some data from 2010 is being used, which is not a leap year. Maybe some CAM data?
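
To make the mismatch concrete, here is a short Python sketch using only the standard library. It is an illustration of the calendar problem, not of what ESMF does internally; `map_to_cycle_year` is a hypothetical helper, and 2010 is the year that appears in the ESMF error message.

```python
import calendar
import datetime

run_year, cycle_year = 2016, 2010

print(calendar.isleap(run_year))    # True
print(calendar.isleap(cycle_year))  # False


def map_to_cycle_year(date, year):
    """Re-express `date` in `year`; raises ValueError if that day does not exist."""
    return date.replace(year=year)


leap_day = datetime.date(run_year, 2, 29)
try:
    map_to_cycle_year(leap_day, cycle_year)
    mapped_ok = True
except ValueError:
    # 29 February does not exist in 2010, which mirrors
    # "dd=29 > 28 days in the month" in the log above.
    mapped_ok = False
print(mapped_ok)
```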
 

jinmuluo

Jinmu Luo
Member
I guess this is because I cycle the emission inventory at year 2010. Is there any way to avoid this issue?

nl cam srf_emis_type "'CYCLICAL'"
nl cam srf_emis_cycle_yr 2010
nl cam flbc_type "'CYCLICAL'"
nl cam flbc_cycle_yr 2010
 