Dear CESM Support and CAM6 Users,
I am running CAM6 simulations using meteorological data interpolated from MERRA2 (72 levels) to the CAM6 model grid (32 levels) via vertical interpolation scripts (GeoCAT). My simulations for the years 2005 and 2023 complete successfully using the same setup and interpolation workflow. However, the simulation for 2006 consistently crashes during initialization, and 2024 runs intermittently—sometimes crashing, sometimes succeeding.
For 2006, the model terminates with an ESMF stack trace error involving libesmf.so, without producing any detailed CAM error message or traceback. This suggests the issue may relate to initialization, threading, or corrupted input.
Here’s what I’ve checked so far:
Verified the interpolated meteorological files for 2006 are complete and structurally identical to those from 2005 and 2023.
Ran with --threads 1 and on 20 Derecho nodes to rule out threading or memory issues.
Confirmed that time, dimensions, and variable headers are consistent across all years.
Used the same vertical interpolation method successfully applied to other years.
The crash occurs shortly after reading the PS field during initialization:
INFLD_REAL_2D_2D: read field PS
READ_NEXT_PS: Read meteorological data
This is followed by an abrupt ESMF error with no further logging.
My suspicion is that either 2006 has a corrupted or subtly malformed field (e.g., PS, T, Q), or that ESMF is encountering instability due to memory/threading interactions specific to this year’s input.
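To test the malformed-field suspicion directly, each forcing field could be scanned for non-finite values, leftover fill values, and physically implausible numbers. The following is a minimal sketch, not something I have run against the 2006 files: the function name `scan_field` and the bounds in `BOUNDS` are hypothetical, rough values chosen for illustration rather than limits taken from CAM6, and the arrays are assumed to have already been read from the NetCDF files with whatever reader is at hand.

```python
import numpy as np

# Rough plausibility bounds (assumed for illustration, not taken from CAM6).
# PS in Pa, T in K, Q in kg/kg.
BOUNDS = {"PS": (3.0e4, 1.1e5), "T": (150.0, 350.0), "Q": (0.0, 0.05)}


def scan_field(name, values, fill_value=None):
    """Count non-finite, fill-valued, and out-of-range entries in one field."""
    arr = np.asarray(values, dtype=np.float64)
    finite = np.isfinite(arr)

    # Entries that still carry the raw fill value (if one is given).
    is_fill = np.zeros(arr.shape, dtype=bool)
    if fill_value is not None:
        is_fill = finite & np.isclose(arr, fill_value, rtol=1e-6, atol=0.0)

    # Range check only on values that are finite and not fill.
    lo, hi = BOUNDS[name]
    valid = finite & ~is_fill
    out_of_range = valid & ((arr < lo) | (arr > hi))

    return {
        "nonfinite": int((~finite).sum()),
        "fill": int(is_fill.sum()),
        "out_of_range": int(out_of_range.sum()),
    }


if __name__ == "__main__":
    # Synthetic surface-pressure sample: one NaN, one fill value, one bad value.
    sample_ps = [101325.0, float("nan"), 1.0e15, 2.0e5, 98000.0]
    print(scan_field("PS", sample_ps, fill_value=1.0e15))
```

Running this per file and per time slice for 2005 and 2006 and comparing the counts should show whether a single day stands out. If the NetCDF reader converts `_FillValue` to NaN on load, fill values would appear under `nonfinite` instead of `fill`.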
Questions for the Community:
Has anyone encountered CAM6 crashes like this tied to a specific year or day, despite similar interpolation workflows working for other years?
Could there be hidden issues in the time axis, fill values, or metadata that escape normal ncdump checks but trip up CAM6?
Are there known ESMF/libesmf.so issues that can be triggered by specific forcing file conditions?
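On the time-axis question, a header comparison with ncdump would not reveal a duplicated, reversed, or skipped time stamp inside a file. Below is a small sketch of the kind of check I have in mind, assuming the time coordinate has been read into a plain list of numbers (for example days since a reference date). The function name `time_axis_problems` and the `expected_step` parameter are hypothetical, and the expected step would have to match the actual cadence of the forcing files.

```python
import math


def time_axis_problems(times, expected_step, tol=1e-6):
    """Return (index, description) pairs for suspicious time-axis entries."""
    problems = []

    # Flag NaN or Inf time stamps first.
    for i, t in enumerate(times):
        if not math.isfinite(t):
            problems.append((i, "non-finite time value"))

    # Then check each consecutive pair for ordering and spacing.
    for i in range(1, len(times)):
        prev, cur = times[i - 1], times[i]
        if not (math.isfinite(prev) and math.isfinite(cur)):
            continue
        step = cur - prev
        if step <= 0:
            problems.append((i, "not strictly increasing"))
        elif abs(step - expected_step) > tol:
            problems.append((i, "irregular step"))

    return problems


if __name__ == "__main__":
    # Synthetic axis with one duplicated stamp and one skipped stamp.
    sample = [0.0, 0.25, 0.5, 0.5, 1.0, 1.25]
    for index, what in time_axis_problems(sample, expected_step=0.25):
        print(index, what)
```

Concatenating the time values across consecutive daily files before running the check would also catch gaps or overlaps at file boundaries.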
Technical Setup Summary:
Model: CESM2.2 with CAM6_3_128
Resolution: f19_f19_mg17
Input Forcing: Vertically interpolated MERRA2
Years tested: 2005 (succeeds), 2006 (crashes), 2023 (succeeds), 2024 (inconsistent)
Machine: Derecho
Run Options: --threads 1, 20 nodes (I started with the Derecho default of 4 nodes, then tried 8, then 10, and finally 20)
Any suggestions or experiences with similar issues would be greatly appreciated.
I’d also be happy to share snippets of my input file headers or logs if that helps others debug.
Thanks in advance for your support and insights.
Please look at the following case directory and the cesm and atm log files from its runs, and advise based on your expertise.
dksingh@derecho3:/glade/derecho/scratch/dksingh/06_control_startup/run> tail cesm.log.9587298.desched1.250521-010938
dec2036.hsn.de.hpc.ucar.edu 935: libesmf.so 0000153A637D4792 enter 2321 ESMCI_VMKernel.C
dec1758.hsn.de.hpc.ucar.edu 675: libesmf.so 0000154D8673FBA8 ESMCI_FTableCallE 824 ESMCI_FTable.C
dec2036.hsn.de.hpc.ucar.edu 935: libesmf.so 0000153A637BDE70 enter 1216 ESMCI_VM.C
dec1758.hsn.de.hpc.ucar.edu 675: libesmf.so 0000154D86BC8792 enter 2321 ESMCI_VMKernel.C
dec2036.hsn.de.hpc.ucar.edu 935: libesmf.so 0000153A6334CF4F c_esmc_ftablecall 981 ESMCI_FTable.C
dec1758.hsn.de.hpc.ucar.edu 675:
dec2036.hsn.de.hpc.ucar.edu 935: libesmf.so 0000153A63E233B8 esmf_compmod_mp_e 1223 ESMF_Comp.F90
dec1758.hsn.de.hpc.ucar.edu 675: Stack trace terminated abnormally.
dec2036.hsn.de.hpc.ucar.edu 935:
dec2036.hsn.de.hpc.ucar.edu 935: Stack trace terminated abnormally.
dksingh@derecho3:/glade/derecho/scratch/dksingh/06_control_startup/run> tail atm.log.9587298.desched1.250521-010938
INFLD_REAL_2D_2D: read field QFLX
INFLD_REAL_2D_2D: read field TAUX
INFLD_REAL_2D_2D: read field TAUY
INFLD_REAL_2D_2D: read field TS
INFLD_REAL_2D_2D: read field SST
INFLD_REAL_2D_2D: read field ICEFRAC
READ_NEXT_METDATA: Read meteorological data
INFLD_REAL_2D_2D: read field PS
INFLD_REAL_2D_2D: read field PS
READ_NEXT_PS: Read meteorological data
dksingh@derecho3:/glade/derecho/scratch/dksingh/06_control_startup/run> tail atm.log.9563515.desched1.250519-092049
2.873563218390813E-002
-----------------------------------
do_press_fix_llnl: dpress_g = 269.731764992269
do_press_fix_llnl: dpress_g = 269.731764992269
nstep, te 761 0.21789444314978194E+10 0.21794113481848497E+10 0.25939115270269181E-01 0.98289772824383763E+05 0.22552395239472389E+03
-----------------------------------
photo_timestep_init: diagnostics
calday, last, next, dels = 16.8541666666667 1 2
2.945402298850567E-002
-----------------------------------
dksingh@derecho3:/glade/derecho/scratch/dksingh/06_control_startup/run> tail atm.log.9525465.desched1.250515-065111
2.873563218390813E-002
-----------------------------------
do_press_fix_llnl: dpress_g = 269.731764992269
do_press_fix_llnl: dpress_g = 269.731764992269
nstep, te 761 0.21789444314978194E+10 0.21794113481848497E+10 0.25939115270269181E-01 0.98289772824383763E+05 0.22552395239472389E+03
-----------------------------------
photo_timestep_init: diagnostics
calday, last, next, dels = 16.8541666666667 1 2
2.945402298850567E-002
-----------------------------------
dksingh@derecho3:/glade/derecho/scratch/dksingh/06_control_startup/run> tail atm.log.9498971.desched1.250512-051102
17.nc
open_met_datafile:
/glade/derecho/scratch/dksingh/2006/interpolated//interp_MERRA2_0.9x1.25_200601
17.nc
INFLD_REAL_2D_2D: read field PS
INFLD_REAL_2D_2D: read field PS
READ_NEXT_PS: Read meteorological data
do_press_fix_llnl: dpress_g = 269.670514201235
do_press_fix_llnl: dpress_g = 269.670514201235
nstep, te 762 0.21789410850100784E+10 0.21794077004771843E+10 0.25922381275663660E-01 0.98289772848527806E+05 0.22552395239472389E+03
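To identify which forcing file is being read when the run stops, the calday value printed in the atm log can be converted to a calendar date. This is a minimal sketch under the assumption that calday is a 1-based fractional day of year (1.0 meaning 00:00 on 1 January); the helper name `calday_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta


def calday_to_datetime(year, calday):
    """Convert a 1-based fractional day of year to a datetime.

    Python's datetime uses the Gregorian calendar, so this matches a
    no-leap model calendar only in non-leap years or before 29 February.
    """
    return datetime(year, 1, 1) + timedelta(days=calday - 1.0)


if __name__ == "__main__":
    # calday value copied from the atm log tail above.
    print(calday_to_datetime(2006, 16.8541666666667))
```

Under that assumption, the calday in the log falls late on 16 January 2006, which lines up with the model having just opened the file ending in 17.nc, so the 16 and 17 January forcing files are the first ones I would inspect.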