Repeated WACCM crash in solar maximum time slice employing NRLSSI2

tkruschke@geomar_de · Dec 15, 2015

Hi all!
I am running CESM1(WACCM), as part of the CESM 1.0.6 suite, for a set of time slice simulations. The common basis for all experiments is the F_2000_WACCM compset.
However, I use a different SST/ICE field as lower boundary forcing (1995-2004 mean annual cycle of HadISST1.1) and (this is the important point) different constant spectral solar irradiance forcings.
All my simulations run smoothly and stably, except for one, which is forced by solar maximum conditions of Nov 1989 according to the new NRLSSI2 (data made available by Judith Lean via personal communication).
I am confident that all the forcing files are prepared correctly: I just copied the default files used by the F_2000_WACCM compset and exchanged the contained data.
For SST/ICE this original file was: .../inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_1.9x2.5_clim_c061031.nc
For spectral solar irradiance: .../inputdata/atm/cam/solar/spectral_irradiance_Lean_1610-2009_ann_c100405.nc
And for F10.7, kp, and ap: .../inputdata/atm/waccm/phot/wa_smax_c100517.nc
[Here and in the following I shorten the absolute paths to the respective files, hopefully still indicating which file I mean.]
As stated above, I am confident that the forcing files are correct. This is supported by the fact that a total of 10 time slice experiments (differing from the crashing simulation only in their SSI forcing) ran completely fine for 50 years.
The simulation in question crashes repeatedly after at most 5.5 years, and the .out file always tells me:
Model did not complete - see .../cpl.log.XXXXXX-XXXXXX
However, the coupler log contains no crash-related info at all. Only the .../cesm.log.XXXXXX-XXXXXX contained some info for my first try:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
ccsm.exe           00000000014C1A29 Unknown               Unknown Unknown
ccsm.exe           00000000014C03A0 Unknown               Unknown Unknown
ccsm.exe           000000000147DC02 Unknown               Unknown Unknown
ccsm.exe           000000000140ABA3 Unknown               Unknown Unknown
ccsm.exe           000000000141354B Unknown               Unknown Unknown
libpthread.so.0    00002B602B990500 Unknown               Unknown Unknown
ccsm.exe           00000000008A6C3D gw_drag_mp_gw_dra        2096 gw_drag.F90
ccsm.exe           000000000089BF8F gw_drag_mp_gw_int         809 gw_drag.F90
ccsm.exe           0000000000639362 tphysac_                  263 tphysac.F90
ccsm.exe           0000000000572BF4 physpkg_mp_phys_r         849 physpkg.F90
ccsm.exe           000000000048ED1E cam_comp_mp_cam_r         279 cam_comp.F90
ccsm.exe           000000000047D67A atm_comp_mct_mp_a         528 atm_comp_mct.F90
ccsm.exe           000000000040FEBB ccsm_comp_mod_mp_        2166 ccsm_comp_mod.F90
ccsm.exe           0000000000422F7B MAIN__                     91 ccsm_driver.F90
ccsm.exe           000000000040E296 Unknown               Unknown Unknown
libc.so.6          00002B602BBBDCDD Unknown               Unknown Unknown
ccsm.exe           000000000040E189 Unknown               Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
+ hundreds of similar lines
After that I recompiled the model, setting DEBUG=TRUE in env_build.xml and INFO_DBUG=3 in env_run.xml.
As expected, the model crashed again, and the message in .../.out again pointed to the coupler log. This coupler log was now full of messages. For the timestep where the model crashed, its last lines were:
comm_diag xxx sorr 35-7.5035924229171823754E+03 send atm Fall_flxdst1
comm_diag xxx sorr 36-4.0277447075338284776E+04 send atm Fall_flxdst2
comm_diag xxx sorr 37-9.4446187010784124141E+04 send atm Fall_flxdst3
comm_diag xxx sorr 38-8.8965352332325681346E+04 send atm Fall_flxdst4
Comparing this to other timesteps gives me the feeling that some data should now be received, something like the following (taken from the timestep before):
comm_diag xxx sorr   1 3.2501741968186116000E+16 recv atm Sa_z
comm_diag xxx sorr   2-4.5541183399918668750E+14 recv atm Sa_u
comm_diag xxx sorr   3 2.7827167220025068750E+14 recv atm Sa_v
...
but this is not happening anymore.
Following the advice on getting past WACCM crashes, I then tried increasing combinations of nspltvrm, nspltrac and nsplit, but none of this helped. In fact, when running the experiment from the beginning instead of restarting, the crashes occurred even earlier for increased nspltvrm and especially nsplit.
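The observation that the crashing timestep shows "send" entries without any following "recv" entries can be checked mechanically. The following is a minimal Python sketch, assuming the comm_diag line layout seen in the excerpts quoted in this post (an index that may be fused with a negative value, then the direction, component and field name). The names LINE_RE and directions_seen are hypothetical helpers, not part of CESM.

```python
import re

# Assumed layout, taken from the excerpts in this post:
#   comm_diag <tag> <tag> <index><value> <send|recv> <component> <field>
# A negative value may be fused with the index ("35-7.50E+03").
LINE_RE = re.compile(
    r"^comm_diag\s+\S+\s+\S+\s+(?P<idx>\d+)\s*"
    r"(?P<val>[-+]?\d+\.\d+E[-+]\d+)\s+"
    r"(?P<dir>send|recv)\s+(?P<comp>\S+)\s+(?P<field>\S+)\s*$"
)


def directions_seen(log_text):
    """Group parsed comm_diag entries by direction ('send' or 'recv')."""
    seen = {"send": [], "recv": []}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            entry = (int(m.group("idx")), float(m.group("val")), m.group("field"))
            seen[m.group("dir")].append(entry)
    return seen


# Lines copied from the excerpts in this post.
sample = """\
comm_diag xxx sorr 37-9.4446187010784124141E+04 send atm Fall_flxdst3
comm_diag xxx sorr 38-8.8965352332325681346E+04 send atm Fall_flxdst4
comm_diag xxx sorr   1 3.2501741968186116000E+16 recv atm Sa_z
comm_diag xxx sorr   2-4.5541183399918668750E+14 recv atm Sa_u
"""

result = directions_seen(sample)
print([field for _, _, field in result["send"]])
print([field for _, _, field in result["recv"]])
```

Running this on the log section of a single timestep and finding an empty "recv" list would match the pattern described above: the atmosphere never returned data to the coupler for that step.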
The behaviour of the various log files is always the same as described above.
Only in one of my tries did I get a core dump file with the crash. Trying to debug this (the first time I ever did this), a backtrace gave me the following message (no idea whether this is helpful):
0x000000000040841b at .../ccsm.exe section .text offset 33819
So, does anyone have further hints or ideas on how to overcome this crash? I am starting to fear that WACCM4 is not meant to deal with constant solar maximum forcing from NRLSSI2 :-/
Thanks in advance,
Tim

jedwards · Dec 15, 2015

Here is what you are looking for: the model crashed at line 2096 of gw_drag.F90. Look there for the field or fields that are out of spec.
gw_drag_mp_gw_dra 2096 gw_drag.F90
You probably need to reduce the timestep or increase the dynamics substeps to continue.
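When the log holds hundreds of such traceback lines, the relevant frame can be pulled out automatically. The following is a minimal Python sketch, assuming the five-column traceback layout shown in the original post (image, PC, routine, line, source). FRAME_RE and first_resolved_frame are hypothetical helper names, not part of CESM or the compiler runtime.

```python
import re

# One traceback frame: image, program counter (hex), routine, line, source.
# The layout is assumed from the traceback quoted in this thread.
FRAME_RE = re.compile(
    r"^(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{8,})\s+"
    r"(?P<routine>\S+)\s+(?P<line>\d+|Unknown)\s+(?P<source>\S+)\s*$"
)


def first_resolved_frame(log_text):
    """Return (routine, line, source) of the first frame with a known line."""
    for raw in log_text.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m and m.group("line") != "Unknown":
            return m.group("routine"), int(m.group("line")), m.group("source")
    return None


# Frames copied from the traceback quoted in this thread.
sample = """\
Image              PC                Routine            Line        Source
ccsm.exe           000000000141354B Unknown               Unknown Unknown
libpthread.so.0    00002B602B990500 Unknown               Unknown Unknown
ccsm.exe           00000000008A6C3D gw_drag_mp_gw_dra        2096 gw_drag.F90
ccsm.exe           000000000089BF8F gw_drag_mp_gw_int         809 gw_drag.F90
"""

print(first_resolved_frame(sample))
```

The frames above the first resolved one are signal-handler and runtime entries with no source information, which is why the first frame with a line number is the one to inspect.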

tkruschke@geomar_de · Dec 16, 2015

Thanks for the very quick reply!
Just to make sure that I got you right:
Shortening the dynamics substep would work by increasing "nsplit", wouldn't it? I tried this already (a value of 12 instead of 8).
Reducing the timestep would work by decreasing "dtime", right?
As far as I understand, this has to be set identically in cam.buildnml.csh and clm.buildnml.csh.
Edit: OK, I realized that "ATM_NCPL" in env_conf.xml has to be adjusted as well. Do you have any recommendation on which "dtime" to use? I would try 1200 (20 minutes) now, but maybe you have different advice.
Thanks a lot,
Tim
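To keep "dtime" and "ATM_NCPL" consistent, the relation between them can be written down explicitly. The following is a minimal Python sketch, assuming that ATM_NCPL is the number of atmosphere coupling intervals per day, so that ATM_NCPL = 86400 / dtime. The function name atm_ncpl_for is hypothetical, and the values other than 1200 are illustrative only.

```python
SECONDS_PER_DAY = 86400


def atm_ncpl_for(dtime_seconds):
    """Coupling intervals per day for an atmosphere timestep in seconds.

    Assumes ATM_NCPL = 86400 / dtime; dtime must divide the day evenly.
    """
    if dtime_seconds <= 0 or SECONDS_PER_DAY % dtime_seconds != 0:
        raise ValueError("dtime must be a positive divisor of 86400 s")
    return SECONDS_PER_DAY // dtime_seconds


# 1200 s is the value proposed in this post; 1800 s and 900 s are shown
# only for comparison.
for dtime in (1800, 1200, 900):
    print(dtime, atm_ncpl_for(dtime))
```

Under this assumption, a dtime of 1200 s corresponds to ATM_NCPL = 72.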
