tkruschke@geomar_de
New Member
Hi all!
I am running CESM1(WACCM) as part of the CESM 1.0.6 suite for a set of time-slice simulations. The common basis for all experiments is the F_2000_WACCM compset. However, I use a different SST/ICE field as lower boundary forcing (1995-2004 mean annual cycle of HadISST1.1) and (this is the important point) different constant spectral solar irradiance (SSI) forcings.

All my simulations run smoothly and stably, except for one, which is forced by solar maximum conditions of Nov 1989 according to the new NRLSSI2 (data made available by Judith Lean via personal communication).

I believe all the forcing files are prepared correctly: I simply copied the default files used by the F_2000_WACCM compset and replaced the data they contain. The original files were:
For SST/ICE: .../inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_1.9x2.5_clim_c061031.nc
For spectral solar irradiance: .../inputdata/atm/cam/solar/spectral_irradiance_Lean_1610-2009_ann_c100405.nc
For F10.7, kp, and ap: .../inputdata/atm/waccm/phot/wa_smax_c100517.nc
[Here and in the following I shorten the absolute paths to the respective files, hopefully still indicating which file I mean.]

As stated above, I am confident that the forcing files are correct. This is supported by the fact that a total of 10 time-slice experiments (differing from the crashing simulation only in their SSI forcing) ran completely fine for 50 years.

The crashing simulation fails repeatedly after at most 5.5 years, and the .out file always tells me:
Model did not complete - see .../cpl.log.XXXXXX-XXXXXX
However, the coupler log contains no crash-related information at all. Only the .../cesm.log.XXXXXX-XXXXXX contained some information on my first try:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
ccsm.exe 00000000014C1A29 Unknown Unknown Unknown
ccsm.exe 00000000014C03A0 Unknown Unknown Unknown
ccsm.exe 000000000147DC02 Unknown Unknown Unknown
ccsm.exe 000000000140ABA3 Unknown Unknown Unknown
ccsm.exe 000000000141354B Unknown Unknown Unknown
libpthread.so.0 00002B602B990500 Unknown Unknown Unknown
ccsm.exe 00000000008A6C3D gw_drag_mp_gw_dra 2096 gw_drag.F90
ccsm.exe 000000000089BF8F gw_drag_mp_gw_int 809 gw_drag.F90
ccsm.exe 0000000000639362 tphysac_ 263 tphysac.F90
ccsm.exe 0000000000572BF4 physpkg_mp_phys_r 849 physpkg.F90
ccsm.exe 000000000048ED1E cam_comp_mp_cam_r 279 cam_comp.F90
ccsm.exe 000000000047D67A atm_comp_mct_mp_a 528 atm_comp_mct.F90
ccsm.exe 000000000040FEBB ccsm_comp_mod_mp_ 2166 ccsm_comp_mod.F90
ccsm.exe 0000000000422F7B MAIN__ 91 ccsm_driver.F90
ccsm.exe 000000000040E296 Unknown Unknown Unknown
libc.so.6 00002B602BBBDCDD Unknown Unknown Unknown
ccsm.exe 000000000040E189 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
+ hundreds of similar lines

After that I recompiled the model, setting DEBUG=TRUE in env_build.xml and INFO_DBUG=3 in env_run.xml. As expected, the model crashed again, and the message in .../.out again pointed to the coupler log. This time the coupler log was full of messages. For the timestep at which the model crashed, the last lines were:
comm_diag xxx sorr 35-7.5035924229171823754E+03 send atm Fall_flxdst1
comm_diag xxx sorr 36-4.0277447075338284776E+04 send atm Fall_flxdst2
comm_diag xxx sorr 37-9.4446187010784124141E+04 send atm Fall_flxdst3
comm_diag xxx sorr 38-8.8965352332325681346E+04 send atm Fall_flxdst4

Comparing this to other timesteps, I would expect some data to be received next, something like this (taken from the timestep before):
comm_diag xxx sorr 1 3.2501741968186116000E+16 recv atm Sa_z
comm_diag xxx sorr 2-4.5541183399918668750E+14 recv atm Sa_u
comm_diag xxx sorr 3 2.7827167220025068750E+14 recv atm Sa_v
...
but this no longer happens.

Following the advice on getting past WACCM crashes, I then tried increasing nspltvrm, nspltrac and nsplit in various combinations, but none of this helped. In fact, when running the experiment from the beginning instead of restarting, the crashes occurred even earlier for increased nspltvrm and especially nsplit.
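
The send/recv comparison between timesteps described above could be automated with a small script. The following is only a sketch in Python: the regular expression is inferred from the few comm_diag lines quoted in this post (assumption: all such lines follow this layout, with the index and a negative value possibly fused together), and the names parse_comm_diag, last_direction and SAMPLE are made up for illustration.

```python
import re

# Layout inferred from lines such as
#   comm_diag xxx sorr 35-7.5035924229171823754E+03 send atm Fall_flxdst1
# The index and a negative value can be fused, so the whitespace
# between them is optional. This layout is an assumption.
COMM_DIAG = re.compile(
    r"comm_diag\s+xxx\s+sorr\s+(\d+)\s*([-+]?\d+\.\d+E[-+]\d+)"
    r"\s+(send|recv)\s+(\w+)\s+(\w+)"
)


def parse_comm_diag(lines):
    """Return (index, value, direction, component, field) for each comm_diag line."""
    records = []
    for line in lines:
        match = COMM_DIAG.search(line)
        if match:  # silently skip lines that are not comm_diag diagnostics
            idx, val, direction, comp, field = match.groups()
            records.append((int(idx), float(val), direction, comp, field))
    return records


def last_direction(records):
    """Direction ('send' or 'recv') of the final diagnostic line, or None if empty."""
    return records[-1][2] if records else None


# A few lines copied from the log excerpts in this post, plus one unrelated line.
SAMPLE = [
    "comm_diag xxx sorr 35-7.5035924229171823754E+03 send atm Fall_flxdst1",
    "comm_diag xxx sorr 36-4.0277447075338284776E+04 send atm Fall_flxdst2",
    "some unrelated log line",
    "comm_diag xxx sorr 37-9.4446187010784124141E+04 send atm Fall_flxdst3",
    "comm_diag xxx sorr 1 3.2501741968186116000E+16 recv atm Sa_z",
]

if __name__ == "__main__":
    for record in parse_comm_diag(SAMPLE):
        print(record)
```

Applied to the lines of one timestep, last_direction returning "send" with no "recv" records afterwards would correspond to the situation described above for the crashing timestep.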
The behaviour of the various log files is always as described above.

Only in one of my attempts did I get a core dump file with the crash. Trying to debug this (the first time I have ever done so), a backtrace gave me the following message (no idea whether this is helpful):
0x000000000040841b at .../ccsm.exe section .text offset 33819

So, does anyone have further hints or ideas on how to overcome this crash? I am starting to fear that WACCM4 is not meant to deal with constant solar-max forcing from NRLSSI2 :-/

Thanks in advance,
Tim