
Problems with running 0.47x0.63 CESM

Hi CESM community:
I am trying to run a high-resolution simulation (0.47x0.63, f05_g16) with the CESM1.2 F compset and am running into trouble. In debug mode, the error message is:
MPT ERROR: Rank 132(g:132) received signal SIGFPE(8)
To my limited understanding, this could be related to bad numbers being written into the output file. However, I did not modify the .F90 files that appear in the traceback in the log file:
--

#1 0x00002b7160094db6 in mpi_sgi_system (
80:MPT: #2 MPI_SGI_stacktraceback (
80:MPT: header=header@entry=0x7fff84208a40 "MPT ERROR: Rank 80(g:80) received signal SIGFPE(8).\n\tProcess ID: 13560, Host: r4i6n4, Program: /glade/scratch/dleung/CESM/trial_res_047x063/cesm.exe\n\tMPT Version: HPE MPT 2.19 02/23/19 05:30:09\n") at sig.c:340
80:MPT: #3 0x00002b7160094fb2 in first_arriver_handler (signo=signo@entry=8,
80:MPT: stack_trace_sem=stack_trace_sem@entry=0x2b716a680080) at sig.c:489
80:MPT: #4 0x00002b716009534b in slave_sig_handler (signo=8, siginfo=<optimized out>,
80:MPT: extra=<optimized out>) at sig.c:564
80:MPT: #5 <signal handler called>
80:MPT: #6 0x00000000039428da in subgridavemod::p2g_1d (lbp=325327, ubp=329278,
80:MPT: lbc=50783, ubc=51262, lbl=32227, ubl=32595, lbg=25029, ubg=25340,
80:MPT: parr=..., garr=..., p2c_scale_type=..., c2l_scale_type=...,
80:MPT: l2g_scale_type=..., .tmp.P2C_SCALE_TYPE.len_V$1118=8,
80:MPT: .tmp.C2L_SCALE_TYPE.len_V$111b=8, .tmp.L2G_SCALE_TYPE.len_V$111e=8)
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/subgridAveMod.F90:762
80:MPT: #7 0x000000000362a05b in histfilemod::hist_update_hbuf_field_1d (t=1, f=119,
80:MPT: begp=325327, endp=329278, begc=50783, endc=51262, begl=32227, endl=32595,
80:MPT: begg=25029, endg=25340)
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/histFileMod.F90:1150
80:MPT: #8 0x0000000003626507 in histfilemod::hist_update_hbuf ()
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/histFileMod.F90:1063
80:MPT: #9 0x0000000003471604 in clm_driver::clm_drv (doalb=.FALSE.,
80:MPT: nextsw_cday=1.0625, declinp1=-0.40294823456129064,
80:MPT: declin=-0.4030289369547867, rstwr=.FALSE., nlend=.FALSE., rdate=...,
80:MPT: .tmp.RDATE.len_V$2ef8=32)

--
I did not modify subgridAveMod.F90 or histFileMod.F90. I think the error at subgridAveMod.F90 line 762 comes from the line that averages pft-level quantities parr(p) up to grid-level quantities garr(g). I am not familiar with this code, but I guess parr(p) is a generic dummy argument through which any pft-level variable gets aggregated to the grid level. All of my modified code runs for many years at 1.9x2.5 (f19_g16) and at 0.9x1.25 (f09_g16); the error only occurs at 0.47x0.63, and I have no idea how to fix it.
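To make that guess concrete, here is a minimal standalone sketch (my own illustration, not the CLM code; the array names are only loosely modeled on p2g_1d) of how a single bad pft-level value can trip a floating-point trap at an accumulation line like subgridAveMod.F90:762 once a debug build enables FP exception trapping (e.g. ifort -fpe0 or gfortran -ffpe-trap=invalid,zero,overflow):
--
program p2g_sketch
  ! Toy p2g-style weighted sum: aggregate pft-level parr(p) onto grid cells.
  ! Not CLM source -- it only shows how one garbage pft value makes the
  ! accumulation overflow and raise SIGFPE when FP traps are enabled.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  integer, parameter :: np = 4, ng = 2
  real(8) :: parr(np), garr(ng), wt(np)
  integer :: pgridcell(np), p, g

  parr      = [1.0d0, 2.0d0, 3.0d0, 4.0d0]
  wt        = [0.5d0, 0.5d0, 1.0d0, 1.0d0]
  pgridcell = [1, 1, 2, 2]

  ! Pretend two pft values came out of an upstream calculation as garbage:
  parr(3) = huge(1.0d0)
  parr(4) = huge(1.0d0)

  garr(:) = 0.0d0
  do p = 1, np
     g = pgridcell(p)
     garr(g) = garr(g) + wt(p)*parr(p)   ! overflows here -> SIGFPE under -fpe0
  end do

  ! In a non-trapping build the bad value just propagates; this is one way
  ! to spot it after the fact:
  do g = 1, ng
     if (.not. ieee_is_finite(garr(g))) print *, 'non-finite average at g =', g
  end do
end program p2g_sketch
--
So my suspicion is that some pft-level field is picking up a non-finite or fill-type value only at the 0.47x0.63 resolution, and the trap then fires at the averaging line even though subgridAveMod.F90 itself is unmodified. I may well be wrong about this, which is why I am asking here.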
I have attached the cesm and lnd log files. Any help or comments would be greatly appreciated, and I can provide further information if that would be useful.
Thank you,
Danny Leung

Some paths if helpful:
My case directory: /glade/scratch/dleung/CESM/trial_res_047x063

The cesm log file (in debug mode): /glade/scratch/dleung/CESM/trial_res_047x063/run/cesm.log.211117-140041
My modified source code (which works at the coarser resolutions): /glade/scratch/dleung/CESM/trial_res_047x063/SourceMods/src.clm
My source code directory: /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/
 

Attachments

  • cesm.log.211117-140041.txt (476.3 KB)
  • lnd.log.211117-140041.txt (90.9 KB)