Hi CESM community:
I am trying to run a high-resolution simulation (0.47x0.63, f05_g16) and am having trouble running a CESM1.2 F compset. In debug mode, the error message is:
MPT ERROR: Rank 132(g:132) received signal SIGFPE(8)
From my limited understanding, this could be related to bad numbers being written into the output file. However, I did not modify the .F90 files that appear in the traceback in the log file:
--
#1 0x00002b7160094db6 in mpi_sgi_system (
80:MPT: #2 MPI_SGI_stacktraceback (
80:MPT: header=header@entry=0x7fff84208a40 "MPT ERROR: Rank 80(g:80) received signal SIGFPE(8).\n\tProcess ID: 13560, Host: r4i6n4, Program: /glade/scratch/dleung/CESM/trial_res_047x063/cesm.exe\n\tMPT Version: HPE MPT 2.19 02/23/19 05:30:09\n") at sig.c:340
80:MPT: #3 0x00002b7160094fb2 in first_arriver_handler (signo=signo@entry=8,
80:MPT: stack_trace_sem=stack_trace_sem@entry=0x2b716a680080) at sig.c:489
80:MPT: #4 0x00002b716009534b in slave_sig_handler (signo=8, siginfo=<optimized out>,
80:MPT: extra=<optimized out>) at sig.c:564
80:MPT: #5 <signal handler called>
80:MPT: #6 0x00000000039428da in subgridavemod::p2g_1d (lbp=325327, ubp=329278,
80:MPT: lbc=50783, ubc=51262, lbl=32227, ubl=32595, lbg=25029, ubg=25340,
80:MPT: parr=..., garr=..., p2c_scale_type=..., c2l_scale_type=...,
80:MPT: l2g_scale_type=..., .tmp.P2C_SCALE_TYPE.len_V$1118=8,
80:MPT: .tmp.C2L_SCALE_TYPE.len_V$111b=8, .tmp.L2G_SCALE_TYPE.len_V$111e=8)
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/subgridAveMod.F90:762
80:MPT: #7 0x000000000362a05b in histfilemod::hist_update_hbuf_field_1d (t=1, f=119,
80:MPT: begp=325327, endp=329278, begc=50783, endc=51262, begl=32227, endl=32595,
80:MPT: begg=25029, endg=25340)
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/histFileMod.F90:1150
80:MPT: #8 0x0000000003626507 in histfilemod::hist_update_hbuf ()
80:MPT: at /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/models/lnd/clm/src/clm4_0/main/histFileMod.F90:1063
80:MPT: #9 0x0000000003471604 in clm_driver::clm_drv (doalb=.FALSE.,
80:MPT: nextsw_cday=1.0625, declinp1=-0.40294823456129064,
80:MPT: declin=-0.4030289369547867, rstwr=.FALSE., nlend=.FALSE., rdate=...,
80:MPT: .tmp.RDATE.len_V$2ef8=32)
--
I did not modify subgridAveMod.F90 or histFileMod.F90. I think the error at subgridAveMod.F90 line 762 comes from the line that averages the pft-level quantity parr(p) up to the grid-level quantity garr(g). I am not familiar with this code, but I guess parr(p) is a generic dummy argument through which any pft-level variable gets aggregated to the grid level. All of my modified code runs for many years at 1.9x2.5 (f19_g16) and at 0.9x1.25 (f09_g16); the error only occurs at 0.47x0.63, and I have no idea how to fix it.
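To show what I mean, here is a toy Fortran sketch of the kind of weighted pft-to-gridcell averaging I imagine p2g_1d doing. Only the names parr and garr are taken from the traceback; the weighting, the normalization by a weight sum, and the zero-weight gridcell are my own guesses, not the actual CLM code. With floating-point trapping enabled (e.g. ifort -fpe0 or gfortran -ffpe-trap=invalid,zero,overflow, which I assume is roughly what the debug build does), the 0/0 in this toy example aborts with the same SIGFPE(8) as in my run:
--
! toy_p2g.f90 -- NOT the CLM code, just my mental model of what p2g_1d does.
program toy_p2g
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: npft = 6, ngrid = 2
  real(r8) :: parr(npft)        ! pft-level field (like parr in p2g_1d)
  real(r8) :: wt(npft)          ! pft weight on its gridcell
  integer  :: pgrid(npft)       ! owning gridcell of each pft
  real(r8) :: garr(ngrid)       ! grid-level result (like garr in p2g_1d)
  real(r8) :: sumwt(ngrid)      ! accumulated weight per gridcell
  integer  :: p, g

  parr  = (/ 1.0_r8, 2.0_r8, 3.0_r8, 4.0_r8, 5.0_r8, 6.0_r8 /)
  ! gridcell 2 only gets zero-weight pfts, so its weight sum stays 0
  wt    = (/ 0.5_r8, 0.5_r8, 0.0_r8, 0.0_r8, 0.0_r8, 0.0_r8 /)
  pgrid = (/ 1, 1, 2, 2, 2, 2 /)

  garr  = 0.0_r8
  sumwt = 0.0_r8
  do p = 1, npft
     g = pgrid(p)
     garr(g)  = garr(g)  + wt(p)*parr(p)
     sumwt(g) = sumwt(g) + wt(p)
  end do

  do g = 1, ngrid
     garr(g) = garr(g)/sumwt(g)   ! 0/0 for gridcell 2 -> SIGFPE in a trapping (debug) build
  end do

  print *, garr
end program toy_p2g
--
If this intuition is right, then some pft-level field (or weight) that is fine on the f19 and f09 grids must be producing a bad value on the f05 grid, but I don't know which one or why.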
I have attached the cesm and lnd log files. Any help or comments would be greatly appreciated, and I can provide further information if that would help.
Thank you,
Danny Leung
Some paths, in case they are helpful:
My case directory: /glade/scratch/dleung/CESM/trial_res_047x063
The cesm log file (in debug mode): /glade/scratch/dleung/CESM/trial_res_047x063/run/cesm.log.211117-140041
Modified code that works at coarser resolutions: /glade/scratch/dleung/CESM/trial_res_047x063/SourceMods/src.clm
My source code directory: /gpfs/fs1/work/dleung/cesm1_2_2_1_diameter_roughness_clayfrc_LULC/