murali@uni_no
New Member
Hi, I am running CAM5.1.1 in standalone mode at 0.47x0.63 resolution with 30 levels. It runs in data ocean mode with prescribed daily SST and ice concentration fields from NOAA. The experiment was set up so that the model runs for 20 years, from 1983 to 2002. It runs successfully up to 1998 but crashes there. The crash message in the log file is "_pmiu_daemon(SIGCHLD): PE RANK 2 exit signal Aborted". Just before the crash, there is a message saying "QNEG3 from TPHYSBCb:m= 5 lat/lchnk= 2466 Min. mixing ratio violated at 1 points. Reset to 0.0E+00 Worst =-3.4E-06 at i,k= 4 1". I don't think this is the cause of the crash, though, since many similar mixing ratio violations appear earlier in the log file without stopping the run. There are also these warnings: "pLCL does not converge and is set to psmin in uwshcu.F90".
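One way to check whether the last QNEG3 message really is no different from the earlier ones is to tally them from the log. The following is a minimal Python sketch, assuming the messages look like the one quoted above. The function name `summarize_qneg3` and the regular expression are illustrative and not part of CAM, and the pattern may need adjusting if the log wraps the message differently.

```python
import re

# Pattern modelled on the single QNEG3 message quoted in this post (assumption).
# Whitespace is matched loosely because the real log may split the message
# across two lines.
QNEG3_RE = re.compile(
    r"QNEG3 from\s+(?P<src>\S+?):m=\s*(?P<m>\d+)\s+lat/lchnk=\s*(?P<lchnk>\d+)"
    r".*?Worst\s*=\s*(?P<worst>[-+]?\d*\.?\d+(?:[EeDd][-+]?\d+)?)",
    re.DOTALL,
)


def summarize_qneg3(log_text):
    """Return {constituent index m: (message count, most negative 'Worst' value)}."""
    stats = {}
    for match in QNEG3_RE.finditer(log_text):
        m = int(match.group("m"))
        # Fortran may print a D exponent; Python's float() only accepts E.
        worst = float(match.group("worst").replace("D", "E").replace("d", "e"))
        count, most_negative = stats.get(m, (0, 0.0))
        stats[m] = (count + 1, min(most_negative, worst))
    return stats
```

Calling `summarize_qneg3(open(path).read())` on the log, with `path` pointing at the log file, gives one entry per constituent index. If the value just before the crash is of the same order as the earlier ones, that supports the reading that QNEG3 is not the trigger.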
Here are the lines from the core dump file:
--------------------------------------------------------------------------------
(gdb) bt
#0 0x0000000001414dab in raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1 0x00000000014e4041 in abort () at abort.c:92
#2 0x000000000108a4a2 in MPID_Abort ()
#3 0x000000000106abcc in PMPI_Abort ()
#4 0x000000000103e7cd in pmpi_abort__ ()
#5 0x00000000004b3bef in abortutils::endrun (msg='') at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/utils/abortutils.F90:36
#6 0x000000000058ba11 in cldwat2m_macro::gaussj (a=..., n=2, np=0, b=..., m=Cannot access memory at address 0x0
)
at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/physics/cam/cldwat2m_macro.F90:3535
#7 0x0000000000584ef2 in cldwat2m_macro::mmacro_pcond (lchnk=1053, ncol=-29584, dt=6.9533558063733605e-310, p=..., dp=..., t0=..., qv0=..., ql0=..., qi0=..., nl0=...,
ni0=..., a_t=..., a_qv=..., a_ql=..., a_qi=..., a_nl=..., a_ni=..., c_t=..., c_qv=..., c_ql=..., c_qi=..., c_nl=..., c_ni=..., c_qlst=..., d_t=..., d_qv=..., d_ql=...,
d_qi=..., d_nl=..., d_ni=..., a_cud=..., a_cu0=..., landfrac=..., snowh=..., s_tendout=..., qv_tendout=..., ql_tendout=..., qi_tendout=..., nl_tendout=...,
ni_tendout=..., qme=..., qvadj=..., qladj=..., qiadj=..., qllim=..., qilim=..., cld=..., al_st_star=..., ai_st_star=..., ql_st_star=..., qi_st_star=...)
at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/physics/cam/cldwat2m_macro.F90:773
#8 0x00000000008c70ed in macrop_driver::macrop_driver_tend (state=..., ptend_all=..., dtime=1800, landfrac=..., ocnfrac=..., snowh=..., dlf=..., dlf2=..., cmfmc=...,
cmfmc2=..., ts=..., sst=..., zdu=..., pbuf=Cannot access memory at address 0x7fffffff8c70
) at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/physics/cam/macrop_driver.F90:788
#9 0x0000000000fccd4b in tphysbc (ztodt=1800, pblht=..., tpert=..., qpert=..., fsns=..., fsnt=..., flns=..., flnt=..., state=Cannot access memory at address 0x7fffffff8d70
)
at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/physics/cam/tphysbc.F90:382
#10 0x0000000000a72a54 in physpkg::phys_run1 (phys_state=Cannot access memory at address 0x7fffffff8da0
) at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/physics/cam/physpkg.F90:665
#11 0x000000000050b65a in cam_comp::cam_run1 (cam_in=Asked for position 0 of stack, stack only has 0 elements on it.
) at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/control/cam_comp.F90:218
#12 0x00000000004dfaf9 in atm_comp_mct::atm_run_mct (eclock=..., cdata_a=..., x2a_a=..., a2x_a=...)
at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/atm/cam/src/cpl_mct/atm_comp_mct.F90:523
#13 0x0000000000565eec in ccsm_comp_mod::ccsm_run () at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/drv/driver/ccsm_comp_mod.F90:2165
#14 0x0000000000569959 in ccsm_driver () at /home/bjerknes/mad042/cesm1_0_4-cam-standalone/models/drv/driver/ccsm_driver.F90:47
#15 0x00000000004008d0 in main ()
#16 0x00000000014dea54 in __libc_start_main (main=0x400890 , argc=1, ubp_av=0x7fffffffa198, init=0x3, fini=0xfcaad68, rtld_fini=0, stack_end=0x7fffffffa188)
at libc-start.c:226
#17 0x00000000004007a5 in _start () at ../sysdeps/x86_64/elf/start.S:113
--------------------------------------------------------------------------------
Could anyone comment on this issue? Thanks!
The error appears to be at the end of this loop (as indicated by the core dump file):
Code:
if (a(icol,icol).eq.0.) then
   write(iulog,*) 'singular matrix in gaussj 2'
   do ii = 1, np
      do jj = 1, np
         write(iulog,*) ii, jj, aa(ii,jj), bb(ii,1)
      end do
   end do
   call endrun

(gdb) p a(icol, icol)
= 0
(gdb) p ii
= -35040
(gdb) p jj
= 0
(gdb) p aa(ii,jj)
no such vector element
(gdb) p bb(ii,1)
no such vector element
(gdb)
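For reference, the test in the excerpt is the usual zero-pivot check in Gauss-Jordan elimination. The Python sketch below is not the CAM `gaussj` routine. It is a simplified stand-in (partial pivoting, hypothetical name `gauss_jordan_solve`) that only shows how an exactly singular 2x2 system, matching n=2 in frame #6, ends up at such a check.

```python
def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Illustrative only: raises ValueError when the best available pivot is
    exactly zero, which is the situation the Fortran excerpt aborts on.
    """
    n = len(a)
    # Work on an augmented copy so the caller's inputs are left untouched.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest magnitude entry in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot_row][col] == 0.0:
            raise ValueError("singular matrix in gauss_jordan_solve")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]
```

With `a = [[1, 2], [2, 4]]` the second row is a multiple of the first, so the elimination leaves a zero pivot and the function raises. In the model, that would correspond to the 2x2 system built in `mmacro_pcond` becoming degenerate for some column, though what makes it degenerate is exactly the open question here.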