gchiodo@fis_ucm_es
Member
Hi everyone,
I'm experiencing a problem when running WACCM-v3.1.9 on the MareNostrum cluster.
The model worked fine until a few months ago: it compiled and ran for a few model days without any problems with the older compiler, xlf v10. About a month ago the admins changed the MareNostrum setup (among other things, the compiler was updated to v12), and since then WACCM has stopped working.
The code still compiles, but I get a segmentation fault shortly after the land-model initialization, during the forward integration (stepon.F90). It does not complete even one time step (same error with nelapse=1).
The tail of the large log file reads as follows:
solar_parms_timestep_init: values for date = 19650101
--------------------------------------------------------
euvac_set_etf: f107,f107a = 77.1500015258789062 77.1500015258789062
--------------------------------------------------------
ADDITIONAL_CONSTITUENTS: RE-INITIALIZING HORZ/VERT CONSTITUENTS
HERE TPHYSBC before moist convection routine
HERE CLOUD WATER TRANSPORT
srun: error: s35c4b12: task0: Segmentation fault
srun: Terminating job
After some coarse debugging (by inserting print statements in the code), I found that the segmentation fault occurs in the model physics (physpkg.F90 -> tphysbc.F90), in the moist convection (convect_deep.F90), when the subroutine convtran from the module zm_conv.F90 is called for convective tracer transport.
IN TPHYSBC.F90 on L249:
write(*,*) ' HERE TPHYSBC before moist convection routine'
call convect_deep_tend( prec_zmc, &
pblht, cmfmc, cmfcme, &
tpert, dlf, pflx, zdu, &
rliq, &
ztodt, snow_zmc, &
state, ptend, pbuf )
write(*,*) ' HERE TPHYSBC after moist convection routine'
IN CONVECT_DEEP.F90 on L401:
ptend_loc%lq(ixcldliq) = .true.
write(6,*) 'HERE CLOUD WATER TRANSPORT'
call convtran (lchnk, &
ptend_loc%lq(1),state1%q(1,1,1), ppcnst, mu(1,1,lchnk), md(1,1,lchnk), &
du(1,1,lchnk), eu(1,1,lchnk), ed(1,1,lchnk), dp(1,1,lchnk), dsubcld(1,lchnk), &
jt(1,lchnk),maxg(1,lchnk), ideep(1,lchnk), 1, lengath(lchnk), &
nstep, fracis, ptend_loc%q(1,1,1) )
call t_stopf ('convtran1')
write(6,*) 'HERE CLOUD WATER TRANSPORT FINISH'
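One caveat about tracing with write statements like these: output to unit 6 may be buffered, so the last marker that shows up in the log is not necessarily the last one executed before the crash. A minimal sketch of a flushed marker follows, assuming the compiler accepts the Fortran 2003 flush statement. The names trace_mod, marker and trace are hypothetical helpers for illustration, not part of WACCM.

```fortran
module trace_mod
  implicit none
  private
  public :: marker, trace
contains

  ! Build the text of a trace line (hypothetical helper, not WACCM code).
  pure function marker(msg) result(line)
    character(len=*), intent(in) :: msg
    character(len=:), allocatable :: line
    line = 'TRACE: ' // trim(msg)
  end function marker

  ! Write the marker to unit 6 and flush it at once, so that it reaches
  ! the log even if the program crashes in the very next statement.
  subroutine trace(msg)
    character(len=*), intent(in) :: msg
    write(6,'(a)') marker(msg)
    flush(6)
  end subroutine trace

end module trace_mod

program trace_demo
  use trace_mod, only: trace
  implicit none
  call trace('before convtran')
  ! ... the suspect call would go here ...
  call trace('after convtran')
end program trace_demo
```

Placing one such marker as the first executable statement inside the called subroutine would help confirm whether the crash really happens on entry to the routine.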
The segmentation fault appears to be related to some variable aliasing in the call to convtran, since no statement inside the subroutine is executed. However, I checked the dimensions and types of the input variables and couldn't find any errors.
I also printed the input variables (the first element of each array) just before the call and found no evidence of any missing values:
write(6,*) 'HERE CLOUD WATER TRANSPORT'
write(6,*) lchnk, ptend_loc%lq(1), state1%q(1,1,1), ppcnst, mu(1,1,lchnk), md(1,1,lchnk)
write(6,*) du(1,1,lchnk), eu(1,1,lchnk), ed(1,1,lchnk), dp(1,1,lchnk), dsubcld(1,lchnk)
write(6,*) jt(1,lchnk),maxg(1,lchnk), ideep(1,lchnk), 1, lengath(lchnk)
--> with the following results:
97 F 0.833287327955798996E-08 63 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00 0.291959999999999954E-05 15.0457486648570917
63 66 8 1 4
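The write statements above show only the first element of each array, while convtran receives whole array sections starting at that element. A fuller check could scan each array for NaNs and each index array for out-of-range values. The following is only a sketch under stated assumptions: the module array_check and its routines are hypothetical, the r8 kind is defined locally rather than taken from the model, and the bounds 1..66 are an assumed level range used purely for illustration.

```fortran
module array_check
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: r8, count_nan, count_out_of_range, summarize
  integer, parameter :: r8 = selected_real_kind(12)  ! local stand-in for the model's r8
contains

  ! Number of NaN entries in a 2-D field.
  pure function count_nan(a) result(n)
    real(r8), intent(in) :: a(:,:)
    integer :: n
    n = count(ieee_is_nan(a))
  end function count_nan

  ! Number of entries of idx that fall outside lo..hi.
  pure function count_out_of_range(idx, lo, hi) result(n)
    integer, intent(in) :: idx(:), lo, hi
    integer :: n
    n = count(idx < lo .or. idx > hi)
  end function count_out_of_range

  ! Print the shape and NaN count of a field, then flush the output.
  subroutine summarize(name, a)
    character(len=*), intent(in) :: name
    real(r8), intent(in) :: a(:,:)
    write(6,'(a,a,2i6,a,i0)') trim(name), ': shape =', shape(a), &
         '  NaNs = ', count_nan(a)
    flush(6)
  end subroutine summarize

end module array_check

program array_check_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  use array_check
  implicit none
  real(r8) :: mu(4,3)
  integer  :: jt(4)

  mu = 0.0_r8
  mu(2,3) = ieee_value(mu(2,3), ieee_quiet_nan)  ! plant one NaN for the demo
  jt = [63, 66, 8, 70]                           ! 70 lies outside the assumed 1..66

  call summarize('mu', mu)
  write(6,'(a,i0)') 'jt entries outside 1..66: ', count_out_of_range(jt, 1, 66)
end program array_check_demo
```

In the model, the same calls would be applied to the actual sections passed to convtran, for example mu(:,:,lchnk) and jt(:,lchnk), right before the call.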
Has anyone experienced similar problems with WACCM-v3.1.9, or does anyone have advice on how to fix this?
I have tried many things, e.g. changing the optimization flags, switching back to the older compiler version, and running WACCM in pure MPI mode or even on a single processor, but nothing works.
Kind regards