Problem running WACCM on PowerPC 970MP at MareNostrum (Barcelona)

Hi everyone,

I'm experiencing a problem when running WACCM-v3.1.9 on the MareNostrum cluster.

The model worked fine until a few months ago: with the older compiler (xlf v10) it compiled and ran for a few model days without any problems. After a change made one month ago by the admins to the MareNostrum architecture (e.g. the compiler was updated to v12), WACCM stopped working.

The code still compiles, but I get a segmentation fault shortly after the land-model initialization, during the forward integration (stepon.F90). It does not even complete one time step (same error with nelapse=1).

The tail of the large log file reads as follows:

solar_parms_timestep_init: values for date = 19650101
--------------------------------------------------------
euvac_set_etf: f107,f107a = 77.1500015258789062 77.1500015258789062
--------------------------------------------------------
ADDITIONAL_CONSTITUENTS: RE-INITIALIZING HORZ/VERT CONSTITUENTS
HERE TPHYSBC before moist convection routine
HERE CLOUD WATER TRANSPORT

srun: error: s35c4b12: task0: Segmentation fault
srun: Terminating job


After some coarse debugging (by inserting print statements in the code), I found that the segmentation fault occurs in the model physics (physpkg.F90 --> tphysbc.F90) --> moist convection (convect_deep.F90), when the subroutine convtran, which belongs to the module zm_conv.F90, is called for convective tracer transport.

IN TPHYSBC.F90 on L249:

write(*,*) ' HERE TPHYSBC before moist convection routine'

call convect_deep_tend( prec_zmc, &
pblht, cmfmc, cmfcme, &
tpert, dlf, pflx, zdu, &
rliq, &
ztodt, snow_zmc, &
state, ptend, pbuf )

write(*,*) ' HERE TPHYSBC after moist convection routine'

IN CONVECT_DEEP.F90 on L401:

ptend_loc%lq(ixcldliq) = .true.
write(6,*) 'HERE CLOUD WATER TRANSPORT'

call convtran (lchnk, &
ptend_loc%lq(1),state1%q(1,1,1), ppcnst, mu(1,1,lchnk), md(1,1,lchnk), &
du(1,1,lchnk), eu(1,1,lchnk), ed(1,1,lchnk), dp(1,1,lchnk), dsubcld(1,lchnk), &
jt(1,lchnk),maxg(1,lchnk), ideep(1,lchnk), 1, lengath(lchnk), &
nstep, fracis, ptend_loc%q(1,1,1) )
call t_stopf ('convtran1')

write(6,*) 'HERE CLOUD WATER TRANSPORT FINISH'



The segmentation fault must be related to some variable aliasing in the call to convtran, since no statement inside this subroutine is executed before the crash. However, I checked the dimensions and types of the input variables and couldn't find any errors.

I also printed all input variables, but found no evidence of any missing values:

write(6,*) 'HERE CLOUD WATER TRANSPORT'
write(6,*) lchnk, ptend_loc%lq(1), state1%q(1,1,1), ppcnst, mu(1,1,lchnk), md(1,1,lchnk)
write(6,*) du(1,1,lchnk), eu(1,1,lchnk), ed(1,1,lchnk), dp(1,1,lchnk), dsubcld(1,lchnk)
write(6,*) jt(1,lchnk),maxg(1,lchnk), ideep(1,lchnk), 1, lengath(lchnk)

--> with the following results:

97 F 0.833287327955798996E-08 63 0.000000000000000000E+00 0.000000000000000000E+00
0.000000000000000000E+00 0.000000000000000000E+00 0.000000000000000000E+00 0.291959999999999954E-05 15.0457486648570917 63 66 8 1 4

Has any of you experienced similar problems with WACCM-v3.1.9, and do you have any advice on how to fix this?

I have tried lots of things: changing the compilation flags, switching the compiler back to the older version, and running WACCM in pure MPI or even on a single processor, but none of it works.

Kind regards
 
I tried again to switch from the new Fortran compiler to the older version, sourcing some scripts with the compiler definitions, and it worked; the model is running again.

Apparently, WACCM does not like the latest xlf90, version 12.

Is anybody using this Fortran compiler version?

If so, could you please tell me which compiler options (flags in the Makefile) you use?
 
Hi, we are also running v12.1.

prompt# xlf -qversion
IBM XL Fortran for AIX, V12.1
Version: 12.01.0000.0002

We have also experienced runtime errors with WACCM v3.1.9. We did get the model to run using the -chem waccm_ghg configure option, but never using the standard -chem waccm_mozart option.

I asked the sys admin if they could install xlf v11 for us to try, but they could not.

Does anyone know (a) if xlf v12 is a supported compiler for WACCM 3.1.9, and (b) if more recent WACCM versions (v3.5 or 3.6) have worked with xlf v12?
 

fvitt

CSEG and Liaisons
Staff member
WACCM v3.1.9 compiled with xlf version 12 does run at NCAR. The later development version 3.6 also runs. We have experienced some problems with the -O3 optimization flag. I suggest trying the -O2 optimization.

IBM XL Fortran for AIX, V12.1
Version: 12.01.0000.0003
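
The suggestion above, dropping from -O3 to -O2, could be applied to a build file along these lines. This is only a sketch under one assumption: that the optimization level appears literally as -O3 in the Makefile. The function name lower_opt is hypothetical, and the real WACCM Makefile may set the flag differently. A backup copy is kept so the change can be undone.

```shell
#!/bin/sh
# lower_opt MAKEFILE
# Hypothetical helper: replace every literal -O3 with -O2 in a Makefile.
# Assumption: the optimization level is written literally as -O3.
lower_opt() {
    # Keep the original as MAKEFILE.bak so the change can be reverted.
    cp "$1" "$1.bak" || return 1
    # Read from the backup and write the edited text over the original.
    sed 's/-O3/-O2/g' "$1.bak" > "$1"
}

# Usage (path is an assumption):
#   lower_opt Makefile
```

A full rebuild is needed afterwards so that all object files are compiled with the new flag.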
 