WACCM crash

One of our users is experiencing a consistent crash of WACCM after 178 time steps. The crash occurs in tp_core.F90, in the "Regular PPM (Eulerian without FSSL extension)" branch, at the statement

fxv(i,j) = mfxv(i,j)*qtmpv(iu,j)

It turns out that iu has been computed as
iu = real(i,r8) - cv(i,j)
and that the Courant number cv(i,j) is NaN. The conversion to integer yields iu = -2**31, which causes an attempted access to unmapped memory.
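
To find where the NaN first appears, one option is to check the Courant numbers before they are converted to an index. The following is only a minimal, standalone sketch of such a guard, not code from tp_core.F90. The program name, the toy grid size (im, jm), the injected NaN and the definition of r8 are all assumptions made for illustration, and it assumes a compiler that provides the Fortran 2003 intrinsic module ieee_arithmetic.

```fortran
! Hypothetical diagnostic sketch; none of this is taken from tp_core.F90.
program courant_nan_guard
   use, intrinsic :: ieee_arithmetic, only: ieee_is_finite, ieee_value, ieee_quiet_nan
   implicit none
   integer, parameter :: r8 = selected_real_kind(12)   ! assumed to match the model's r8
   integer, parameter :: im = 4, jm = 3                 ! toy grid size (assumption)
   real(r8) :: cv(im, jm)
   integer  :: i, j, iu, nbad

   cv = 0.25_r8
   cv(3, 2) = ieee_value(cv(3, 2), ieee_quiet_nan)     ! inject one NaN for the demonstration

   nbad = 0
   do j = 1, jm
      do i = 1, im
         if (.not. ieee_is_finite(cv(i, j))) then
            ! Report the location instead of converting a NaN to an integer index.
            nbad = nbad + 1
            write(*, '(a,i0,a,i0,a)') 'non-finite cv at (', i, ',', j, ')'
            cycle
         end if
         iu = int(real(i, r8) - cv(i, j))               ! same conversion as above, now safe
      end do
   end do

   write(*, '(a,i0)') 'non-finite Courant numbers found: ', nbad
end program courant_nan_guard
```

In the model itself, a check of this kind placed where cv is computed (and repeated on the winds it is derived from) would show the time step and grid cell at which the first non-finite value appears, which is usually more informative than the later segmentation fault.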

The Courant numbers should never be NaN, but we have no idea why this happens. Does anybody have any ideas?
 

marsh

Member
Diagnosing problems that occur some time into a run is always difficult. It would be helpful to know which model version you are running (3.1.9?) and what architecture you are running on.
 
This is waccm09_cam3_5_07, which we got from the NCAR/ACD group this summer.
We're running on Rocks 4.2.1 (CentOS 4.4 with a 2.6.9-55 kernel, on dual quad-core Barcelona CPUs).
It's compiled with PathScale 3.0 and Scali MPI on InfiniBand (OFED).
 

marsh

Member
The version of WACCM you are using is an intermediate development version and is likely not fully tested. WACCM 3.1.9 is known to be stable over hundreds of years of simulation. When porting WACCM to your hardware, a first step is to ensure that version 3.1.9 (available from the dataportal) runs successfully.
 