Advice on getting past WACCM crashes

mmills@...
We have been validating the WACCM compsets in the latest release, CESM1.1.1, running on Yellowstone. Our runs occasionally crash, which we did not see in standard compset runs using the CESM1.0 code base on bluefire and other machines. We have found that we can get past a crash by decreasing the dynamical timestep: we increase the namelist variable "nsplit" from 8 to 10 for the one-month period during which the crash occurred. "nsplit" may then be returned to its default value of 8 for WACCM, and the run continues.
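
To make the effect of "nsplit" concrete, here is a rough Python sketch. It assumes the dynamics substep is simply the physics time step divided by nsplit, and it uses a physics time step of 1800 s purely as an illustrative value (check the actual value for your own case). The function name is made up for this example.

```python
def dynamics_substep_seconds(physics_dt_seconds, nsplit):
    """Length of one dynamics substep, assuming nsplit equal substeps per physics step."""
    if nsplit <= 0:
        raise ValueError("nsplit must be a positive integer")
    return physics_dt_seconds / nsplit


# Illustrative physics time step (an assumption, not read from any case).
PHYSICS_DT = 1800.0

for nsplit in (8, 10):
    dt = dynamics_substep_seconds(PHYSICS_DT, nsplit)
    print(f"nsplit={nsplit:2d}: dynamics substep = {dt:.1f} s")
```

Under these assumptions, going from nsplit = 8 to nsplit = 10 shortens each dynamics substep by 20%.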

We will continue to investigate the cause of this increased crash frequency.

Mike Mills
WACCM Liaison
Atmospheric Chemistry Division
National Center for Atmospheric Research
P.O. Box 3000, Boulder, Colorado 80307-3000
phone: 303.497.1425
fax: 303.497.1400
email: mmills@ucar.edu

santos
An update: The most common cause of these crashes in WACCM is a crossing of the Lagrangian levels used in the FV dycore's vertical advection scheme. Therefore, it is best to set the namelist option "nspltvrm" to "2" rather than increasing nsplit. This doubles the frequency of vertical remapping without impacting the rest of the dynamics. "nspltvrm=2" will be the default as of CESM 1.2. In CESM 1.2, a crossing of the Lagrangian levels will also trigger an error message advising the user to increase nspltvrm, rather than simply crashing.

Sean Patrick Santos

CESM Software Engineering Group

andrew.kren@...

Hi Sean,

I just ran into this error after my model ran to year 23. It was almost done, but the atmosphere log reported that Lagrangian levels are crossing and that the run will abort. It also suggested increasing nspltvrm. I have CESM 1.2.1, so given your statement above, it should already be set to 2, correct? Should I increase it further? Would I need to start the model over from the beginning since I am changing the namelist? And how do I set this in the namelist?

kren

santos

It should default to 2, but you can check this by looking in atm_in to see what nspltvrm is set to.
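
If you would rather check this with a script than by eye, something like the following Python sketch could work. It is only an illustration: the helper name is made up, the sample text is a hand-written excerpt with a placeholder group name rather than a real atm_in, and the regular expression handles only simple "name = integer" lines.

```python
import re


def read_namelist_int(namelist_text, name):
    """Return the integer assigned to `name` in namelist-style text, or None if absent."""
    pattern = re.compile(
        r"^\s*" + re.escape(name) + r"\s*=\s*(-?\d+)",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(namelist_text)
    return int(match.group(1)) if match else None


# Hand-written excerpt; "&example_group" is a placeholder, not a real group name.
sample = """&example_group
 nsplit = 8
 nspltvrm = 2
/
"""

print(read_namelist_int(sample, "nspltvrm"))
```

For a real case you would pass in the contents of your own atm_in, for example `open("atm_in").read()`, with the path adjusted to wherever that file lives in your run.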

The proper value of this setting can depend on many things, including resolution (especially vertical), the time steps, and/or very strong physics forcings that push the model to the edge of stability (e.g. very high stresses or temperature gradients). The setting of 2 seems appropriate for standard WACCM configurations at 2 degrees.

If you change nspltvrm, you do not have to start the run over; it can be adjusted at any time. But if you change it, you may also have to change nsplit, which sets the bulk dynamics time step. These two settings control nested loops, so nsplit must be a multiple of nspltvrm. Here are the standard settings for WACCM:

nsplit = 8
nspltvrm = 2

Here's one change that you could attempt, by placing the following line in your user_nl_cam:

nspltvrm = 4

This works because 4 divides 8. If you wanted to increase nspltvrm further, you might have to try something like this:

nsplit = 12
nspltvrm = 6

However, only a few runs have required very significant changes to nsplit, and this generally means that either you have some serious bug, or that you are doing something that's substantially different from any supported way of running the model.

There is one more setting for the FV dycore, which is called "nspltrac", and controls the tracer advection (which can be expensive and is done less frequently than bulk dynamics). We generally allow this to be set automatically by the model, but, if set, it must be a multiple of nspltvrm and nsplit must be a multiple of it. To put it differently, the three variables must satisfy

nsplit/nspltrac = m
nspltrac/nspltvrm = n

where m and n are positive integers (either or both can be 1).
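
As a quick sanity check before editing user_nl_cam, the two conditions above can be sketched in Python. The function name is made up for this example, and the nspltrac values below are hypothetical, since the model normally sets nspltrac automatically.

```python
def check_fv_splits(nsplit, nspltrac, nspltvrm):
    """Return (m, n) with nsplit = m*nspltrac and nspltrac = n*nspltvrm.

    Raises ValueError if any value is not a positive integer or if either
    divisibility condition fails.
    """
    for label, value in (("nsplit", nsplit), ("nspltrac", nspltrac), ("nspltvrm", nspltvrm)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
    if nspltrac % nspltvrm != 0:
        raise ValueError(f"nspltrac={nspltrac} is not a multiple of nspltvrm={nspltvrm}")
    if nsplit % nspltrac != 0:
        raise ValueError(f"nsplit={nsplit} is not a multiple of nspltrac={nspltrac}")
    return nsplit // nspltrac, nspltrac // nspltvrm


print(check_fv_splits(8, 4, 2))   # (2, 2)
print(check_fv_splits(12, 6, 6))  # (2, 1)
```

A combination such as nsplit = 8, nspltrac = 3, nspltvrm = 2 raises a ValueError, because 3 is not a multiple of 2.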

Sean Patrick Santos

CESM Software Engineering Group

andrew.kren@...

Thanks Sean. Right now I am going to try restarting the model from where the previous restart file was and see if it comes up with the error again. If it does, I will try your suggestion. I can't imagine I have a serious bug or that I am doing something that is quite different in terms of running the model. My other run has been fine so far and the only difference I added in this run is a solar cycle by specifying my solar data and parms file. Can this error happen just by itself?

kren

santos

Can this happen by itself? Yes and no.

Yes, in the sense that it can happen intermittently if the model is on the edge of stability. Prior to CESM 1.2, this could show up even in out-of-the-box runs (on average, maybe every 50-100 years), though it was less common in B1850 runs.

But also no, in the sense that we thought that we had gotten away from "the edge" since setting nspltvrm to 2. That is to say, I don't know of any definitive cases where an out-of-the-box case has encountered this error with nspltvrm set to 2, compared with multiple centuries of successful runs.

I think it's still a frustratingly open question, this matter of how the WACCM physics affects numerical stability of the dycore. The CAM-SE dycore (HOMME) can produce an equivalent error that has proven much harder to conquer. Also, some CARMA cases have encountered errors that may be the result of large heating rates from the radiation interacting with the dycore.

Sean Patrick Santos

CESM Software Engineering Group

kangwanying1992@...

Dear Mike,

I tried to run a 3xCO2 experiment on WACCM (CESM1.2.2). I changed just the boundary condition and CO2 concentration. But the model crashed in the first several steps and reported

"  20: Run will ABORT!
  20: Suggest to increase NSPLTVRM
  20:(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping"

I would greatly appreciate any suggestions, because this error is very confusing to me. Do I need to increase nspltvrm from 2 to 4?

Best regards,

Wanying 

wanying kang

mmills

Wanying,

You can try increasing nspltvrm. You may also need to change nspltrac and nsplit. nspltrac needs to be a multiple of nspltvrm. nsplit needs to be a multiple of nspltrac.

Mike Mills
WACCM Liaison
Atmospheric Chemistry Division
NCAR Foothills Lab
Boulder, Colorado USA

esjiayu_bin@...

Hello Mike,

I ran into the same error. Could you please tell me the reason or physical explanation for this error?

Thanks.

mmills

nspltvrm controls the number of vertical re-mapping timesteps per physics timestep.

http://www.cesm.ucar.edu/cgi-bin/eaton/namelist/nldef2html-cam5

If there is an instability in the vertical levels (generally an issue in the upper atmosphere), nspltvrm must be increased to get through the period of instability.

Mike Mills
WACCM Liaison
Atmospheric Chemistry Division
NCAR Foothills Lab
Boulder, Colorado USA

esjiayu_bin@...

Thank you for your reply. What conditions cause the instability in the vertical levels? Would extremely hot conditions do that?

Thanks.

mmills

Yes. High temperatures, such as can occur near the top of WACCM during high auroral activity, often cause the vertical levels to cross. We are testing a new method for avoiding these crashes in WACCM by limiting the value of Bz (the north-south component of the interplanetary magnetic field) in mag_parms.F90. For example:

      real(r8), parameter :: bzmin = -5.0_r8       ! minimum bz

      call solar_parms_get( kp_s = wkp, f107_s = wf107 )

      if( present( by ) ) then
         by  =  0._r8
      end if

      if( present( bz ) ) then
         bz = .433726_r8 - wkp*(.0849999_r8*wkp + .0810363_r8) &
              + wf107*(.00793738_r8 - .00219316_r8*wkp)
         ! limit bz to bzmin and log the adjustment
         if (bz.lt.bzmin) then
           write(iulog,'(a,f6.2,a,f6.2,a,f6.2,a,f6.1)') 'mag_parms.F90: low bz:',bz, &
                   ' limited to ',bzmin,'; Kp=',wkp,'; F107=',wf107
           bz=bzmin
         end if
      end if
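
To see when the limit would kick in without rebuilding the model, the same expression can be transcribed into a short Python sketch. This is only an illustration: the function name is made up, the coefficients are copied from the fragment above, and the Kp and F10.7 inputs are arbitrary test values, not observations.

```python
BZ_MIN = -5.0  # lower bound, same value as bzmin in the Fortran fragment


def limited_bz(kp, f107, bz_min=BZ_MIN):
    """Evaluate the Bz expression from the fragment above and limit it at bz_min."""
    bz = (0.433726
          - kp * (0.0849999 * kp + 0.0810363)
          + f107 * (0.00793738 - 0.00219316 * kp))
    return max(bz, bz_min)


# Arbitrary illustrative inputs: a quiet case and a strongly disturbed case.
for kp, f107 in ((1.0, 70.0), (9.0, 200.0)):
    print(f"Kp={kp}, F107={f107}: bz={limited_bz(kp, f107):.3f}")
```

With these inputs, only the disturbed case falls below the limit and is reset to -5.0.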

 

Mike Mills
WACCM Liaison
Atmospheric Chemistry Division
NCAR Foothills Lab
Boulder, Colorado USA
