
CESM1-1-2-LENS-n21: QNEG4 WARNING from TPHYSAC

Hi,

I am running CESM-1-1-2-LENS model tag 21 on Cheyenne with compset B1850LENS, res f09_g16, and the only thing I changed is the CO2 forcing (2xPI). I get the following error in cesm.log after the model runs for 14 years and 9 months:

QNEG4 WARNING from TPHYSAC Max possible LH flx exceeded at 1 points. , Worst excess = -1.1977E-02, lchnk = ***, i = 9, same as indices lat = 65, lon = 111
411: imp_sol: Time step 1.8000000000000E+03 failed to converge @ (lchnk,lev,col,nstep) = 3189 24 9241112

Interestingly, I ran the same model with CO2 values at 3xPI, and it ran for 150 years with no problem.

I tried the following things that didn't help:
-- setting stacksize to unlimited
-- setting fv_nsplit, fv_nspltrac, fv_nspltvrm to multiples of 2 (from 2 to 512) and inithist = 'DAILY' in user_nl_cam

Any suggestions?

Ivan Mitevski
 

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Ivan,

Is the model crashing in the 2xPI configuration, or producing noticeably bad values in its history outputs? This particular error indicates that the downward moisture flux in CAM at a given grid point is larger than the moisture content of the lowest model level. However, this does occasionally happen even in a normal model run, and the model adjusts for this by converting the excess latent heat flux to sensible heat. So as long as the model appears to be running ok (and producing reasonable output), and the QNEG4 warnings aren't filling up your log files, then I wouldn't worry too much about this particular error message.
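
To make the adjustment described above concrete, here is a minimal Python sketch of the idea. It is not the CAM source: the function name, argument names, and constant values are assumptions for illustration, and it handles a single column rather than a whole chunk. A negative "excess" means the downward moisture flux asks for more water than the lowest level holds, and that amount is moved from the latent to the sensible heat flux.

```python
LATVAP = 2.501e6   # latent heat of vaporization, J/kg (assumed value)
GRAVIT = 9.80616   # gravitational acceleration, m/s^2 (assumed value)


def limit_moisture_flux(qflx, lhflx, shflx, qbot, pdel, dt, qmin=1.0e-12):
    """Cap a downward surface moisture flux so the lowest level keeps q >= qmin.

    qflx  : surface moisture flux, kg/m^2/s (negative = out of the atmosphere)
    lhflx : latent heat flux, W/m^2
    shflx : sensible heat flux, W/m^2
    qbot  : specific humidity of the lowest model level, kg/kg
    pdel  : pressure thickness of the lowest model level, Pa
    dt    : physics time step, s
    """
    # Most negative flux the lowest level can supply in one time step.
    min_flux = (qmin - qbot) * pdel / (GRAVIT * dt)
    excess = qflx - min_flux
    if excess < 0.0:
        # Remove the excess from the moisture and latent heat fluxes and
        # hand the same energy to the sensible heat flux.
        qflx -= excess
        lhflx -= excess * LATVAP
        shflx += excess * LATVAP
    return qflx, lhflx, shflx, min(excess, 0.0)


# Hypothetical single-column example with an 1800 s time step.
q, lh, sh, excess = limit_moisture_flux(
    qflx=-1.0e-4, lhflx=-250.1, shflx=10.0, qbot=1.0e-3, pdel=1000.0, dt=1800.0
)
print("excess:", excess, "adjusted qflx:", q)
```

The sum of latent and sensible heat flux is unchanged by the adjustment, which is why an occasional warning of this kind does not by itself point to an energy problem.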

That being said, if the model is crashing or producing bad values, then please include your log files in your reply and we will try and help you figure out what the problem is. I am also transferring this particular thread to the "CESM Community Projects" forum, which should notify people who are experts in the large ensemble (LENS) simulations, in case their expertise is needed.

Hope that helps, and have a great day!

Jesse
 
Hi Jesse,

Thanks for your reply! The model is crashing and I cannot continue the run. Log files are in:

/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/ice.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/rof.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/cpl.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/ocn.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/lnd.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/atm.log.200922-091921
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/cesm.log.200922-091921

Let me know if you need anything else.

Have a great day!

Ivan Mitevski
 

Hi Ivan,

It does look like the specific humidity is going negative in CAM somewhere, but that shouldn't result in a seg fault. If possible, I would recommend re-building and running the simulation in debug mode (i.e. set DEBUG to TRUE in env_build.xml). This should produce a proper traceback which will allow one to see exactly where the seg fault is occurring in the model. I should note though that the model will run quite slowly, so I might recommend running the model like normal for the first 14 years or so, and then creating a branch run off of that simulation where the DEBUG flag is turned on. Hopefully then the additional output will provide the info needed to track down the specific issue.
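
As a rough sketch of the settings involved, the small Python helper below only prints the commands one might run from the case directory. The `-file/-id -val` form of `xmlchange`, the placement of RUN_TYPE, RUN_REFCASE, and RUN_REFDATE in env_run.xml, and the helper names are assumptions based on CESM1.1-era scripts, so please check `./xmlchange -help` and your own env_*.xml files before using them. The reference date is a placeholder, not a value from this thread.

```python
def xmlchange(xml_file, var, value):
    """Build one xmlchange command line (assumed CESM1.1-style syntax)."""
    return f"./xmlchange -file {xml_file} -id {var} -val {value}"


def debug_branch_commands(ref_case, ref_date):
    """Commands to turn on DEBUG and branch from an existing run."""
    return [
        xmlchange("env_build.xml", "DEBUG", "TRUE"),
        xmlchange("env_run.xml", "RUN_TYPE", "branch"),
        xmlchange("env_run.xml", "RUN_REFCASE", ref_case),
        xmlchange("env_run.xml", "RUN_REFDATE", ref_date),
    ]


# Replace YYYY-MM-DD with the date of the restart set you branch from.
for cmd in debug_branch_commands("2xCO2.B1850LENS.n21.f09_g16", "YYYY-MM-DD"):
    print(cmd)
```

The branch case would also need the restart files from the reference run in its run directory, and a clean rebuild after DEBUG is changed.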

Thanks, and good luck!

Jesse
 
Hi Jesse,

I set DEBUG to TRUE and started a 3-day run. The model crashed, and here are the logs:

/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/ice.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/rof.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/cpl.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/ocn.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/atm.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/lnd.log.200923-223246
/glade/scratch/im2527/2xCO2.B1850LENS.n21.f09_g16/run/cesm.log.200923-223246

Also, the case where I do the runs lives here (if you need it):
/glade/work/im2527/runs/2xCO2.B1850LENS.n21.f09_g16/

Thanks!

Ivan Mitevski
 

Hi Ivan,

It looks like the model is crashing with a floating point exception when trying to take the square root of temperature while calculating chemical reaction rates. Given that it occurred at a different time period than your original crash, the two errors may not be related, which could make it hard to debug.

As a sanity check, what is the result if you run with NTHRDS_XXX=1 for all components? Sometimes changing the PE-layout can change the values enough to avoid certain errors without compromising the scientific results.
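
Here NTHRDS_XXX stands for one thread-count variable per component. A small sketch that prints the corresponding commands follows. The component list, the env_mach_pes.xml file name, and the `-file/-id -val` syntax are assumptions based on CESM1.1-era scripts, so check env_mach_pes.xml in your case for the variables that actually exist.

```python
# Assumed component suffixes; adjust to match your env_mach_pes.xml.
COMPONENTS = ["ATM", "LND", "ICE", "OCN", "CPL", "GLC", "ROF"]


def single_thread_commands(components=COMPONENTS):
    """Build one xmlchange command per component, setting NTHRDS to 1."""
    return [
        f"./xmlchange -file env_mach_pes.xml -id NTHRDS_{comp} -val 1"
        for comp in components
    ]


for cmd in single_thread_commands():
    print(cmd)
```

A change to the PE layout normally requires setting up the case again and rebuilding before the next submission.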

Also, does this error occur when using a different LENS member? I realize you may have scientific reasons to choose this particular member, but if it runs perfectly fine with a different LENS member then it could indicate that the error(s) are a result of the initial conditions you are using.

Anyway, trying those two tests may help resolve your issue. If not, it may require more serious debugging, using write statements and other methods, to see exactly where the temperature is going bad (at least for this particular error). Sadly, CESM1.1 is no longer officially supported, so my ability to do any rigorous debugging for you will be limited.
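
In the model itself this kind of check would be Fortran write statements, but the same idea can be sketched in Python on a temperature field taken from output. Everything here is illustrative: the function name, the [level][column] layout, and the 100 K to 400 K bounds are assumptions, not values from CESM.

```python
import math


def find_bad_temperatures(temps, t_min=100.0, t_max=400.0):
    """Return (level, column, value) for every suspicious temperature.

    temps is a nested list indexed as temps[level][column], in kelvin.
    A value is flagged if it is NaN, infinite, or outside [t_min, t_max].
    """
    bad = []
    for lev, row in enumerate(temps):
        for col, t in enumerate(row):
            if not math.isfinite(t) or t < t_min or t > t_max:
                bad.append((lev, col, t))
    return bad


# Hypothetical two-level, two-column field with two bad points.
sample = [[250.0, 260.0], [-5.0, float("nan")]]
for lev, col, t in find_bad_temperatures(sample):
    print(f"bad temperature at lev={lev}, col={col}: {t}")
```

A negative or NaN temperature found this way would be consistent with a floating point exception in a square root, and the indices give a starting point for where to add write statements.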

Thanks, and sorry about the potentially bad news!

Jesse
 
Hi Jesse,

Setting NTHRDS_XXX=1 worked! Model runs without crashing, and output seems reasonable!

By "different LENS member", do you mean the model tags (e.g. here I use n21)? If so, I did use model tag n16, but the AMOC output was bad. I then learned that some pop2 fixes were added in newer model tags, so I switched to n21.

Thank you very much!!

Ivan Mitevski
 