
potential temperature problem that crashes model in ice component

bates

Member
Hi,
We have a single-forcing run, all forcings for the 20th Century except for volcanoes, that is crashing with the following error in the ccsm.log file:

 Starting thermo, Tsn < Tmin
 Tsn=  -101.503840413633
 Tmin=  -100.000000000000
 istep1, my_task, i, j:        1811         754           4          12
 qsn  -181041641.943549
 istep1, my_task, iblk =        1811         754           2
 Global block:        1510
 Global i and j:         188         379
 Lat, Lon:   82.4263016722968       -57.4002116095398
(shr_sys_abort) ERROR: ice: Vertical thermo error
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
forrtl: severe (174): SIGSEGV, segmentation fault occurred

This error has been discussed in the forum before, but with no resolution: https://bb.cgd.ucar.edu/run-ccsm4-cesm105-code-base

I spoke with Dave B. since it appears as an error in the ice model, but the error is actually saying that the snow on the sea ice is going below -100C, which he says likely points to the atmosphere passing unrealistic temperatures to the ice model. The run is on hopper. The error has been replicated twice.

Log files with the error are here: /global/scratch2/sd/adrianne/archive/b.e10.B20NoVOLCC5CN.f09_g16.004
The run directory with Jan monthly means (the model crashes on Feb 7) is here: /global/scratch2/sd/adrianne/b.e10.B20NoVOLCC5CN.f09_g16.004/run
The specific log files to look at are *.log.141010-194501 (Oct. 11), *.log.141012-023920 (Oct. 12), and *.log.141013-095314 (Oct. 13) in the /archive directory.

I did look at T max as printed in the atm.log file and noticed that it drops to 150 K right when the model crashes. Typical min values range from 160-178 K within the log files we have. I have checked the restart files and the monthly data written so far, and I don't see evidence of really cold temperatures in this area. The lat/lon point to a spot on the northern edge of Greenland. We ran two other ensemble members with this exact same setup and those ran to completion fine.

Thanks for the help,
Susan
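For reference, here is a rough sketch of how one could scan the monthly history output for anomalously cold surface temperatures near that grid point (Python with xarray; the glob pattern and the TS variable name are illustrative assumptions, not the actual files above):

# Sketch: scan monthly CAM history files for cold surface temperatures near the
# reported crash location (~82.4N, -57.4E, i.e. ~302.6E on a 0-360 longitude grid).
# The file pattern and the variable name "TS" are assumptions for illustration only.
import glob
import xarray as xr

for path in sorted(glob.glob("b.e10.B20NoVOLCC5CN.f09_g16.004.cam.h0.*.nc")):
    ds = xr.open_dataset(path)
    box = ds["TS"].sel(lat=slice(80, 85), lon=slice(295, 310))  # small box around the point
    tmin = float(box.min())
    flag = "  <-- suspiciously cold" if tmin < 200.0 else ""
    print(f"{path}: min TS in box = {tmin:.1f} K{flag}")
    ds.close()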
 

bates

Member
I just noticed an error I had not seen before. It occurs just before the error I listed above, and the lat and lon match exactly with the crash in the ice model.
QNEG4 WARNING from TPHYSAC  Max possible LH flx exceeded at    1 points. , Worst excess =  -1.2399E-08, lchnk = ***, i =    3, same as indices lat = 184, lon = 243
I've never seen a QNEG4 error before. Does this indicate something to someone?
 

rneale

Rich Neale
CAM Project Scientist
Staff member
I wonder if this is similar to the radiation error Ben picked up. If LHFLX exceeds a maximum, it may be because TS is incredibly high. Could try putting back in the exponential calculations in RRTM. Ben could help.
 

bates

Member
Again, I have even more information. The QNEG4 error description is this (with help from S. Santos):
"Check if moisture flux into the ground is exceeding the total moisture content of the lowest model layer (creating negative moisture values).  If so, then subtract the excess from the moisture and latent heat fluxes and add it to the sensible heat flux."

So the surface model (I guess in this case CICE) is trying to suck more moisture out of the air than is actually present, and so CAM has to push back by forcing the moisture to evaporate again to avoid having negative water vapor, which cools the surface.

It's not obvious who is "at fault" here. Is CICE going haywire, and CAM is lowering its temperature while trying to deal with an excessive qflx? Or does CAM produce a bad temperature, confusing CICE into producing a bad flux, which causes CAM to lower the temperature further to fix the flux?

Does anyone have any thoughts about what to do?

Thanks,
Susan
 

santos

Member
I think that the error message for QNEG4 is potentially misleading. The problem is that the qflx is too high, and we run out of water vapor (mass) to satisfy the desired flux of the surface model (here CICE). This is checked after coupling, i.e. the other model has already run and decided how much water it wants to take, but CAM has to push back because it's just too much. We just accept the small violation of conservation of mass with a warning. But to conserve energy while also keeping the latent heat flux consistent with constituent flux, CAM has to adjust both the latent and sensible heat fluxes.
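To make the bookkeeping concrete, here is a schematic of that correction in Python (this is not the actual QNEG4 code; the variable names, sign convention, and available-moisture limit are assumptions for illustration):

# Schematic of the QNEG4-style fix described above (not the actual CAM code).
# If the moisture flux requested by the surface model exceeds the water vapor
# available in the lowest layer over one time step, trim the moisture and
# latent heat fluxes and move the removed energy into the sensible heat flux.
LATVAP = 2.501e6   # latent heat of vaporization, J/kg
GRAVIT = 9.80616   # gravity, m/s2

def limit_surface_fluxes(qflx, shflx, q_bot, dp_bot, dt):
    """qflx: moisture flux into the surface (kg/m2/s); shflx: sensible heat flux (W/m2);
    q_bot: lowest-layer specific humidity (kg/kg); dp_bot: layer thickness (Pa); dt: step (s)."""
    max_qflx = q_bot * dp_bot / (GRAVIT * dt)   # most moisture the layer can supply per step
    excess = qflx - max_qflx
    if excess > 0.0:
        qflx = max_qflx                  # cap the moisture flux
        shflx = shflx + excess * LATVAP  # put the removed latent energy into sensible heat
    lhflx = qflx * LATVAP                # keep latent heat flux consistent with the moisture flux
    return qflx, lhflx, shflx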
 

bates

Member
I thought the same thing, Rich. Gary is in the process of making plots of the Tmin variables. I'll ask him to make Tmax variable plots too.
Susan
 

dbailey

CSEG and Liaisons
Staff member
This is a tough chicken-or-egg problem, but I suspect it is not originating in CICE. Have you changed the land/ocean mask? How did you initialize the components, and how far into the run is this? You can try a hybrid run and change ATM_NCPL to 96 (assuming it is currently 48). Also, I would write out daily coupler history files from the last restart to see where it goes wrong first. In env_run.xml, set HIST_OPTION to ndays and HIST_N to 1.
Dave
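Once the daily coupler history files are being written, something along these lines could be used to spot the first file where the atmosphere-to-coupler bottom temperature goes unphysical (a rough Python/xarray sketch; the cpl.hi file pattern, the a2x_Sa_tbot field name, and the 180 K threshold are assumptions that may need adjusting for your case):

# Sketch: walk through daily coupler history files and report the first one where
# the atmosphere bottom temperature sent to the coupler goes suspiciously cold.
# The file pattern, field name, and threshold are assumptions for illustration only.
import glob
import xarray as xr

COLD_LIMIT = 180.0  # K

for path in sorted(glob.glob("b.e10.B20NoVOLCC5CN.f09_g16.004.cpl.hi.*.nc")):
    ds = xr.open_dataset(path)
    coldest = float(ds["a2x_Sa_tbot"].min())
    ds.close()
    print(f"{path}: coldest a2x_Sa_tbot = {coldest:.1f} K")
    if coldest < COLD_LIMIT:
        print("  -> first file with an unphysically cold value; start looking here")
        break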
 

bates

Member
Thanks Dave. We have not changed the land/ocean mask. This is one of our single-forcing runs, initialized from an 1850 control (b40_1850_1d_b08c5cn_138j), in which all forcings vary except for volcanoes. So it's basically a 20th Century run without volcanoes, except that we use an 1850 compset and add the varying forcing in the namelists. The run crashes on Feb 7, 1914, so it's been running for approximately 64 years. We have two other ensemble members initialized from the same 1850 run but at different times, and those ran to completion. We can try changing ATM_NCPL as you suggest and write out coupler history files.
Susan
 

bates

Member
Hi Bing,
The problem grid points for us were at the northern edge of Greenland. I was told that there can be a convergence error in the ocean model in this area, as this is where the lines of longitude get very close together in this grid configuration. I think I increased the number of time steps per day in the ocean model to get past the instability. You should then be able to reduce the time step again.
Susan
 
Thanks a lot, Susan! The information is very helpful! Actually, I just started a hybrid run with increased coupling time steps. The model works fine so far and looks like it has passed the point where it crashed. This may be the solution. I will rerun the whole simulation with increased time steps from the beginning and see how things go.
Bing
 