
potential temperature problem that crashes model in ice component

bates

Member
Hi,
We have a single-forcing run, all forcings for the 20th Century except for volcanoes, that is crashing with the following error in the ccsm.log file:

 Starting thermo, Tsn < Tmin
 Tsn=  -101.503840413633
 Tmin=  -100.000000000000
 istep1, my_task, i, j:        1811         754           4          12
 qsn  -181041641.943549
 istep1, my_task, iblk =        1811         754           2
 Global block:        1510
 Global i and j:         188         379
 Lat, Lon:   82.4263016722968       -57.4002116095398
(shr_sys_abort) ERROR: ice: Vertical thermo error
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
forrtl: severe (174): SIGSEGV, segmentation fault occurred

This error has been discussed in the forum before, but with no resolution: https://bb.cgd.ucar.edu/run-ccsm4-cesm105-code-base

I spoke with Dave B. since it appears as an error in the ice model, but the error is actually saying that the snow on the sea ice is going below -100C, which he says likely points to the atmosphere passing unrealistic temperatures to the ice model. The run is on hopper. The error has been replicated twice.

Log files with the error are here: /global/scratch2/sd/adrianne/archive/b.e10.B20NoVOLCC5CN.f09_g16.004
The run directory with Jan monthly means (the model crashes on Feb 7) is here: /global/scratch2/sd/adrianne/b.e10.B20NoVOLCC5CN.f09_g16.004/run
The specific log files to look at are *.log.141010-194501 (Oct. 11), *.log.141012-023920 (Oct. 12), and *.log.141013-095314 (Oct. 13) in the /archive directory.

I did look at T max as printed in the atm.log file and noticed that it drops to 150 K right when the model crashes. Typical min values range from 160-178 K within the log files we have. I have checked the restart files and the monthly data written so far, and I don't see evidence of really cold temperatures in this area. The lat/lon point to a spot on the northern edge of Greenland. We ran two other ensemble members with this exact same setup and those ran to completion fine.

Thanks for the help,
Susan
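For reference, here is a rough sketch of how one could scan the monthly history output for anomalously cold surface temperatures near that grid point (Python with xarray; the glob pattern and the TS variable name are illustrative assumptions, not the actual files above):

# Sketch: scan monthly CAM history files for cold surface temperatures near the
# reported crash location (~82.4N, -57.4E, i.e. ~302.6E on a 0-360 longitude grid).
# The file pattern and the variable name "TS" are assumptions for illustration only.
import glob
import xarray as xr

for path in sorted(glob.glob("b.e10.B20NoVOLCC5CN.f09_g16.004.cam.h0.*.nc")):
    ds = xr.open_dataset(path)
    box = ds["TS"].sel(lat=slice(80, 85), lon=slice(295, 310))  # small box around the point
    tmin = float(box.min())
    flag = "  <-- suspiciously cold" if tmin < 200.0 else ""
    print(f"{path}: min TS in box = {tmin:.1f} K{flag}")
    ds.close()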
 

bates

Member
I just noticed an error I had not seen before. It occurs just before the error I listed above, and the lat and lon match exactly with the crash in the ice model.
QNEG4 WARNING from TPHYSAC  Max possible LH flx exceeded at    1 points. , Worst excess =  -1.2399E-08, lchnk = ***, i =    3, same as indices lat = 184, lon = 243
I've never seen a QNEG4 error before. Does this indicate something to someone?
 

rneale

Rich Neale
CAM Project Scientist
Staff member
I wonder if this is similar to the radiation error Ben picked up. If LHFLX exceeds a maximum, it may be because TS is incredibly high. Could try putting back in the exponential calculations in RRTM. Ben could help.
 

bates

Member
Again, I have even more information. The QNEG4 error description is this (with help from S. Santos):
"Check if moisture flux into the ground is exceeding the total moisture content of the lowest model layer (creating negative moisture values).  If so, then subtract the excess from the moisture and latent heat fluxes and add it to the sensible heat flux."

So the surface model (I guess in this case CICE) is trying to suck more moisture out of the air than is actually present, and so CAM has to push back by forcing the moisture to evaporate again to avoid having negative water vapor, which cools the surface.

It's not obvious who is "at fault" here. Is CICE going haywire, and CAM is lowering its temperature while trying to deal with an excessive qflx? Or does CAM produce a bad temperature, confusing CICE into producing a bad flux, which causes CAM to lower the temperature further to fix the flux?

Does anyone have any thoughts about what to do?

Thanks,
Susan
 

santos

Member
I think that the error message for QNEG4 is potentially misleading. The problem is that the qflx is too high, and we run out of water vapor (mass) to satisfy the desired flux of the surface model (here CICE). This is checked after coupling, i.e. the other model has already run and decided how much water it wants to take, but CAM has to push back because it's just too much. We just accept the small violation of conservation of mass with a warning. But to conserve energy while also keeping the latent heat flux consistent with constituent flux, CAM has to adjust both the latent and sensible heat fluxes.
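To make the bookkeeping concrete, here is a schematic of that correction in Python (this is not the actual QNEG4 code; the variable names, sign convention, and available-moisture limit are assumptions for illustration):

# Schematic of the QNEG4-style fix described above (not the actual CAM code).
# If the moisture flux requested by the surface model exceeds the water vapor
# available in the lowest layer over one time step, trim the moisture and
# latent heat fluxes and move the removed energy into the sensible heat flux.
LATVAP = 2.501e6   # latent heat of vaporization, J/kg
GRAVIT = 9.80616   # gravity, m/s2

def limit_surface_fluxes(qflx, shflx, q_bot, dp_bot, dt):
    """qflx: moisture flux into the surface (kg/m2/s); shflx: sensible heat flux (W/m2);
    q_bot: lowest-layer specific humidity (kg/kg); dp_bot: layer thickness (Pa); dt: step (s)."""
    max_qflx = q_bot * dp_bot / (GRAVIT * dt)   # most moisture the layer can supply per step
    excess = qflx - max_qflx
    if excess > 0.0:
        qflx = max_qflx                  # cap the moisture flux
        shflx = shflx + excess * LATVAP  # put the removed latent energy into sensible heat
    lhflx = qflx * LATVAP                # keep latent heat flux consistent with the moisture flux
    return qflx, lhflx, shflx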
 

bates

Member
I thought the same thing, Rich. Gary is in the process of making plots of the Tmin variables. I'll ask him to make Tmax variable plots too.
Susan
 

dbailey

CSEG and Liaisons
Staff member
This is a tough chicken-or-egg problem, but I suspect it is not originating in CICE. Have you changed the land/ocean mask? How did you initialize the components, and how far into the run is this? You can try a hybrid run and change ATM_NCPL to 96 (assuming it is currently 48). Also, I would write out daily coupler history files from the last restart to see where it goes wrong first. In env_run.xml, set HIST_OPTION to ndays and HIST_N to 1.
Dave
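Once the daily coupler history files are being written, something along these lines could be used to spot the first file where the atmosphere-to-coupler bottom temperature goes unphysical (a rough Python/xarray sketch; the cpl.hi file pattern, the a2x_Sa_tbot field name, and the 180 K threshold are assumptions that may need adjusting for your case):

# Sketch: walk through daily coupler history files and report the first one where
# the atmosphere bottom temperature sent to the coupler goes suspiciously cold.
# The file pattern, field name, and threshold are assumptions for illustration only.
import glob
import xarray as xr

COLD_LIMIT = 180.0  # K

for path in sorted(glob.glob("b.e10.B20NoVOLCC5CN.f09_g16.004.cpl.hi.*.nc")):
    ds = xr.open_dataset(path)
    coldest = float(ds["a2x_Sa_tbot"].min())
    ds.close()
    print(f"{path}: coldest a2x_Sa_tbot = {coldest:.1f} K")
    if coldest < COLD_LIMIT:
        print("  -> first file with an unphysically cold value; start looking here")
        break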
 

bates

Member
Thanks Dave. We have not changed the land/ocean mask. This is one of our single-forcing runs, initialized from an 1850 control (b40_1850_1d_b08c5cn_138j), in which all forcings vary except for volcanoes. So it's basically a 20th Century run without volcanoes, except that we use an 1850 compset and add the varying forcing in the namelists. The run crashes on Feb 7, 1914, so it's been running for approximately 64 years. We have two other ensemble members initialized from the same 1850 run but at different times, and those ran to completion. We can try changing ATM_NCPL as you suggest and write out coupler history files.
Susan
 

bates

Member
Hi Bing,
The problem grid points for us were at the northern edge of Greenland. I was told that there can be a convergence error in the ocean model in this area, as this is where the lines of longitude get very close together in this grid configuration. I think I increased the number of time steps per day in the ocean model to get past the instability. You should then be able to reduce the time step again.
Susan
 
Thanks a lot, Susan! The information is very helpful! Actually, I just started a hybrid run with increased coupling time steps. The model works fine so far and looks like it has passed the point where it crashed. This may be the solution. I will rerun the whole simulation with increased time steps from the beginning and see how things go.
Bing
 