ice: Vertical thermo error

Hi all,

I am running a CESM 1.0.6 B1850CN case that crashed at model year 391. Below is the error message I got just before the line "359:forrtl: error (78): process killed (SIGTERM)". I found someone else on the forum who encountered a similar error and solved it by reducing the timestep of the ocean component.

Should I do the same? Why not reduce the timestep of the ice component instead?

Thanks a lot,



210: Thermo iteration does not converge,istep1, my_task, i, j:      183277

 210:         210           5          71

 210: Ice thickness:  0.304990231284917

 210: Snow thickness:  0.000000000000000E+000

 210: dTsf, Tsf_errmax:  8.681028118573408E-012  5.000000000000000E-004

 210: Tsf:  0.000000000000000E+000

 210: fsurf:   7.97436120047061

 210: fcondtop, fcondbot, fswint   7.97436120080571        16.8896766960386

 210:   19.2628045380274

 210: fswsfc, fswthrun   28.9156069993825        27.6167851718069

 210: Flux conservation error =  3.551186011918617E-010

 210: Internal snow absorption:

 210:  0.000000000000000E+000

 210: Internal ice absorption:

 210:   10.9082961390655        4.73599459455542        2.30521393613914

 210:   1.31329986826734

 210: Initial snow temperatures:

 210:  0.000000000000000E+000

 210: Initial ice temperatures:

 210: -0.191802146393597      -0.737987439304102       -1.09846369089355

 210:  -1.42274118270238

 210: Final snow temperatures:

 210:  0.000000000000000E+000

 210: Final ice temperatures:

 210: -0.191205075489690      -0.735301673292061       -1.09701825916023

 210:  -1.42964223580686

 210: istep1, my_task, iblk =      183277         210           1

 210: Global block:         211

 210: Global i and j:         204         326

 210: Lat, Lon:   61.2282675113457       -166.461789478145

 210:(shr_sys_abort) ERROR: ice: Vertical thermo error

 210:(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping

INFO: 0031-251  task 210 exited: rc=-11


This error usually indicates a problem elsewhere in the system, but it is probably worth adding an FAQ entry on it. Here are the steps I would suggest:

1. Turn on frequent history output from the coupler, starting from the last restart. This is controlled by HIST_OPTION and HIST_N, depending on the version of the code. Look carefully at all of the fields going into the CICE model.

2. If everything makes physical sense going into the ice, then you can see if everything makes physical sense within the ice using the following CICE namelist changes:

print_points = .true.

latpnt = latn, lats

lonpnt = lonn, lons

diagfreq = 1

where latn/lonn and lats/lons are the latitudes and longitudes of two points, one in the northern hemisphere and one in the southern. Change one set of these values to match the Lat, Lon values from your error output, then rerun the model from the last restart.

3. If everything there looks OK, you can try increasing the maximum number of iterations in the vertical thermodynamics. Copy ice_therm_vertical.F90 into SourceMods/src.cice, increase nitermax to 200 in that copy, and rerun from the last restart.

4. Increasing nitermax does not usually help. The final thing to try is decreasing the thermodynamic timestep in the CICE model. This can only be done by changing the coupling interval with the atmosphere (ATM_NCPL/ICE_NCPL); increase these values. Note that you cannot change these values in a 'branch' or 'continue' run with CAM, so it will require a new run with a 'hybrid' start. If you are using the DATM, you can change them in all types of runs.
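
To make the effect of step 4 concrete, here is a minimal Python sketch of the arithmetic. It assumes that ATM_NCPL/ICE_NCPL count coupling intervals per model day and that the CICE thermodynamic timestep equals the coupling interval. The values 48 and 96 are illustrative only and are not taken from this case.

```python
SECONDS_PER_DAY = 86400


def coupling_timestep_seconds(ncpl: int) -> float:
    """Length of one coupling interval in seconds.

    Assumes ncpl is the number of coupling intervals per model day.
    """
    if ncpl <= 0:
        raise ValueError("ncpl must be a positive integer")
    return SECONDS_PER_DAY / ncpl


# Illustrative values only: doubling the coupling frequency halves the timestep.
for ncpl in (48, 96):
    print(f"NCPL = {ncpl:3d} -> timestep = {coupling_timestep_seconds(ncpl):.0f} s")
```

Under these assumptions, doubling the NCPL values halves the ice thermodynamic timestep.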




These errors can result from many things, including unphysical fluxes passed through the coupler, so changing the ocean timestep may not help in your case. I'd suggest writing frequent coupler history files immediately before the error occurs to see whether there are any unusual fluxes at the location of the crash; from there you may be better able to determine how to fix this. The point you want to look at is identified as:

 210: Global i and j:         204         326
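
If you have many log files to check, the crash location can be pulled out of the error output with a short script. The following is a rough Python sketch, not part of CESM. The function name find_crash_point is hypothetical, and the regular expressions assume the log lines look exactly like the "Global i and j" and "Lat, Lon" lines quoted in this thread.

```python
import re

# Two lines copied from the error output quoted in this thread.
SAMPLE_LOG = """\
 210: Global i and j:         204         326
 210: Lat, Lon:   61.2282675113457       -166.461789478145
"""

# A signed decimal number, optionally with an exponent (e.g. 8.68E-012).
NUMBER = r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?"


def find_crash_point(log_text: str) -> dict:
    """Return the global (i, j) indices and lat/lon reported in the log text."""
    ij = re.search(r"Global i and j:\s+(\d+)\s+(\d+)", log_text)
    latlon = re.search(rf"Lat, Lon:\s+({NUMBER})\s+({NUMBER})", log_text)
    if ij is None or latlon is None:
        raise ValueError("crash location not found in log text")
    return {
        "i": int(ij.group(1)),
        "j": int(ij.group(2)),
        "lat": float(latlon.group(1)),
        "lon": float(latlon.group(2)),
    }


point = find_crash_point(SAMPLE_LOG)
print(point)
```

The lat and lon values are what you would put into latpnt/lonpnt for the print_points diagnostics. The i and j values come from Fortran output, so they are presumably 1-based; subtract 1 if you index arrays from Python.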


