Sometimes the CICE model produces an error like:
Thermo iteration does not converge
istep1, my_task, i, j:
.... (a number of messages and details)
(shr_sys_abort) ERROR: ice: Vertical thermo error
This is normally indicative of a problem somewhere else in the system. Here are the steps I would suggest:
1. Turn on frequent history output from the coupler starting from the last restart. This is set with HIST_OPTION and HIST_N, depending on the version of the code. Look carefully at all of the fields going into the CICE model.
2. If everything makes physical sense going into the ice, then you can see if everything makes physical sense within the ice using the following CICE namelist changes:
print_points = .true.
latpnt = latn, lats
lonpnt = lonn, lons
diagfreq = 1
where latn/lonn and lats/lons are the latitudes and longitudes of two points, one in the northern hemisphere and one in the southern. Change one set of these values to match the values from your error output, which are given in the error message in the following lines:
Global i and j: xxx yyy (the i and j indices of the crash point)
Lat, lon: xx.xxxxxxxx yy.yyyyyyyyy (the latitude and longitude corresponding to the i,j point identified above)
Then rerun the model from the last restart.
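If it helps, here is a small Python sketch of step 2: it pulls the latitude and longitude out of the "Lat, lon:" line and prints the namelist lines listed above. The sample log text, the numbers in it, and the placeholder coordinates for the untouched point are all made up for illustration, and the helper names (crash_point, namelist_lines) are hypothetical. It assumes the values are printed as plain decimals and passes the longitude through unchanged.

```python
import re

# Hypothetical excerpt of the error output; the numbers are invented.
SAMPLE_LOG = """\
Thermo iteration does not converge
Global i and j: 123 45
Lat, lon: -67.25000000 312.50000000
"""

# Placeholder coordinates for the point that is NOT replaced (assumed values).
DEFAULT_NORTH = (90.0, 0.0)
DEFAULT_SOUTH = (-65.0, -45.0)


def crash_point(log_text):
    """Return (lat, lon) parsed from the 'Lat, lon:' line of the error output."""
    match = re.search(
        r"Lat, lon:\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", log_text
    )
    if match is None:
        raise ValueError("no 'Lat, lon:' line found in the log text")
    return float(match.group(1)), float(match.group(2))


def namelist_lines(lat, lon):
    """Build the namelist lines, replacing the point in the matching hemisphere."""
    north, south = DEFAULT_NORTH, DEFAULT_SOUTH
    if lat >= 0.0:
        north = (lat, lon)   # crash point is in the northern hemisphere
    else:
        south = (lat, lon)   # crash point is in the southern hemisphere
    return [
        "print_points = .true.",
        f"latpnt = {north[0]}, {south[0]}",
        f"lonpnt = {north[1]}, {south[1]}",
        "diagfreq = 1",
    ]


if __name__ == "__main__":
    lat, lon = crash_point(SAMPLE_LOG)
    print("\n".join(namelist_lines(lat, lon)))
```

The printed lines can then be pasted into the CICE namelist changes before rerunning from the last restart.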
3. If everything there looks ok, you can attempt to increase the iterations in the thermodynamics (ice_therm_vertical.F90 in CESM1 and earlier, and ice_therm_bl99.F90 in CESM2). Increase nitermax to 200 in the source code module ice_therm_vertical.F90 or ice_therm_bl99.F90 (copied into SourceMods/src.cice) and rerun from the last restart. Note this applies only to the older thermodynamics (ktherm == 1).
With the new thermodynamics (ktherm == 2), this is a bit more challenging. If this is a Picard convergence error, one can try increasing the Picard iterations with nit_max in ice_therm_mushy.F90. However, this is also unlikely to help, and normally one needs to proceed to step 4.
4. The final thing to try is decreasing the thermodynamic timestep in the CICE model. This can only be done by changing the coupling interval with the atmosphere (ATM_NCPL/ICE_NCPL). Increase these values. Note that you cannot change these values in a 'branch' or 'continue' run with CAM, so it will require a new run with a 'hybrid' start. If you are using the DATM, you can change these in all types of runs. This does not usually help resolve the fundamental problem, though.
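To see why increasing ATM_NCPL/ICE_NCPL shortens the timestep in step 4, here is a rough arithmetic sketch. It assumes these values count coupling intervals per day (the usual base period, but check your configuration) and that the count has to divide the day evenly. The function name is hypothetical and the values 48 and 96 are just examples.

```python
SECONDS_PER_DAY = 86400


def coupling_interval_seconds(ncpl):
    """Length of one coupling interval, assuming ncpl intervals per day."""
    if ncpl <= 0 or SECONDS_PER_DAY % ncpl != 0:
        # Assumption for this sketch: ncpl must divide the day evenly.
        raise ValueError("ncpl must be a positive divisor of 86400")
    return SECONDS_PER_DAY // ncpl


if __name__ == "__main__":
    for ncpl in (48, 96):
        print(ncpl, coupling_interval_seconds(ncpl))
```

Under those assumptions, doubling the value from 48 to 96 halves the interval from 1800 s to 900 s.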
5. *NEW* With CICE5, there is the possibility of an energy conservation error that is just above 1.0e-3. It is possible that the temperature can converge, but with roundoff errors, the energy might not be conserved to the same tolerance. If the message above shows an energy conservation error very close to 1.0e-3, then one can loosen the tolerance on the energy conservation.
Copy the module ice_therm_vertical.F90 to SourceMods/src.cice. Find the line in ice_therm_vertical.F90 like this:
if (ferr > ferrmax) then
and change it to:
if (ferr > 1.1_dbl_kind*ferrmax) then
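To illustrate what the 1.1 factor in step 5 does, here is a minimal Python sketch of the same comparison. It is not the CICE code. It takes the 1.0e-3 tolerance from the description above, and the ferr values are invented for the example.

```python
FERRMAX = 1.0e-3  # energy conservation tolerance quoted in step 5


def energy_check_fails(ferr, factor=1.0):
    """Mimic the comparison 'ferr > factor*ferrmax' from the Fortran line."""
    return ferr > factor * FERRMAX


if __name__ == "__main__":
    for ferr in (1.05e-3, 2.0e-3):  # made-up error values
        print(ferr, energy_check_fails(ferr), energy_check_fails(ferr, 1.1))
```

An error just above the tolerance (1.05e-3 here) fails the original check but passes the loosened one, while a clearly larger error (2.0e-3) still aborts either way.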