ice: Vertical thermo error

Hi all,

I am running a CESM 1.0.6 B1850CN case, which crashed at model year 391. Below is the error message I got just before the line "359: forrtl: error (78): process killed (SIGTERM)". I found someone else on the forum who encountered a similar error and solved it by reducing the timestep of the ocean component. Should I do the same, or should I instead reduce the timestep of the ice component?

Thanks a lot,
Shineng

210: Thermo iteration does not converge, istep1, my_task, i, j:      183277
210:         210           5          71
210: Ice thickness:  0.304990231284917
210: Snow thickness:  0.000000000000000E+000
210: dTsf, Tsf_errmax:  8.681028118573408E-012  5.000000000000000E-004
210: Tsf:  0.000000000000000E+000
210: fsurf:   7.97436120047061
210: fcondtop, fcondbot, fswint   7.97436120080571        16.8896766960386
210:   19.2628045380274
210: fswsfc, fswthrun   28.9156069993825        27.6167851718069
210: Flux conservation error =  3.551186011918617E-010
210: Internal snow absorption:
210:  0.000000000000000E+000
210: Internal ice absorption:
210:   10.9082961390655        4.73599459455542        2.30521393613914
210:   1.31329986826734
210: Initial snow temperatures:
210:  0.000000000000000E+000
210: Initial ice temperatures:
210: -0.191802146393597      -0.737987439304102       -1.09846369089355
210:  -1.42274118270238
210: Final snow temperatures:
210:  0.000000000000000E+000
210: Final ice temperatures:
210: -0.191205075489690      -0.735301673292061       -1.09701825916023
210:  -1.42964223580686
210: istep1, my_task, iblk =      183277         210           1
210: Global block:         211
210: Global i and j:         204         326
210: Lat, Lon:   61.2282675113457       -166.461789478145
210: (shr_sys_abort) ERROR: ice: Vertical thermo error
210: (shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
INFO: 0031-251  task 210 exited: rc=-11
 

dbailey

CSEG and Liaisons
Staff member
This is normally indicative of a problem somewhere else in the system, but it is probably worth adding an FAQ on this. Here are the steps I would suggest:

1. Turn on frequent history output from the coupler starting from the last restart. This is HIST_OPTION and HIST_N, depending on the version of the code. Look carefully at all of the fields going into the CICE model.

2. If everything makes physical sense going into the ice, then you can see if everything makes physical sense within the ice using the following CICE namelist changes (sketched below, after this list):

print_points = .true.
latpnt = latn, lats
lonpnt = lonn, lons
diagfreq = 1

where latn/lonn and lats/lons are the latitudes and longitudes of two points, one in the northern hemisphere and one in the southern. Change one set of these values to correspond to the values from your error output. Rerun the model from the last restart.

3. If everything there looks OK, you can attempt to increase the number of iterations in the thermodynamics. Increase nitermax to 200 in the source code module ice_therm_vertical.F90 (copied into SourceMods/src.cice) and rerun from the last restart (sketched below).

4. Increasing nitermax does not usually help, though. The final thing to try is decreasing the thermodynamic timestep in the CICE model. This can only be done by changing the coupling interval with the atmosphere (ATM_NCPL/ICE_NCPL): increase these values (sketched below). Note that you cannot do a 'branch' or 'continue' run with CAM and change these values, so it will require a new run with a 'hybrid' start. If you are using the DATM, you can change these in all types of runs.

Dave
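[Editor's sketch for step 2] A minimal example of the namelist changes, with the crash location from the error output above plugged in as the first (northern-hemisphere) point. The second point is an arbitrary southern-hemisphere placeholder, and where these settings go depends on your release (user_nl_cice in newer CESM versions; in CESM 1.0.x the CICE namelist is typically edited via the build-namelist script in Buildconf):

print_points = .true.
diagfreq = 1
latpnt = 61.23, -65.0    ! first value taken from the log (Lat 61.2282675113457); second is a placeholder southern point
lonpnt = -166.46, 0.0    ! first value taken from the log (Lon -166.461789478145); second is a placeholder

With diagfreq = 1, CICE will then print point diagnostics at the crash location every timestep in the run leading up to the abort.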
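[Editor's sketch for step 3] The nitermax change is a one-line edit to a copy of ice_therm_vertical.F90 placed in SourceMods/src.cice. The exact declaration and its default value can differ between CICE versions, so treat this as a sketch of the intent rather than a drop-in patch:

! SourceMods/src.cice/ice_therm_vertical.F90
! raise the cap on iterations of the vertical temperature solver
integer (kind=int_kind), parameter :: &
   nitermax = 200   ! max iterations for the thermo solver, increased per the suggestion above

Rebuild the case so the modified module is picked up, then rerun from the last restart.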
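[Editor's sketch for step 4] ATM_NCPL and ICE_NCPL are case XML variables changed with xmlchange in the case directory (which env_*.xml file they live in depends on the CESM version). Assuming the case currently couples 48 times per day (a common 1-degree CESM1 default), doubling the coupling frequency would look like:

ATM_NCPL = 96   ! couple the atmosphere every 15 minutes instead of every 30
ICE_NCPL = 96   ! keep the ice coupling frequency consistent with the atmosphere

As noted above, with an active CAM this requires starting a new 'hybrid' run rather than branching or continuing the existing one.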
 


duvivier

CSEG and Liaisons
Staff member
Hi,

These errors can be a result of many things, including weird fluxes passed through the coupler, so changing the ocean timestep may not help in your case. I'd suggest outputting frequent coupler history files immediately before the error occurs to see if there are any weird fluxes at the location of the crash, and then from there you may be able to better determine how to fix this. The point you want to look at is identified as:

210: Global i and j:         204         326

Alice
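[Editor's sketch] A concrete example of the frequent coupler history output suggested in both replies, using the HIST_OPTION and HIST_N case XML variables mentioned above (set with xmlchange; the env_*.xml file they live in depends on the CESM version):

HIST_OPTION = nsteps   ! write coupler history files every HIST_N coupling steps
HIST_N = 1             ! i.e. every coupling step

Rerun from the last restart with these set, then inspect the fields sent to the ice component at global i = 204, j = 326 in the coupler history files written just before the abort.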
 
