
An erroneous arithmetic operation in SoilTemperatureMod.F90

Status
Not open for further replies.

liliyao

Xinamai
Member
Hi, I am running a final CLMBGC spinup simulation using CLM5 from ctsm5.1.dev118 on a sparse grid within CONUS. The model crashed after more than 100 simulation years with an "erroneous arithmetic operation":

777: Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
777:
777: Backtrace for this error:
777: #0 0x7fd3e5ce5dbf in ???
777: #1 0xc70f3f in phasechangeh2osfc
777: at /global/cfs/cdirs/m2702/liliyao/clm5.0_ctsm5.1.dev118/src/biogeophys/SoilTemperatureMod.F90:921
777: #2 0xca686b in __soiltemperaturemod_MOD_soiltemperature
777: at /global/cfs/cdirs/m2702/liliyao/clm5.0_ctsm5.1.dev118/src/biogeophys/SoilTemperatureMod.F90:485
777: #3 0x5d4636 in __clm_driver_MOD_clm_drv
777: at /global/cfs/cdirs/m2702/liliyao/clm5.0_ctsm5.1.dev118/src/main/clm_driver.F90:1231
777: #4 0x58d492 in modeladvance
777: at /global/cfs/cdirs/m2702/liliyao/clm5.0_ctsm5.1.dev118/src/cpl/nuopc/lnd_comp_nuopc.F90:904

The code around line 921 in SoilTemperatureMod.F90 is:
if (z_avg > 0._r8) then
   rho_avg=min(800._r8,h2osno_total(c)/z_avg)
else
   rho_avg=200._r8
endif

However, when I restart the simulation from the last restart file of this run, the model runs through the simulation year in which it hit the arithmetic error. After more than 100 simulation years, it crashes again with the same error, and once again it runs fine if I restart from the last restart file I saved (I save a restart file every 20 simulation years). So I was wondering whether this could be some kind of accumulation bug. I would appreciate any comments on this error. Thank you very much!
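One way to narrow this down is to check the operands just before the division quoted above. The following is a minimal standalone sketch, not the CTSM code: the program name, the helper quotient_is_safe, and all input values are hypothetical, and it only assumes that the exception comes from the division itself (for example a tiny positive z_avg that passes the > 0 test), which the backtrace alone does not confirm.

```fortran
program check_rho_avg
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: ncol = 3
  real(r8) :: h2osno_total(ncol), z_avg(ncol), rho_avg
  integer  :: c

  ! Hypothetical inputs chosen to exercise each branch; not from the failing run.
  h2osno_total = [50._r8, 0._r8, 120._r8]
  z_avg        = [0.25_r8, 0._r8, tiny(1._r8)]

  do c = 1, ncol
     if (z_avg(c) > 0._r8) then
        if (.not. quotient_is_safe(h2osno_total(c), z_avg(c))) then
           ! Report the operands instead of performing a division that could trap.
           write(*,*) 'suspect operands at c=', c, &
                ' h2osno_total=', h2osno_total(c), ' z_avg=', z_avg(c)
           cycle
        end if
        rho_avg = min(800._r8, h2osno_total(c) / z_avg(c))
     else
        rho_avg = 200._r8
     end if
     write(*,*) 'c=', c, ' rho_avg=', rho_avg
  end do

contains

  ! True when num/den cannot overflow; den is assumed to be positive.
  logical function quotient_is_safe(num, den)
    real(r8), intent(in) :: num, den
    quotient_is_safe = ieee_is_finite(num) .and. ieee_is_finite(den)
    if (quotient_is_safe .and. den < 1._r8) then
       ! huge*den cannot overflow here because den < 1.
       quotient_is_safe = abs(num) < huge(1._r8) * den
    end if
  end function quotient_is_safe

end program check_rho_avg
```

With these made-up inputs, the third column is reported as suspect because 120 divided by the smallest normal double would overflow. In the model, the same kind of check would go to the log file rather than standard output.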
 

slevis

Moderator
Staff member
It could be, and it seems a bit tricky to figure out. Does it always crash in the same grid cell? To save time while troubleshooting, it may help to run a single-point simulation for the grid cell that crashes. Either way, you may wish to write out all the values in that vicinity of the code and see whether any variable's values differ between the run that crashes and the run restarted from the last restart file.
 

slevis

Moderator
Staff member
Something else that I remembered:
Make sure that you have run ./manage_externals/checkout_externals before starting the simulation.
I once encountered a similar problem and found that it got fixed when I ran ./manage_externals/checkout_externals
 

liliyao

Xinamai
Member
Thank you for your reply, Sam! May I ask how to track which grid cell causes the model to crash? I did not find this information in the log files (I am using ctsm5.1.dev118). Thanks!
 

slevis

Moderator
Staff member
There is no single way to do that. A brute-force approach is to add write statements in the code where you suspect a problem, see what comes out, and then modify the write statements iteratively with each successive attempt.
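As a rough illustration of such a write statement, the sketch below maps a column index to its grid cell and prints the coordinates. It is a standalone toy program: the arrays col_gridcell, grc_latdeg, and grc_londeg are hypothetical stand-ins for the model's own column-to-gridcell map and gridcell coordinates, and every value in them is invented.

```fortran
program locate_column
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  ! Hypothetical stand-ins for the model's column-to-gridcell map and
  ! gridcell coordinates; the values are invented for illustration.
  integer,  parameter :: col_gridcell(4) = [1, 1, 2, 3]
  real(r8), parameter :: grc_latdeg(3) = [35.25_r8, 40.5_r8, 44.75_r8]
  real(r8), parameter :: grc_londeg(3) = [250.0_r8, 262.5_r8, 275.0_r8]
  integer :: c

  c = 3   ! column index taken from a diagnostic write near the failing line
  call report_location(c)

contains

  ! Print the grid cell index and coordinates that own column c.
  subroutine report_location(c)
    integer, intent(in) :: c
    integer :: g
    g = col_gridcell(c)
    write(*,'(a,i0,a,i0,a,f8.3,a,f8.3)') 'column ', c, ' -> gridcell ', g, &
         ' lat=', grc_latdeg(g), ' lon=', grc_londeg(g)
  end subroutine report_location

end program locate_column
```

Once the latitude and longitude of the failing column are known, they can be used to set up the single-point run suggested earlier in the thread.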

Another approach is to run the same domain as a full grid, rather than a sparse one, to see if it still fails. By the way, did you add new accumulators to the code?
 

liliyao

Xinamai
Member
Thank you, Sam! I don't know what 'accumulators' means. I think the answer is no; we only added some write statements in the code. I am curious why the sparse grid simulation could crash when the full grid simulation would not. My understanding is that the only difference between the full grid and sparse grid simulations is the number of active grid cells. For the active grid cells, the simulation should be the same.
 

slevis

Moderator
Staff member
I mentioned "accumulators" because you originally asked if this may be an "accumulation bug" and it seems irrelevant if you just added some write statements.

You are correct that the model should behave the same with a sparse or a full grid. The fact that it does not indicates a problem, but I do not have enough information to come up with an explanation.
 