"Floating point exception" after 6 months of run

I have set up a simulation using the BG1850CN compset at T31_g37_gl10 resolution. The model runs normally for 6 months, but right at the end of the 6th month (0001-06-30) it crashes. The cesm.log file contains:


[cl338:02867] *** Process received signal ***
[cl338:02867] Signal: Floating point exception (8)
[cl338:02867] Signal code: Floating point underflow (5)
[cl338:02867] Failing at address: 0x193e593
[cl338:02867] [ 0] /lib64/ [0x3c8a00e4c0]
[cl338:02867] [ 1] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(__powr8i4+0x33) [0x193e593]
[cl338:02867] [ 2] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(snicarmod_mp_snowage_grain_+0x1252) [0xf2d402]
[cl338:02867] [ 3] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(clm_driver_mp_clm_drv_+0x45ce) [0xc7887e]
[cl338:02867] [ 4] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(lnd_comp_mct_mp_lnd_run_mct_+0x1147) [0xc5d1c7]
[cl338:02867] [ 5] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(ccsm_comp_mod_mp_ccsm_run_+0x301b) [0x512d3b]
[cl338:02867] [ 6] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(MAIN__+0x6d) [0x536dcd]
[cl338:02867] [ 7] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(main+0x3c) [0x50fcfc]
[cl338:02867] [ 8] /lib64/ [0x3c8941d974]
[cl338:02867] [ 9] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe [0x50fc09]
[cl338:02867] *** End of error message ***
forrtl: error (78): process killed (SIGTERM)
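The failing frame is `__powr8i4`, called from `snicarmod_mp_snowage_grain_`. Judging by its name, it appears to be the compiler runtime routine that raises a REAL(8) value to an INTEGER(4) power, and the signal code says the result underflowed: its magnitude fell below the smallest normal double (about 2.2e-308). The Python sketch below is not the CLM code. It only illustrates that condition, assuming the routine behaves like ordinary repeated squaring; `pow_int` and `underflows` are hypothetical helper names.

```python
import sys

# Smallest positive *normal* double (about 2.2250738585072014e-308).
# A nonzero result with a smaller magnitude has "underflowed".
TINY = sys.float_info.min


def pow_int(base: float, n: int) -> float:
    """base**n for integer n >= 0 via repeated squaring (plain double multiplies).

    Assumption: this only mimics the idea of a REAL(8)**INTEGER(4) routine,
    not any particular compiler's implementation.
    """
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    result = 1.0
    while n > 0:
        if n & 1:
            result *= base
        n >>= 1
        if n:  # skip the final squaring, which could underflow needlessly
            base *= base
    return result


def underflows(base: float, n: int) -> bool:
    """True if base**n is nonzero mathematically but below TINY in double precision."""
    if base == 0.0 or n == 0:
        return False
    return abs(pow_int(base, n)) < TINY


if __name__ == "__main__":
    for base, n in [(0.5, 10), (1e-20, 20), (0.9, 7000)]:
        print(f"{base}**{n} = {pow_int(base, n)!r}  underflow: {underflows(base, n)}")
```

A modest base raised to a large exponent (for example 0.9**7000) underflows just as a tiny base does, so the crash can depend on the state values reached at a particular time step rather than on the configuration itself.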

Another run with exactly the same configuration is running on another cluster with no error and has simulated 30 years successfully, so the resolution and compset combination should not be the problem.

Any idea?


Are the two clusters using the same compiler and compiler version? I would suspect a compiler issue.

CESM Software Engineer


No. The cluster with the problem uses intel/13.0, and the other one uses the IBM compilers. But again, the Intel build passed some test runs and also ran the first 6 months here successfully.
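Passing earlier runs does not rule out a compiler difference, because compilers (and flag sets) can treat the same underflowing operation differently: trap and kill the process, flush the result to zero, or keep a subnormal value. The thread does not say which behavior the intel/13.0 and IBM builds use here. The sketch below, with a hypothetical helper `mul_with_policy`, only illustrates how one multiplication can abort under one policy and silently continue under another.

```python
import sys

TINY = sys.float_info.min  # smallest positive normal double


def mul_with_policy(a: float, b: float, policy: str) -> float:
    """Multiply a*b, then apply a hypothetical underflow policy.

    policy:
      "trap"    - raise, like a build that turns underflow into SIGFPE
      "ftz"     - flush an underflowed result to 0.0
      "gradual" - keep the IEEE result (possibly subnormal) as-is
    """
    x = a * b
    underflowed = a != 0.0 and b != 0.0 and abs(x) < TINY
    if not underflowed:
        return x
    if policy == "trap":
        raise FloatingPointError(f"floating point underflow: {a!r} * {b!r}")
    if policy == "ftz":
        return 0.0
    if policy == "gradual":
        return x
    raise ValueError(f"unknown policy {policy!r}")


if __name__ == "__main__":
    for policy in ("gradual", "ftz", "trap"):
        try:
            print(policy, mul_with_policy(1e-160, 1e-160, policy))
        except FloatingPointError as err:
            print(policy, "->", err)
```

Under "gradual" or "ftz" the run keeps going; under "trap" the same arithmetic stops it, which matches the pattern of one cluster crashing while the other does not.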

