
"Floating point exception" after 6 months of run

Hello, I have set up a simulation using the BG1850CN compset at T31_g37_gl10 resolution. The model runs for 6 months, but right at the end of the 6th month (0001-06-30) it crashes. The cesm.log file contains:
...
[cl338:02867] *** Process received signal ***
[cl338:02867] Signal: Floating point exception (8)
[cl338:02867] Signal code: Floating point underflow (5)
[cl338:02867] Failing at address: 0x193e593
[cl338:02867] [ 0] /lib64/libpthread.so.0 [0x3c8a00e4c0]
[cl338:02867] [ 1] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(__powr8i4+0x33) [0x193e593]
[cl338:02867] [ 2] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(snicarmod_mp_snowage_grain_+0x1252) [0xf2d402]
[cl338:02867] [ 3] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(clm_driver_mp_clm_drv_+0x45ce) [0xc7887e]
[cl338:02867] [ 4] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(lnd_comp_mct_mp_lnd_run_mct_+0x1147) [0xc5d1c7]
[cl338:02867] [ 5] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(ccsm_comp_mod_mp_ccsm_run_+0x301b) [0x512d3b]
[cl338:02867] [ 6] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(MAIN__+0x6d) [0x536dcd]
[cl338:02867] [ 7] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe(main+0x3c) [0x50fcfc]
[cl338:02867] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c8941d974]
[cl338:02867] [ 9] /net/glacdyn/glacdyn/2/taimaz/CESM/cesm1_2_0/CASE/T31inc/bld/cesm.exe [0x50fc09]
[cl338:02867] *** End of error message ***
forrtl: error (78): process killed (SIGTERM)
...
Another run with exactly the same configuration is running on another cluster with no errors and has simulated 30 years successfully, so there should be no problem with the resolution and compset combination. Any ideas?
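For readers unfamiliar with this signal: the trace reports a floating point underflow raised inside `__powr8i4`, which appears to be the compiler runtime routine for raising a double-precision value to an integer power, called from `snowage_grain` in the SNICAR module. The sketch below only illustrates that class of operation in Python and is not taken from the CESM source; the function name `pow_underflows` and the sample values are hypothetical. Note that Python silently returns zero or a subnormal here, whereas a Fortran build with underflow trapping enabled would raise the signal shown above.

```python
import sys


def pow_underflows(base: float, exponent: int) -> bool:
    """Hypothetical helper: report whether base**exponent underflows.

    "Underflows" here means the base is nonzero but the computed power is
    smaller in magnitude than the smallest normal double.
    """
    if base == 0.0:
        return False  # an exact zero result is not an underflow
    try:
        result = base ** exponent
    except OverflowError:
        return False  # the power is too large, not too small
    return abs(result) < sys.float_info.min


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the failing run.
    for base, exponent in [(1e-200, 2), (1e-160, 2), (0.5, 10)]:
        print(base, exponent, pow_underflows(base, exponent))
```

A check of this kind, placed around the suspect power operation, is one way to find out which input value becomes tiny at the end of month 6.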
 

jedwards

CSEG and Liaisons
Staff member
Are the two clusters using the same compiler and compiler version?   I would suspect a compiler issue.
 

No. The one with the problem is using intel/13.0, and the other one is using IBM compilers. But again, the Intel build passed some test runs and also ran the first 6 months here successfully.
 