
cesm1_2_2: cam5 aborts with Intel Compiler 13.0.0 at -O2 optimization level

Hi, I tried to run cesm1_2_2, a FAMIPC5 compset at ne30_g16 resolution, on our computing platform (almost identical to NCAR Yellowstone). We use the Intel Fortran and C/C++ Compiler 13.0.0. With the default optimization level (-O2) the model aborts after a few days with the error message:
Code:
28:(shr_sys_abort) ERROR: negative layer thickness.  timestep or remap time too large
The error changes when the run starts from a different January, e.g. from January 1st, 1980 instead of January 1st, 1979:
Code:
132: ERROR: shr_assert_in_domain: state%t has invalid value   -122446.304285118
132:  at location:            6          16
132: Expected value to be greater than   0.000000000000000E+000
132:(shr_sys_abort) ERROR: Invalid value produced in physics_state by package radheat.
Decreasing the optimization level to -O1 for CAM seems to solve the problem, at the cost of a ~20% increase in computing time. The model uses MPI with no OpenMP threading. The FV dynamical core (FAMIPC5 at f09_g16) does not seem to be affected. I've used cesm1_1_2 with no problems before, and the cesm1_3 beta doesn't show this behaviour either. Have you experienced similar problems? Thanks in advance.
 

eaton

CSEG and Liaisons
I am able to successfully run the model configuration you describe on yellowstone using intel-13.0.1 which is the closest version available.  I ran 5 days using the default pe layout of 900 tasks with 2 threads per task.  It seems likely that you're encountering a compiler problem, especially since it goes away with reduced optimization. 
 

santos

Member
I concur. If this behavior were consistent across optimization levels, I would say that the simulation is becoming unstable, possibly due to a bad input file or other namelist settings. But since the issue goes away at lower optimization, it's more likely due to a compiler optimization bug that is triggered in that particular version of CESM.
 
Thanks for your quick replies.
Yes, this is definitely a compiler issue. I tracked it down to the advection module in the spectral element dynamical core, prim_advection_mod.F90. Compiling this module with -O1 solves the problem. However, I'm not going to investigate it further, because I tested a newer compiler version, 13.1.3, which does not show the same problem.
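For anyone who needs to narrow down a similar problem, here is a minimal Python sketch of one way to bisect a list of source files for the one that misbehaves at -O2. It is not necessarily how the search was done in this case. It assumes a single culprit and a failure that reproduces reliably for a fixed start date. The run_ok callback and the module_*.F90 names are hypothetical placeholders: run_ok would have to rebuild with the given files at -O1 and everything else at -O2, run the model, and report whether the run completed.

```python
from typing import Callable, FrozenSet, Sequence


def find_culprit(modules: Sequence[str],
                 run_ok: Callable[[FrozenSet[str]], bool]) -> str:
    """Bisect for the single file whose -O2 build breaks the run.

    run_ok(low_opt) is a hypothetical callback supplied by the user: it
    should rebuild with the files in low_opt at -O1 and all others at -O2,
    run the model, and return True if the run completes.
    """
    candidates = list(modules)
    if not candidates:
        raise ValueError("no candidate modules given")
    # Sanity check: lowering every candidate must fix the run.
    if not run_ok(frozenset(candidates)):
        raise RuntimeError("run still fails with every candidate at -O1")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If lowering only the first half fixes the run, the culprit is in it;
        # otherwise it must be in the second half.
        if run_ok(frozenset(first_half)):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]


# Stand-in for a real build-and-run cycle, for illustration only.
def fake_run_ok(low_opt: FrozenSet[str]) -> bool:
    return "prim_advection_mod.F90" in low_opt


# Placeholder file names, apart from the one named in this thread.
example_modules = [
    "module_a.F90",
    "module_b.F90",
    "prim_advection_mod.F90",
    "module_c.F90",
    "module_d.F90",
]
print(find_culprit(example_modules, fake_run_ok))
```

Each step costs one rebuild and one run, so the number of runs grows with the logarithm of the number of files rather than linearly.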
 

eaton

CSEG and Liaisons
We are successfully running with intel 15.0.1 on yellowstone.  As Sean said, we don't have 15.0.2 installed.
 