When the ocean component (OCN, i.e. POP) is configured to run on a number of processors that is not a multiple of 8, the model may crash with a segmentation fault during POP initialization.
Symptom:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000001E6BA2A pop_spacecurvemod 1690 POP_SpaceCurveMod.F90
cesm.exe 00000000020B77E8 distribution_mp_c 607 distribution.F90
cesm.exe 00000000020B316A distribution_mp_c 139 distribution.F90
cesm.exe 0000000001EDA4A7 domain_mp_init_do 438 domain.F90
cesm.exe 0000000001FCD32E initial_mp_pop_in 253 initial.F90
cesm.exe 0000000001E3854C pop_initmod_mp_po 102 POP_InitMod.F90
cesm.exe 0000000001D5F4EE ocn_comp_mct_mp_o 261 ocn_comp_mct.F90
cesm.exe 000000000042EB77 ccsm_comp_mod_mp_ 1130 ccsm_comp_mod.F90
cesm.exe 000000000043627C MAIN__ 90 ccsm_driver.F90
cesm.exe 0000000000411E2C Unknown Unknown Unknown
libc.so.6 0000003CB701ECDD Unknown Unknown Unknown
cesm.exe 0000000000411D29 Unknown Unknown Unknown
Versions affected:
I tested CESM 1.1.2 and the latest CESM 1.2.1 (rev 61100); both are affected.
How to reproduce:
./create_newcase -case tw.r01.B1850.C5CN.f09_g16.032 -compset B1850C5CN -res f09_g16 -mach cartesius
Set the processor count for OCN to a value that is not a multiple of 8. In my case, I used PES = 28.
Build & run the case.
Log and settings attached.
Workaround:
Set the OCN processor count to a multiple of 8; this seems to avoid the crash.
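A minimal sketch for choosing a compliant count, assuming a POSIX shell and a positive integer request. The helper nearest_multiples_of_8 is hypothetical (not part of CESM); it prints the closest multiples of 8 below and above the requested OCN task count, e.g. 24 and 32 for the 28 used above.

```shell
# Hypothetical helper, not part of CESM. The multiple-of-8 rule is taken
# from the workaround, not from POP documentation.
# Assumes $1 is a positive integer (no input validation is done here).
nearest_multiples_of_8() {
  n="$1"
  down=$(( n / 8 * 8 ))          # round down to a multiple of 8
  up=$(( (n + 7) / 8 * 8 ))      # round up to a multiple of 8
  [ "$down" -lt 8 ] && down=8    # never suggest fewer than 8 tasks
  echo "$down $up"
}
# Example: nearest_multiples_of_8 28 prints "24 32"
```

Either value would then be set as the OCN processor count in the case's PE layout before rebuilding.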
Proposed solution:
From what I have heard, the POP configure script in CCSM4 used to stop with a fatal error related to the number of processors. This check appears to have been removed, which is a likely cause of the bug. The current segmentation fault is particularly unhelpful because it gives no hint about the origin of the problem, so it may be worth bringing the configure-time check back.
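A configure-time guard of the kind suggested above could look like the following sketch. It is an assumption, not the former CCSM4 check (whose exact logic is not reproduced here): the hypothetical shell function check_ocn_pes rejects any OCN task count that is not a positive multiple of 8, mirroring the workaround, and fails early with a message pointing at the likely crash site instead of letting POP segfault later.

```shell
# Hypothetical configure-time check, not part of CESM.
# The multiple-of-8 constraint is inferred from the workaround only.
# Returns 0 if the OCN task count is acceptable, 1 otherwise.
check_ocn_pes() {
  ntasks="$1"
  # Reject empty or non-numeric input before doing arithmetic on it.
  case "$ntasks" in
    ''|*[!0-9]*)
      echo "ERROR: OCN task count '$ntasks' is not a positive integer" >&2
      return 1 ;;
  esac
  # Reject zero and anything that is not a multiple of 8.
  if [ "$ntasks" -eq 0 ] || [ $(( ntasks % 8 )) -ne 0 ]; then
    echo "ERROR: OCN task count $ntasks is not a multiple of 8;" \
         "POP may segfault during initialization (POP_SpaceCurveMod)" >&2
    return 1
  fi
  return 0
}
```

Failing at configure time keeps the error next to the setting that caused it, rather than deep inside the POP domain decomposition.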