
Runtime error with a CESM2 E1850L45TEST compset on Stampede2-knl

Hello:

I'm trying to run a CESM2 case coupled to a slab ocean model (compset: E1850L45TEST) on Stampede2.

A test run with the same settings worked fine on Cheyenne a few months ago.

However, on Stampede2 the simulation seems to encounter errors while it is initializing.

The error message looks like this (also see the attached .log file, starting from line ~1197):

Reading setup_nml
Reading grid_nml
Reading tracer_nml
Reading thermo_nml
Reading dynamics_nml
Reading shortwave_nml
Reading ponds_nml
Reading forcing_nml
Reading zbgc_nml
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
(seq_domain_areafactinit) : min/max mdl2drv 1.00000000000000 1.00000000000000 areafact_o_OCN
(seq_domain_areafactinit) : min/max drv2mdl 1.00000000000000 1.00000000000000 areafact_o_OCN
(seq_domain_areafactinit) : min/max mdl2drv 0.999565346641447 1.00000000000000 areafact_i_ICE
(seq_domain_areafactinit) : min/max drv2mdl 1.00000000000000 1.00043484236425 areafact_i_ICE
MEMKIND_FATAL: [memkind_hugetlb] Failed to create arena map (error code:11).
.....


Image PC Routine Line Source
cesm.exe 000000000321D2EA Unknown Unknown Unknown
libpthread-2.17.s 00002ACBFB16C5D0 Unknown Unknown Unknown
libc-2.17.so 00002ACBFB6B1207 gsignal Unknown Unknown
libc-2.17.so 00002ACBFB6B28F8 abort Unknown Unknown
libmemkind.so.0.0 00002ACC9701DFD7 Unknown Unknown Unknown
libmemkind.so.0.0 00002ACC9701E247 Unknown Unknown Unknown
libpthread-2.17.s 00002ACBFB169E40 pthread_once Unknown Unknown
libmemkind.so.0.0 00002ACC9701E4AA memkind_posix_mem Unknown Unknown
libmkl_core.so 00002ACBF7EB84A8 Unknown Unknown Unknown
libmkl_core.so 00002ACBF7EB64B8 mkl_serv_allocate Unknown Unknown
libmkl_intel_lp64 00002ACBF5B9106D dgbsv Unknown Unknown
cesm.exe 00000000020FFC4A lapack_wrap_mp_ba 652 lapack_wrap.F90
cesm.exe 00000000020D6954 advance_xm_wpxp_m 1904 advance_xm_wpxp_module.F90
cesm.exe 00000000020CFDAB advance_xm_wpxp_m 674 advance_xm_wpxp_module.F90
cesm.exe 00000000020A1F1D advance_clubb_cor 1957 advance_clubb_core_module.F90
cesm.exe 0000000001A79F11 clubb_intr_mp_clu 1885 clubb_intr.F90
cesm.exe 00000000006E40F1 physpkg_mp_tphysb 2072 physpkg.F90
cesm.exe 00000000006E0AC6 physpkg_mp_phys_r 1054 physpkg.F90
libiomp5.so 00002ACBF9990C53 __kmp_invoke_micr Unknown Unknown
libiomp5.so 00002ACBF9960357 Unknown Unknown Unknown
libiomp5.so 00002ACBF995F9D5 Unknown Unknown Unknown
libiomp5.so 00002ACBF9991133 Unknown Unknown Unknown
libpthread-2.17.s 00002ACBFB164DD5 Unknown Unknown Unknown
libc-2.17.so 00002ACBFB778EAD clone Unknown Unknown
forrtl: error (76): Abort trap signal
.....

Stack trace terminated abnormally.
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x2b7e54cf39e7]
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
cesm.exe 000000000321D2EA Unknown Unknown Unknown
libpthread-2.17.s 00002AED7BAE65D0 Unknown Unknown Unknown
libc-2.17.so 00002AED7C02B207 gsignal Unknown Unknown
libc-2.17.so 00002AED7C02C8F8 abort Unknown Unknown
libmemkind.so.0.0 00002AEE2E00DFD7 Unknown Unknown Unknown
libmemkind.so.0.0 00002AEE2E00E247 Unknown Unknown Unknown
libpthread-2.17.s 00002AED7BAE3E40 pthread_once Unknown Unknown
libmemkind.so.0.0 00002AEE2E00E4AA memkind_posix_mem Unknown Unknown
libmkl_core.so 00002AED788324A8 Unknown Unknown Unknown
libmkl_core.so 00002AED788304B8 mkl_serv_allocate Unknown Unknown
libmkl_intel_lp64 00002AED7650B06D dgbsv Unknown Unknown
cesm.exe 00000000020FFC4A lapack_wrap_mp_ba 652 lapack_wrap.F90
cesm.exe 00000000020D6954 advance_xm_wpxp_m 1904 advance_xm_wpxp_module.F90
cesm.exe 00000000020CFDAB advance_xm_wpxp_m 674 advance_xm_wpxp_module.F90
cesm.exe 00000000020A1F1D advance_clubb_cor 1957 advance_clubb_core_module.F90
cesm.exe 0000000001A79F11 clubb_intr_mp_clu 1885 clubb_intr.F90
cesm.exe 00000000006E40F1 physpkg_mp_tphysb 2072 physpkg.F90
cesm.exe 00000000006E0AC6 physpkg_mp_phys_r 1054 physpkg.F90
libiomp5.so 00002AED7A30AC53 __kmp_invoke_micr Unknown Unknown
libiomp5.so 00002AED7A2DA357 Unknown Unknown Unknown
libiomp5.so 00002AED7A2DB413 __kmp_fork_call Unknown Unknown
libiomp5.so 00002AED7A2B1E2A __kmpc_fork_call Unknown Unknown
cesm.exe 00000000006E0729 physpkg_mp_phys_r 1038 physpkg.F90
cesm.exe 00000000004FAD42 cam_comp_mp_cam_r 258 cam_comp.F90
cesm.exe 00000000004F38FE atm_comp_mct_mp_a 287 atm_comp_mct.F90
cesm.exe 0000000000435188 component_mod_mp_ 267 component_mod.F90
cesm.exe 0000000000429B1F cime_comp_mod_mp_ 2010 cime_comp_mod.F90
cesm.exe 000000000043231E MAIN__ 114 cime_driver.F90
cesm.exe 000000000041637E Unknown Unknown Unknown

Stack trace terminated abnormally.
*** longjmp causes uninitialized stack frame ***: /scratch/05409/ccchang3/E1850_knl/bld/cesm.exe terminated


Can anyone help me with this? I would really appreciate it.

Thanks.

Jay
 

jedwards

CSEG and Liaisons
Staff member
I don't recognize this error. You can get more information by turning on the DEBUG flag and rebuilding the case:
./xmlchange DEBUG=TRUE
./case.build --clean-all
./case.build
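
Once the rebuilt case has been rerun, the log can get long. As a minimal sketch (not part of the original reply), a small shell helper like the one below prints the first line that matches one of the fatal markers seen in the log above, together with its line number. The function name `first_fatal` and the log path in the usage comment are placeholders, and `grep -m` assumes GNU or BSD grep.

```shell
# Print the first line (prefixed with its line number) that matches a
# fatal marker taken from the log excerpt in this thread.
# Usage (path is a placeholder): first_fatal /path/to/your/cesm.log
first_fatal() {
    # -n: show line numbers, -m 1: stop after the first match,
    # -E: extended regex so "|" means "or"
    grep -n -m 1 -E 'MEMKIND_FATAL|forrtl: error' "$1"
}
```

The earliest fatal line is usually more informative than the repeated stack traces that follow it, since every failing thread prints its own copy.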