wei_huang2@hpe_com
Hello there,

I am trying to run the T31_g37 B1850C5 compset and run into this error:

Overflow: Ross Sea Product adjacent mask at global (ij)= 61 8
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 9
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 10
*** Error in `../bld/cesm.exe': malloc(): memory corruption (fast): 0x000000001148e240 ***
*** Error in `../bld/cesm.exe': malloc(): memory corruption (fast): 0x000000001148e250 ***
*** Error in `../bld/cesm.exe': malloc(): memory corruption (fast): 0x0000000011492810 ***

I have checked the related topic:
https://bb.cgd.ucar.edu/cesm-runtime-error-netcdf-invalid-dimension-id-or-name-glibc-detected
and updated the file spmd_dyn.F90, but I still get the same error. I am using the Intel compiler with MPT on an SGI ICE-XA machine.

I tried printing the variable "num_ovf"; its value is 4. Running with debug on, I get this traceback:

MPT: 0x00002aaaac065f19 in waitpid () from /lib64/libpthread.so.0
MPT: Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 libbitmask-2.0-sgi716r63.rhel73.x86_64 libcpuset-1.0-sgi716r94.rhel73.x86_64 libcxgb3-1.3.1-8.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libhfi1-0.5-23.el7.x86_64 libibverbs-1.2.1-1.el7.x86_64 libmlx4-1.2.1-1.el7.x86_64 libmlx5-1.2.1-8.el7.x86_64 libmthca-1.0.6-13.el7.x86_64 libnl3-3.2.28-2.el7.x86_64 libnuma-3.0sgi-sgi716r61.rhel73.x86_64 libpsm2-devel-10.2.175-1.x86_64 numatools-2.0-sgi716r146.rhel73.x86_64 xpmem-1.6-sgi716r125.rhel73.x86_64
MPT: (gdb) #0 0x00002aaaac065f19 in waitpid () from /lib64/libpthread.so.0
MPT: #1 0x00002aaaaba9784c in mpi_sgi_system (command=,
MPT: __statbuf=, __fd=) at sig.c:98
MPT: #2 MPI_SGI_stacktraceback (header=) at sig.c:339
MPT: #3 0x00002aaaaba98354 in first_arriver_handler (signo=6,
MPT: stack_trace_sem=0x2aaab8e60500) at sig.c:488
MPT: #4 0x00002aaaaba985df in slave_sig_handler (signo=6, siginfo=,
MPT: extra=) at sig.c:563
MPT: #5
MPT: #6 0x00002aaaac2a81d7 in raise () from /lib64/libc.so.6
MPT: #7 0x00002aaaac2a98c8 in abort () from /lib64/libc.so.6
MPT: #8 0x00002aaaac2e7f07 in __libc_message () from /lib64/libc.so.6
MPT: #9 0x00002aaaac2edda4 in malloc_printerr () from /lib64/libc.so.6
MPT: #10 0x00002aaaac2f0dc7 in _int_malloc () from /lib64/libc.so.6
MPT: #11 0x00002aaaac2f2fbc in malloc () from /lib64/libc.so.6
MPT: #12 0x00000000073897fd in for__get_vm ()
MPT: #13 0x0000000007352975 in for__add_to_lf_table ()
MPT: #14 0x00000000073c46db in for__open_proc ()
MPT: #15 0x00000000073599b2 in for__open_default ()
MPT: #16 0x00000000073a4ce4 in for_write_seq_lis ()
MPT: #17 0x000000000532fa91 in ovf_utils::ovf_init_groups ()
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/ovf_utils.F90:120
MPT: #18 0x000000000530d09d in overflows::ovf_hu (hu=..., hum=...)
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/overflows.F90:5791
MPT: #19 0x00000000053009ae in overflows::ovf_solvers_9pt ()
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/overflows.F90:5613
MPT: #20 0x00000000051dc082 in overflows::init_overflows3 ()
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/overflows.F90:1449
MPT: #21 0x0000000005ce9a01 in initial::pop_init_phase1 (errorcode=0)
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/initial.F90:345
MPT: #22 0x000000000564f16b in pop_initmod::pop_initialize1 (errorcode=0)
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/POP_InitMod.F90:102
MPT: #23 0x0000000005122cad in ocn_comp_mct::ocn_init_mct (eclock=..., cdata_o=...,
MPT: x2o_o=..., o2x_o=..., nlfilename='drv_in', .tmp.NLFILENAME.len_V$2850=6)
MPT: at /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/ocn_comp_mct.F90:261
MPT: #24 0x000000000043d952 in ccsm_comp_mod::ccsm_init ()
MPT: at /store/whuang/CESM/cesm1_2_2_1/models/drv/driver/ccsm_comp_mod.F90:1130
MPT: #25 0x00000000004fefb2 in ccsm_driver ()
MPT: at /store/whuang/CESM/cesm1_2_2_1/models/drv/driver/ccsm_driver.F90:90
MPT: #26 0x0000000000418e9e in main ()
MPT: #27 0x00002aaaac294b35 in __libc_start_main () from /lib64/libc.so.6
MPT: #28 0x0000000000418da9 in _start ()
MPT: (gdb) A debugging session is active.
MPT:
MPT: Inferior 1 [process 99237] will be detached.
MPT:
MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
MPT: Detaching from program: /proc/99237/exe, process 99237
MPT: -----stack traceback ends-----
MPT: On host r1i7n7, Program /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/cesm.exe, Rank 256, Process 99237: Dumping core on signal SIGABRT/SIGIOT(6) into directory /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/run
MPT ERROR: MPI_COMM_WORLD rank 256 has terminated without calling MPI_Finalize()
aborting job
MPT: Received signal 6

The file /lustre/whuang/wd4cesm1/mpt.T31_g37.B1850.288c.lnd72_ice144_ocn72.tpn36.omp1/bld/ocn/source/ovf_utils.F90 has these lines around line 120, where the traceback points:

115    integer (int_kind), pointer :: starts(:)
116
117    logical (log_kind) :: found, comm_master_present
118    real (r8), dimension(:,:,:), pointer :: g_mask !the mask
119
120    write(0, *) 'num_ovf = ', num_ovf
121
122    allocate(ids(num_ovf))
123    count = 0
124
125
126
127 ! print *, 'MYPROC: ', my_task, 'OVF_INIT_GROUPS '

(Note: I added the write(0, *) ... line myself.)

Thanks in advance for your help!
Wei