CESM 1.0.5 porting runtime issues

gmodica@...

I am porting CESM 1.0.5 to an Intel Linux cluster (similar to pleiades_wes), starting with compset X. It compiles OK, but I get the following runtime error:

MPI: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

I am not sure where to turn next.
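
For reference, a compset X case like this is typically set up with the standard CESM 1.0.x scripts workflow, roughly as sketched below (the case name, resolution, and machine name are placeholders for illustration, not the actual values used here):

# from the scripts/ directory: create, configure, build, and submit the case
./create_newcase -case mytest_X -res f19_g16 -compset X -mach mymachine
cd mytest_X
./configure -case
./mytest_X.mymachine.build
./mytest_X.mymachine.submit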

The complete ccsm.log file is included below. Thank you.

George M.

(seq_io_init) pio init parameters: before nml read
(seq_io_init) pio_stride = -99
(seq_io_init) pio_root = -99
(seq_io_init) pio_typename = nothing
(seq_io_init) pio_numtasks = -99
(seq_io_init) pio_debug_level = 0
pio_async_interface = F
(seq_io_init) pio init parameters: after nml read
(seq_io_init) pio_stride = -1
(seq_io_init) pio_root = 1
(seq_io_init) pio_typename = netcdf
(seq_io_init) pio_numtasks = -1
(seq_io_init) pio init parameters:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
(seq_io_init) pio_typename = NETCDF
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio_debug_level = 0
pio_async_interface = F
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 ATM ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 1 LND ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 4 ICE ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 5 GLC ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 3 OCN ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 6 CPL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 8 CPLATM ) join IDs = 6 2 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 9 CPLLND ) join IDs = 6 1 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 10 CPLICE ) join IDs = 6 4 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 11 CPLOCN ) join IDs = 6 3 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 12 CPLGLC ) join IDs = 6 5 ( npes = 4) ( nthreads = 1)

(seq_comm_printcomms) ID layout : global pes vs local pe for each ID
gpe LND ATM OCN ICE GLC CPL GLOBAL CPLATM CPLLND CPLICE CPLOCN CPLGLC nthrds
--- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
0 : 0 0 0 0 0 0 0 0 0 0 0 0 1
1 : 1 1 1 1 1 1 1 1 1 1 1 1 1
2 : 2 2 2 2 2 2 2 2 2 2 2 2 1
3 : 3 3 3 3 3 3 3 3 3 3 3 3 1

(seq_io_init) pio init parameters for CPL:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1

(seq_io_init) pio init parameters for ATM:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for ICE:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for OCN:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for LND:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for GLC:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(t_initf) Read in prof_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
dead_setNewGrid decomp 2d 1 3456 1 144
25 48
dead_setNewGrid decomp 2d 2 3456 1 144
49 72
dead_setNewGrid decomp 2d 3 3456 1 144
73 96
dead_setNewGrid decomp seg 1 3458 266
dead_setNewGrid decomp seg 2 3458 266
dead_setNewGrid decomp seg 3 3450 266
dead_setNewGrid decomp seg 1 64805 4985
dead_setNewGrid decomp seg 2 64805 4985
dead_setNewGrid decomp seg 3 64785 4985
dead_setNewGrid decomp seg 1 3458 266
dead_setNewGrid decomp seg 2 3458 266
dead_setNewGrid decomp seg 3 3450 266
dead_setNewGrid decomp 2d 1 30720 161 320
1 192
dead_setNewGrid decomp 2d 2 30720 1 160
193 384
dead_setNewGrid decomp 2d 3 30720 161 320
193 384
dead_setNewGrid decomp 2d 1 30720 81 160
1 384
dead_setNewGrid decomp 2d 2 30720 161 240
1 384
dead_setNewGrid decomp 2d 3 30720 241 320
1 384
dead_setNewGrid decomp 2d 1 3456 37 72
1 96
dead_setNewGrid decomp 2d 2 3456 73 108
1 96
dead_setNewGrid decomp 2d 3 3456 109 144
1 96
MPI: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

AER_GM

jedwards

First, please try updating to 1.0.6; we may have already fixed this. If it's still a problem, please let us know which compiler and compiler version you are using. Did you look in the run directory for a core file?
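
If no core file appears, core dumps may simply be disabled by the shell limits in your batch environment. A minimal sketch of how to enable them and get a traceback (assuming a bash environment and the cesm.exe executable name seen in the log) is:

# allow core dumps before launching the run
ulimit -c unlimited

# after the crash, look for a core file in the run directory
ls -l core*

# print a stack traceback from the core file
gdb -batch -ex bt ./cesm.exe core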

CESM Software Engineer

gmodica@...

Dear jedwards,

Thank you for the reply. I rebuilt the executable using CESM v1.1 (recommended for scientific purposes) on the same Intel Linux cluster (similar to pleiades_wes), starting with compset X. It compiles OK (using ifort v11.1.069), but I get the runtime error attached below. No core file was produced, at least not in the run directory.

Regards,

George

MPI: Module xpmem is not loaded.  MPI memory mapping features (single copy
     transfers, one-sided communication and SHMEM functions) are disabled.
     XPMEM must be enabled on your system to use these features.

MPI WARNING: error from MPI_SGI_xp_init_slave/xpmem_make (all_mem), rc = -1, errno = 19
             Check that the xpmem module is loaded
(seq_comm_setcomm)  initialize ID (  1 GLOBAL          ) pelist   =     0     0     1 ( npes =     1) ( nthreads =  1)
MPI Error, rank:0, function:MPI_GROUP_RANGE_INCL, Invalid argument
MPI: Global rank 0 is aborting with error code 0.
     Process ID: 14746, Host: n001, Program: /rotor/scratch/p1783-HALAS/cesm1_1/scripts/test1/cesm.exe

MPI: --------stack traceback-------
MPI: Attaching to program: /proc/14746/exe, process 14746
MPI: Try: zypper install -C "debuginfo(build-id)=365e4d2c812908177265c8223f222a1665fe1035"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=8362cd0e37776b4bba3372224858dbcafcadc4ee"
MPI: (no debugging symbols found)...done.
MPI: [Thread debugging using libthread_db enabled]
MPI: Try: zypper install -C "debuginfo(build-id)=a41ac0b0b7cd60bd57473303c2c3de08856d2e06"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=3f06bcfc74f9b01780d68e89b8dce403bef9b2e3"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=d70e9482ac22a826c1cf7d04bdbb1bf06f2e707b"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=17c088070352d83e7afc43d83756b00899fd37f0"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=81a3a96c7c0bc95cb4aa5b29702689cf324a7fcd"
MPI: (no debugging symbols found)...done.
MPI: 0x00002aaaab67e105 in waitpid () from /lib64/libpthread.so.0
MPI: (gdb) #0  0x00002aaaab67e105 in waitpid () from /lib64/libpthread.so.0
MPI: #1  0x00002aaaab3e99a4 in mpi_sgi_system (command=<value optimized out>)
MPI:     at sig.c:89
MPI: #2  MPI_SGI_stacktraceback (command=<value optimized out>) at sig.c:269
MPI: #3  0x00002aaaab37ec42 in print_traceback (ecode=0) at abort.c:168
MPI: #4  0x00002aaaab37ee33 in MPI_SGI_abort () at abort.c:78
MPI: #5  0x00002aaaab3a50a3 in errors_are_fatal (comm=<value optimized out>,
MPI:     code=<value optimized out>) at errhandler.c:223
MPI: #6  0x00002aaaab3a5401 in MPI_SGI_error (comm=1, code=13) at errhandler.c:60
MPI: #7  0x00002aaaab3b4734 in PMPI_Group_range_incl (group=3, n=1,
MPI:     ranges=0x1cb9d20, newgroup=0x7fffffff9a34) at group_range_incl.c:58
MPI: #8  0x00002aaaab3b4795 in pmpi_group_range_incl__ ()
MPI:    from /opt/sgi/mpt/mpt-2.02/lib/libmpi.so
MPI: #9  0x000000000049a5c9 in seq_comm_mct_mp_seq_comm_setcomm_ ()
MPI: #10 0x000000000049f8a0 in seq_comm_mct_mp_seq_comm_init_ ()
MPI: #11 0x0000000000433141 in ccsm_comp_mod_mp_ccsm_pre_init_ ()
MPI: #12 0x0000000000434862 in MAIN__ ()
MPI: #13 0x0000000000009fe0 in ?? ()
MPI: #14 0x0000000000000000 in ?? ()
MPI: (gdb) A debugging session is active.
MPI:
MPI: Inferior 1 [process 14746] will be detached.
MPI:
MPI: Quit anyway? (y or n) [answered Y; input not from terminal]
MPI: Detaching from program: /proc/14746/exe, process 14746

MPI: -----stack traceback ends-----
MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
/opt/sgi/mpt/mpt-2.02/bin/mpiexec_mpt: line 53: 14730 Killed                  $mpicmdline_prefix -f $paramfile

AER_GM

jedwards

MPI: Module xpmem is not loaded.  MPI memory mapping features (single copy
     transfers, one-sided communication and SHMEM functions) are disabled.
     XPMEM must be enabled on your system to use these features.

It seems that the problem is in the MPI library. Is it as simple as "module load xpmem"? I don't know; talk to your system support staff.
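
A rough sketch of how to check, assuming an environment-modules setup like the SGI MPT system in the log (module names may differ on your cluster):

# see whether an xpmem environment module exists, and load it if so
module avail 2>&1 | grep -i xpmem
module load xpmem

# check whether the xpmem kernel module itself is loaded
lsmod | grep xpmem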

CESM Software Engineer
