CESM 1.0.5 porting runtime issues

I am porting CESM 1.0.5 to an Intel Linux cluster (similar to pleiades_wes), starting with compset X. It compiles OK, but I get the following runtime error:

MPI: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

I am not sure where to turn next.

The complete ccsm.log file is included below. Thank you.

George M.

(seq_io_init) pio init parameters: before nml read
(seq_io_init) pio_stride = -99
(seq_io_init) pio_root = -99
(seq_io_init) pio_typename = nothing
(seq_io_init) pio_numtasks = -99
(seq_io_init) pio_debug_level = 0
pio_async_interface = F
(seq_io_init) pio init parameters: after nml read
(seq_io_init) pio_stride = -1
(seq_io_init) pio_root = 1
(seq_io_init) pio_typename = netcdf
(seq_io_init) pio_numtasks = -1
(seq_io_init) pio init parameters:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
(seq_io_init) pio_typename = NETCDF
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio_debug_level = 0
pio_async_interface = F
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 ATM ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 1 LND ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 4 ICE ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 5 GLC ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 3 OCN ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 6 CPL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 8 CPLATM ) join IDs = 6 2 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 9 CPLLND ) join IDs = 6 1 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 10 CPLICE ) join IDs = 6 4 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 11 CPLOCN ) join IDs = 6 3 ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 12 CPLGLC ) join IDs = 6 5 ( npes = 4) ( nthreads = 1)

(seq_comm_printcomms) ID layout : global pes vs local pe for each ID
gpe LND ATM OCN ICE GLC CPL GLOBAL CPLATM CPLLND CPLICE CPLOCN CPLGLC nthrds
--- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
0 : 0 0 0 0 0 0 0 0 0 0 0 0 1
1 : 1 1 1 1 1 1 1 1 1 1 1 1 1
(seq_io_init) pio init parameters for CPL:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
2 : 2 2 2 2 2 2 2 2 2 2 2 2 1
3 : 3 3 3 3 3 3 3 3 3 3 3 3 1

(seq_io_init) pio init parameters for ATM:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for ICE:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for OCN:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 0
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for LND:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(seq_io_init) pio init parameters for GLC:
(seq_io_init) pio_stride = 4
(seq_io_init) pio_root = 1
pio iotype is netcdf
(seq_io_init) pio_iotype = 6
(seq_io_init) pio_numtasks = 1
(t_initf) Read in prof_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 4096.00
dead_setNewGrid decomp 2d 1 3456 1 144
25 48
dead_setNewGrid decomp 2d 2 3456 1 144
49 72
dead_setNewGrid decomp 2d 3 3456 1 144
73 96
dead_setNewGrid decomp seg 1 3458 266
dead_setNewGrid decomp seg 2 3458 266
dead_setNewGrid decomp seg 3 3450 266
dead_setNewGrid decomp seg 1 64805 4985
dead_setNewGrid decomp seg 2 64805 4985
dead_setNewGrid decomp seg 3 64785 4985
dead_setNewGrid decomp seg 1 3458 266
dead_setNewGrid decomp seg 2 3458 266
dead_setNewGrid decomp seg 3 3450 266
dead_setNewGrid decomp 2d 1 30720 161 320
1 192
dead_setNewGrid decomp 2d 2 30720 1 160
193 384
dead_setNewGrid decomp 2d 3 30720 161 320
193 384
dead_setNewGrid decomp 2d 1 30720 81 160
1 384
dead_setNewGrid decomp 2d 2 30720 161 240
1 384
dead_setNewGrid decomp 2d 3 30720 241 320
1 384
dead_setNewGrid decomp 2d 1 3456 37 72
1 96
dead_setNewGrid decomp 2d 2 3456 73 108
1 96
dead_setNewGrid decomp 2d 3 3456 109 144
1 96
MPI: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

jedwards

CSEG and Liaisons
Staff member
First, please try updating to 1.0.6; we may have already fixed this. If it's still a problem, please let us know which compiler and compiler version you are using. Did you look in the run directory for a core file?
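A general note on this kind of failure: signal 11 is a segmentation fault, and on many clusters no core file appears simply because the core size limit defaults to 0 (ulimit -c unlimited changes that). Purely as an illustration, not part of CESM, here is a minimal C sketch of catching SIGSEGV and printing a backtrace, which can give at least a rough stack trace even when no core file is written. Compile with -g and link with -rdynamic so symbol names show up in the output.

#include <execinfo.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Signal handler: print a backtrace for the crashing process, then exit.
 * backtrace_symbols_fd is used (rather than backtrace_symbols) because it
 * writes directly to a file descriptor and does not call malloc. */
static void segv_handler(int sig)
{
    void *frames[64];
    int nframes;
    const char msg[] = "caught SIGSEGV, backtrace follows:\n";

    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    nframes = backtrace(frames, 64);
    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
    _exit(EXIT_FAILURE);
}

int main(void)
{
    signal(SIGSEGV, segv_handler);

    /* Deliberate null-pointer write to show the handler in action. */
    int *p = NULL;
    *p = 42;
    return 0;
}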
 
Dear jedwards,

Thank you for the reply. I rebuilt the executable using CESM v1.1 (recommended for scientific purposes) on the same Intel Linux cluster (similar to pleiades_wes), starting with compset X. It compiles OK (using ifort v11.1.069), but I get the runtime error attached below. No core file was produced, at least not in the run directory.

Regards,
George

MPI: Module xpmem is not loaded.  MPI memory mapping features (single copy
     transfers, one-sided communication and SHMEM functions) are disabled.
     XPMEM must be enabled on your system to use these features.
MPI WARNING: error from MPI_SGI_xp_init_slave/xpmem_make (all_mem), rc = -1, errno = 19
             Check that the xpmem module is loaded
(seq_comm_setcomm)  initialize ID (  1 GLOBAL          ) pelist   =     0     0     1 ( npes =     1) ( nthreads =  1)
MPI Error, rank:0, function:MPI_GROUP_RANGE_INCL, Invalid argument
MPI: Global rank 0 is aborting with error code 0.
     Process ID: 14746, Host: n001, Program: /rotor/scratch/p1783-HALAS/cesm1_1/scripts/test1/cesm.exe
MPI: --------stack traceback-------
MPI: Attaching to program: /proc/14746/exe, process 14746
MPI: Try: zypper install -C "debuginfo(build-id)=365e4d2c812908177265c8223f222a1665fe1035"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=8362cd0e37776b4bba3372224858dbcafcadc4ee"
MPI: (no debugging symbols found)...done.
MPI: [Thread debugging using libthread_db enabled]
MPI: Try: zypper install -C "debuginfo(build-id)=a41ac0b0b7cd60bd57473303c2c3de08856d2e06"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=3f06bcfc74f9b01780d68e89b8dce403bef9b2e3"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=d70e9482ac22a826c1cf7d04bdbb1bf06f2e707b"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=17c088070352d83e7afc43d83756b00899fd37f0"
MPI: (no debugging symbols found)...done.
MPI: Try: zypper install -C "debuginfo(build-id)=81a3a96c7c0bc95cb4aa5b29702689cf324a7fcd"
MPI: (no debugging symbols found)...done.
MPI: 0x00002aaaab67e105 in waitpid () from /lib64/libpthread.so.0
MPI: (gdb) #0  0x00002aaaab67e105 in waitpid () from /lib64/libpthread.so.0
MPI: #1  0x00002aaaab3e99a4 in mpi_sgi_system (command=)
MPI:     at sig.c:89
MPI: #2  MPI_SGI_stacktraceback (command=) at sig.c:269
MPI: #3  0x00002aaaab37ec42 in print_traceback (ecode=0) at abort.c:168
MPI: #4  0x00002aaaab37ee33 in MPI_SGI_abort () at abort.c:78
MPI: #5  0x00002aaaab3a50a3 in errors_are_fatal (comm=,
MPI:     code=) at errhandler.c:223
MPI: #6  0x00002aaaab3a5401 in MPI_SGI_error (comm=1, code=13) at errhandler.c:60
MPI: #7  0x00002aaaab3b4734 in PMPI_Group_range_incl (group=3, n=1,
MPI:     ranges=0x1cb9d20, newgroup=0x7fffffff9a34) at group_range_incl.c:58
MPI: #8  0x00002aaaab3b4795 in pmpi_group_range_incl__ ()
MPI:    from /opt/sgi/mpt/mpt-2.02/lib/libmpi.so
MPI: #9  0x000000000049a5c9 in seq_comm_mct_mp_seq_comm_setcomm_ ()
MPI: #10 0x000000000049f8a0 in seq_comm_mct_mp_seq_comm_init_ ()
MPI: #11 0x0000000000433141 in ccsm_comp_mod_mp_ccsm_pre_init_ ()
MPI: #12 0x0000000000434862 in MAIN__ ()
MPI: #13 0x0000000000009fe0 in ?? ()
MPI: #14 0x0000000000000000 in ?? ()
MPI: (gdb) A debugging session is active.
MPI:
MPI:    Inferior 1 [process 14746] will be detached.
MPI:
MPI: Quit anyway? (y or n) [answered Y; input not from terminal]
MPI: Detaching from program: /proc/14746/exe, process 14746
MPI: -----stack traceback ends-----
MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
/opt/sgi/mpt/mpt-2.02/bin/mpiexec_mpt: line 53: 14730 Killed                  $mpicmdline_prefix -f $paramfile
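For readers following the traceback: seq_comm_setcomm builds each component's MPI group from the (start, end, stride) pelist printed in the log, and the frame that fails is PMPI_Group_range_incl. Below is a minimal C sketch, purely illustrative and not the actual CESM source, of that call and of one way it can produce "Invalid argument": asking for ranks that are not present in the job, for example a pelist covering ranks 0-3 when only one task was launched. This may not be the exact cause of the failure above, but it shows which arguments the error refers to.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Ask MPI to return error codes instead of aborting, so the failing
     * call can be reported.  (Which communicator's handler applies to
     * group calls varies between MPI versions, so set both.) */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    MPI_Group world_group, sub_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* pelist in the same (start rank, end rank, stride) form as the log.
     * If the job was launched with fewer than 4 tasks, ranks 1-3 do not
     * exist in world_group and the call is erroneous. */
    int ranges[1][3] = { { 0, 3, 1 } };

    int rc = MPI_Group_range_incl(world_group, 1, ranges, &sub_group);
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "MPI_Group_range_incl failed, error code %d\n", rc);
    else
        MPI_Group_free(&sub_group);

    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}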

jedwards

CSEG and Liaisons
Staff member
MPI: Module xpmem is not loaded.  MPI memory mapping features (single copy
     transfers, one-sided communication and SHMEM functions) are disabled.
     XPMEM must be enabled on your system to use these features.

It seems that the problem is in the MPI library. Is it as simple as "module load xpmem"? I don't know; talk to your system support staff.