CCSM3 on 32 cpus Quad-Core AMD Opteron

sang-ki.lee@...

Our lab bought a new Linux machine: 32 cpus (speed = 2612.051 MHz), Quad-Core AMD Opteron(tm)
Processor 8382, with 132 GB of total shared memory. The motherboard has 8 blades and each
contains 4 cpus. CCSM3 compiles on the new machine with mpich2 (1.1.1) without error
messages (ifort and gcc were used to build both CCSM3 and the MPI libraries).

The CCSM3 run is executed interactively (no batch system) for T31_gx3v5, compset B:
/usr/local/mpich2/bin/mpirun -np 4 cpl : -np 2 csim : -np 4 clm : -np 4 pop : -np 8 cam

I get the following error messages shortly after the model run starts:

Fatal error in MPI_Comm_dup: Invalid communicator, error stack:
MPI_Comm_dup(167): MPI_Comm_dup("comm=0x0", new_comm=0x9f7758) failed
MPI_Comm_dup(95).: Invalid communicator
Fatal error in MPI_Comm_dup: Invalid communicator, error stack:
MPI_Comm_dup(167): MPI_Comm_dup(comm=0x0, new_comm=0x357ec98) failed
MPI_Comm_dup(95).: Invalid communicator
......
rank 21 in job 17 phodmod.aoml.noaa.gov_51674 caused collective abort of all ranks
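
To see which component a reported rank belongs to, the launch line can be expanded into per-component rank ranges. The following is a minimal sketch, not part of CCSM3 or MPICH2: the helper names `rank_ranges` and `component_of` are hypothetical, and it assumes that global ranks are assigned in the order the colon-separated segments appear on the mpirun command line.

```python
# Sketch: map global MPI ranks to CCSM3 components for the MPMD launch above.
# Assumption: ranks are numbered in the order the colon-separated segments
# appear on the mpirun command line.

def rank_ranges(launch_spec):
    """Return {executable: range of global ranks} for an MPMD launch string."""
    ranges = {}
    next_rank = 0
    for segment in launch_spec.split(":"):
        tokens = segment.split()
        count = int(tokens[tokens.index("-np") + 1])  # value following -np
        executable = tokens[-1]                       # last token is the binary
        ranges[executable] = range(next_rank, next_rank + count)
        next_rank += count
    return ranges


def component_of(rank, ranges):
    """Return the executable whose rank range contains the given global rank."""
    for executable, ranks in ranges.items():
        if rank in ranks:
            return executable
    raise ValueError(f"rank {rank} is outside the launched job")


SPEC = "-np 4 cpl : -np 2 csim : -np 4 clm : -np 4 pop : -np 8 cam"
RANGES = rank_ranges(SPEC)
TOTAL_RANKS = sum(len(r) for r in RANGES.values())

if __name__ == "__main__":
    for name, ranks in RANGES.items():
        print(f"{name:5s} ranks {ranks.start}-{ranks.stop - 1}")
    print("total ranks:", TOTAL_RANKS)
    print("rank 21 ->", component_of(21, RANGES))
```

Under that ordering assumption the job has 22 ranks, and rank 21 is the last of the eight cam ranks. The elided lines in the log may contain messages from other ranks, so this only identifies the rank named in the abort message.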

My Macros.Linux looks like this:

INCLDIR := -I$(INCROOT) -I/usr/local/include -I/usr/local/mpich2/include
SLIBS := -L/usr/local/lib -lnetcdf -L/usr/local/mpich2/lib -lmpich
ULIBS := -L$(LIBROOT) -lesmf -lmct -lmpeu -lmph
CPP := NONE
CPPFLAGS := -DLINUX -DPGF90 -DNO_SHR_VMATH
CPPDEFS := -DLINUX
CC := /usr/local/mpich2/bin/mpicc
CFLAGS := -c
FIXEDFLAGS :=
FREEFLAGS := -FR
FC := /usr/local/mpich2/bin/mpif90
FFLAGS := -c -r8 -i4 -extend_source -assume byterecl
MOD_SUFFIX := mod
LD := $(FC)
LDFLAGS := -L/usr/lib64 -lrdmacm -libverbs -libumad -lpthread
AR := ar
ifeq ($(MODEL),pop)
CPPDEFS := $(CPPDEFS) -DPOSIX -Dimpvmix -Dcoupled -DNPROC_X=$(NX) -DNPROC_Y=$(NY)
FIXEDFLAGS := -convert big_endian
endif
ifeq ($(MODEL),csim)
CPPDEFS := $(CPPDEFS) -Dcoupled -DNPROC_X=$(NX) -DNPROC_Y=$(NY) -D_MPI
FIXEDFLAGS := -convert big_endian
endif
ifeq ($(THREAD),TRUE)
CPPDEFS := $(CPPDEFS) -D_OPENMP -DTHREADED_OMP
FREEFLAGS := $(FREEFLAGS) -mp
LDFLAGS := $(LDFLAGS) -mp
endif
ifeq ($(DEBUG),TRUE)
endif

I also compiled CCSM3 with openmpi (1.3.3) and ran it with:
/usr/local/bin/mpirun -np 4 cpl : -np 2 csim : -np 4 clm : -np 4 pop : -np 8 cam

My error message looks like this:

(main) -------------------------------------------------------------------------
(main) contract init: establish domain & router for lnd
(main) -------------------------------------------------------------------------
(cpl_contract_init) cpl-recv-lnd
[phodmod:04785] *** Process received signal ***
[phodmod:04785] Signal: Segmentation fault (11)
[phodmod:04785] Signal code: Address not mapped (1)
[phodmod:04785] Failing at address: 0x1a254eb9b0
.....
[phodmod:04783] [ 0] /lib64/libpthread.so.0 [0x3905a0e4c0]
[phodmod:04783] [ 1] clm(decompmod_mp_initdecomp_+0x2257) [0x4f50c7]
[phodmod:04783] [ 2] clm(initializemod_mp_initialize_+0x342) [0x536f82]
[phodmod:04783] [ 3] clm(MAIN__+0x8b) [0x58539b]
[phodmod:04783] [ 4] clm(main+0x3c) [0x423ddc]
[phodmod:04783] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3904e1d974]
[phodmod:04783] [ 6] clm [0x423ce9]
[phodmod:04783] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 9 with PID 4785 on node phodmod.aoml.noaa.gov exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
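
The backtrace above can be reduced to a list of (frame, executable, module, procedure) entries to make the failing call chain easier to read. This is a rough sketch only: the helpers `split_ifort_symbol` and `parse_frames` are hypothetical, and the symbol splitting assumes the `module_mp_procedure_` naming that ifort uses for module procedures on Linux.

```python
import re

# Matches frames that carry a symbol, such as:
# [phodmod:04783] [ 1] clm(decompmod_mp_initdecomp_+0x2257) [0x4f50c7]
FRAME = re.compile(
    r"\[\s*(?P<depth>\d+)\]\s+(?P<image>\S+?)"
    r"\((?P<symbol>[^+()]+)\+0x(?P<offset>[0-9a-f]+)\)"
)


def split_ifort_symbol(symbol):
    """Split an assumed 'module_mp_procedure_' name into (module, procedure)."""
    if "_mp_" in symbol and symbol.endswith("_"):
        module, procedure = symbol[:-1].split("_mp_", 1)
        return module, procedure
    return None, symbol  # not a module procedure; keep the raw symbol


def parse_frames(text):
    """Return (depth, image, module, procedure) for each frame with a symbol."""
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            module, procedure = split_ifort_symbol(match.group("symbol"))
            frames.append(
                (int(match.group("depth")), match.group("image"), module, procedure)
            )
    return frames


TRACE = """\
[phodmod:04783] [ 0] /lib64/libpthread.so.0 [0x3905a0e4c0]
[phodmod:04783] [ 1] clm(decompmod_mp_initdecomp_+0x2257) [0x4f50c7]
[phodmod:04783] [ 2] clm(initializemod_mp_initialize_+0x342) [0x536f82]
[phodmod:04783] [ 3] clm(MAIN__+0x8b) [0x58539b]
[phodmod:04783] [ 4] clm(main+0x3c) [0x423ddc]
[phodmod:04783] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3904e1d974]
[phodmod:04783] [ 6] clm [0x423ce9]
"""

FRAMES = parse_frames(TRACE)

if __name__ == "__main__":
    for depth, image, module, procedure in FRAMES:
        print(depth, image, module or "-", procedure)
```

Read this way, the innermost application frame is the procedure `initdecomp` in module `decompmod` of the clm executable, called from `initialize` in `initializemod`. Frames 0 and 6 carry no symbol name and are skipped by the pattern.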

Any help would be appreciated.

Thank you.

Sang-Ki

Dr. Sang-Ki Lee
NOAA/AOML
4301 Rickenbacker Causeway Miami,
FL 33149 USA
Tel) 305-361-4521
Fax) 305-361-4412
E-mail) sang-ki.lee@noaa.gov

There are a total of 8 sockets (not blades), each containing
a quad-core chip, for a total of 32 cores. Blades imply a cluster,
and that is not the case here.

Dr. Sang-Ki Lee
NOAA/AOML
4301 Rickenbacker Causeway Miami,
FL 33149 USA
Tel) 305-361-4521
Fax) 305-361-4412
E-mail) sang-ki.lee@noaa.gov
