
issues linking to the MPI library

I am trying to compile CAM3.0 (patched version) on a Linux cluster, but the MPI library is not linking properly. When I attempt to compile the model, I get the following error messages (there are many more than shown below; I cut most of them out to save space):

------------------------------------------------------------
: multiple definition of `MAIN_'
cam.o(.text+0x10): first defined here
/usr/bin/ld: Warning: size of symbol `MAIN_' changed from 5339 to 49 in test_mpi.o
STATICEcosysDynMod.o(.text+0x3a62): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
STATICEcosysDynMod.o(.text+0x3ac5): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
STATICEcosysDynMod.o(.text+0x3b28): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'

surfFileMod.o(.text+0x3d00): In function `surffilemod_surfrd_':
: undefined reference to `mpi_bcast_'
surfFileMod.o(.text+0x3d35): In function `surffilemod_surfrd_':
: undefined reference to `mpi_bcast_'
surfFileMod.o(.text+0x3d6a): In function `surffilemod_surfrd_':
: undefined reference to `mpi_bcast_'
surfFileMod.o(.text+0x3da4): In function `surffilemod_surfrd_':
: undefined reference to `mpi_bcast_'
surfFileMod.o(.text+0x3dde): In function `surffilemod_surfrd_':
: undefined reference to `mpi_bcast_'
surfFileMod.o(.text+0x3e13): more undefined references to `mpi_bcast_' follow
swap_comm.o(.text+0x35): In function `swap_comm_swap_comm_init_':
rrier_':

wrap_mpi.o(.text+0x1fd9): In function `mpiallgatherint_':
: undefined reference to `mpi_allgather_'
gmake: *** [/home/ccm33/cam] Error 2
---------------------------------------------------------------

Any thoughts?

Cathy
 

jmccaa

New Member
We'll need more information to help:
What version of MPI are you using?
What compilers (C and Fortran, exact versions) are you using?
What configure command did you issue to configure the model?

If you've modified the configure script or your Makefile, please describe those changes as well.

Jim
 
Here is all the cluster info for you:

pgf77 5.2-2
pgcc 5.2-2

Linux bulldoga.wss.yale.edu 2.4.20-31.9smp #1 SMP Tue Apr 13 17:40:10 EDT 2004 i686 i686 i386 GNU/Linux

Dual cpus of the following type:
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.60GHz
stepping : 9
cpu MHz : 2599.961
cache size : 512 KB
physical id : 0
siblings : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5190.45

80 IBM Blade Server HS20 nodes
2GB RAM per node
Networking: GB Ethernet

MPICH mpich-1.2.5-ch_p4
-----------------------------------------------------------------------------------------

Also, I did not change the Makefile, and the configure command I used was:

CAMROOT/models/atm/cam/bld/configure -i

This gave me the following error:

testing MPI library... **** FAILED ****
Issued the command:
gmake test_mpi 2>&1

The output was:
cat: Srcfiles: No such file or directory
Makefile:1030: /home/ccm33/Depends: No such file or directory
/home/ccm33/cam1/models/atm/cam/bld/mkSrcfiles > /home/ccm33/Srcfiles
/home/ccm33/cam1/models/atm/cam/bld/mkDepends Filepath Srcfiles > /home/ccm33/Depends
pgf90 -c -I. -I/home/ccm33/cam1/models/atm/cam/src/physics/cam1 -I/home/ccm33/cam1/models/atm/cam/src/dynamics/eul -I/home/ccm33/cam1/models/atm/cam/src/control -I/home/ccm33/cam1/models/csm_share/shr -I/home/ccm33/cam1/models/atm/cam/src/utils -I/home/ccm33/cam1/models/utils/timing -I/home/ccm33/cam1/models/atm/cam/src/advection/slt -I/home/ccm33/cam1/models/atm/cam/src/ocnsice/dom -I/home/ccm33/cam1/models/lnd/clm2/src/main -I/home/ccm33/cam1/models/lnd/clm2/src/biogeophys -I/home/ccm33/cam1/models/lnd/clm2/src/biogeochem -I/home/ccm33/cam1/models/lnd/clm2/src/mksrfdata -I/home/ccm33/cam1/models/lnd/clm2/src/riverroute -I/home/ccm33/cam1/models/ice/csim4 -I/home/ccm33/netcdf-3.5.1/include -I/usr/local/cluster/mpi/include -I/home/ccm33/esmf/mod/modO/linux_pgi -I/home/ccm33/netcdf-3.5.1/lib -r8 -i4 -DCAM -DNO_SHR_VMATH -DHIDE_SHR_MSG -DLINUX -Mdalign -Mextend -DPGF90 -byteswapio -O2 test_mpi.F
pgf90 -o test_mpi test_mpi.o -L/usr/local/cluster/mpi/lib -lmpich
test_mpi.o(.text+0x2c): In function `MAIN_':
: undefined reference to `mpi_init_'
gmake: *** [test_mpi] Error 2
--------------------------------------------------------------------------------------------------

Cathy
 
I seem to have gotten past the above error. The model now configures, but when I compile it with gmake I get the error "Can't find include file mpif.h (test_mpi.F :3)", even though mpif.h is in the directory I specified as the MPI include directory. Any thoughts as to why the compiler can't see this file?

Cathy
 

jmccaa

New Member
Cathy,

Because you used configure interactively, I really can't tell what's going on unless you send a log of your configure session. Is there something you're trying to accomplish that can't be done on the command line?

The most likely cause of the mpif.h error is that the line in your generated Makefile that starts with INC_MPI doesn't actually point to a directory that contains mpif.h. For instance, on my system I get the following line:
INC_MPI := /usr/local/mpich-1.2.5-pgi-hpf-cc-5.1-3/include
and in that directory is the mpif.h file that I can list as follows:
>> ls /usr/local/mpich-1.2.5-pgi-hpf-cc-5.1-3/include/mpif.h
-rw-r--r-- 1 root root 10009 Apr 13 2004 /usr/local/mpich-1.2.5-pgi-hpf-cc-5.1-3/include/mpif.h
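To repeat that check on your own build, a small shell function like the one below could be used. It is a hypothetical helper, not part of CAM: it assumes the generated Makefile contains a literal "INC_MPI := <path>" (or "INC_MPI = <path>") line like the one above, and it does not expand Make variables such as $(...) inside the value.

```shell
#!/bin/sh
# Hypothetical helper (not part of CAM): report where the generated Makefile's
# INC_MPI points and whether mpif.h is actually present in that directory.
check_inc_mpi() {
    makefile=${1:-Makefile}
    # Take the value after ":=" or "=" on the first line starting with INC_MPI,
    # trimming surrounding whitespace. Make variables are NOT expanded.
    inc=$(sed -n '/^INC_MPI[[:space:]]*:*=/{s/^INC_MPI[[:space:]]*:*=[[:space:]]*//;s/[[:space:]]*$//;p;}' "$makefile" | head -n 1)
    if [ -z "$inc" ]; then
        echo "no INC_MPI line in $makefile"
        return 2
    fi
    if [ -f "$inc/mpif.h" ]; then
        echo "found $inc/mpif.h"
        return 0
    fi
    echo "missing $inc/mpif.h"
    return 1
}
```

For example, after saving it as check_inc_mpi.sh (a hypothetical file name) and configuring in /home/ccm33, running ". ./check_inc_mpi.sh && check_inc_mpi /home/ccm33/Makefile" should print either "found .../mpif.h" or "missing .../mpif.h".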

Jim
 
If I issue the following command,

[ccm33@bulldoga ccm33]$ cam1/models/atm/cam/bld/configure -spmd

I get this output:

creating /home/ccm33/Filepath
creating /home/ccm33/params.h
creating /home/ccm33/misc.h
creating /home/ccm33/preproc.h
creating /home/ccm33/Makefile
creating /home/ccm33/config_cache.xml
configure done.
---------------------------------------------------------------------------------------

This seems to work fine, and the netcdf and MPI include and lib directories are specified correctly in my Makefile. However, after running configure this way, I run gmake (after a make clean) and get a huge list of errors like these:


-----------------------------------------------------------------------------------------
test_mpi.o(.text+0x10): In function `MAIN_':
: multiple definition of `MAIN_'
cam.o(.text+0x10): first defined here
/usr/bin/ld: Warning: size of symbol `MAIN_' changed from 5339 to 49 in test_mpi.o
STATICEcosysDynMod.o(.text+0x3a62): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
STATICEcosysDynMod.o(.text+0x3ac5): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
STATICEcosysDynMod.o(.text+0x3b28): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
STATICEcosysDynMod.o(.text+0x3b8b): In function `staticecosysdynmod_readmonthlyvegetation_':
: undefined reference to `mpi_bcast_'
---------------------------------------------------------------------------------

However, if I configure interactively and turn spmd on that way, the MPI library test fails with the error output shown in my earlier post. I am not sure why configure succeeds when it is not run interactively.

Either way, the MPI library is not linking correctly, even though it is specified correctly.

Cathy
 

jmccaa

New Member
Cathy,

It looks like your MPI library might have been built with a Fortran compiler that is incompatible with the compiler you are using to compile CAM. You can generally check the symbols in your MPI library by doing something like:

nm $LIB_MPI/libmpich.a | grep mpi_bcast_

This should yield:
00000000 T mpi_bcast_

If it doesn't, then:
If it can't find $LIB_MPI/libmpich.a, you haven't specified LIB_MPI correctly, or else mpi is not installed correctly.
If it has a different number of underscores (most likely 2), you'll have to reconfigure mpi and reinstall it.
If it finds the library, but not the symbol, then mpi is not built/installed correctly.

If the symbol appears as above, then your problem is in the CAM configuration.
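The checks above can be scripted. The sketch below is a hypothetical helper (not part of CAM or MPICH) that reads nm output on standard input and reports which spelling of the Fortran mpi_bcast symbol is defined. It accepts both T and W entries, since Brian's reply reports the W form on a Linux system where cam3 runs in spmd mode, and it skips the C binding MPI_Bcast. Only the single- and double-underscore cases are named, following the note above; any other spelling is simply printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of CAM or MPICH): classify how mpi_bcast is
# spelled among the defined symbols in nm output read from stdin.
classify_bcast_symbol() {
    # nm prints "address type name"; keep defined code symbols (T, or weak W)
    # whose name is some spelling of mpi_bcast, skipping the C name MPI_Bcast.
    syms=$(awk '($2 == "T" || $2 == "W") && $3 != "MPI_Bcast" &&
                tolower($3) ~ /^_?mpi_bcast_*$/ { print $3 }')
    if [ -z "$syms" ]; then
        echo "no mpi_bcast symbol: MPI not built/installed correctly"
    elif printf '%s\n' "$syms" | grep -qx 'mpi_bcast_'; then
        echo "ok: mpi_bcast_ (single trailing underscore)"
    elif printf '%s\n' "$syms" | grep -qx 'mpi_bcast__'; then
        echo "double underscore: reconfigure and reinstall MPI"
    else
        # Unquoted on purpose so several names are joined with spaces.
        echo "other naming:" $syms
    fi
}
```

Usage would be "nm $LIB_MPI/libmpich.a | classify_bcast_symbol". If nm itself reports that the file does not exist, the LIB_MPI path (or the MPI installation) is the problem, as described above, and the function's "no mpi_bcast symbol" message should be ignored.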

Jim
 

eaton

CSEG and Liaisons
Cathy,

I don't know what the difference is between the T and W symbols (I read the
nm manpage and didn't understand it), but I also get the W version on a
linux system where cam3 runs in spmd mode. So I don't think that's a
problem.

When configure is run interactively (-i option) then its default behavior
is to run a test to check that the mpi library can be linked to. Without
the -i option that test is not run unless you give configure the -test
option. So issuing the command "configure -spmd -test" will give you the
same result that you got interactively by accepting all defaults except to
turn spmd on.

When you run "configure -spmd" from the commandline and it is successful,
it has checked to make sure that mpif.h and libmpich.a exist in the
directories you've specified (or the default locations), but it has not run
a test to confirm that libmpich.a can be linked to. In that case the
failure is occurring when you type gmake and the command is issued
to build the cam executable.

It appears that there is still a problem with your mpich installation.
Here is the program that configure tries to build to test that you can link
to the mpi library:

      program test_mpi
      implicit none
#include <mpif.h>
      integer :: ierr
      call mpi_init(ierr)
      end program test_mpi

This test needs to work before it will be possible to build cam.
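To rerun this test by hand, outside both configure and the CAM build directory, something like the following shell function could be used. It is a hypothetical sketch, not part of CAM: it writes the program above as fixed-form source (pgf90 treats .F files as fixed form, so statements start in column 7 and #include sits in column 1) into a temporary directory and repeats the compile and link steps from the configure log earlier in this thread, keeping only the include and library flags. FC, INC_MPI and LIB_MPI default to the compiler and paths that appear in that log and are assumptions for any other system. Working in a separate directory also keeps this test_mpi.o away from the CAM objects; the gmake logs above show a test_mpi.o clashing with cam.o over MAIN_.

```shell
#!/bin/sh
# Hypothetical helper (not part of CAM): rebuild configure's MPI link test in a
# temporary directory. FC, INC_MPI and LIB_MPI default to the values seen in
# this thread's configure log; override them in the environment as needed.
run_mpi_link_test() {
    fc=${FC:-pgf90}
    inc=${INC_MPI:-/usr/local/cluster/mpi/include}
    lib=${LIB_MPI:-/usr/local/cluster/mpi/lib}
    dir=$(mktemp -d) || return 1
    # Fixed-form source: #include in column 1, statements from column 7.
    cat > "$dir/test_mpi.F" <<'EOF'
      program test_mpi
      implicit none
#include <mpif.h>
      integer :: ierr
      call mpi_init(ierr)
      end program test_mpi
EOF
    # Compile, then link against libmpich, in a subshell so cd is undone.
    if (cd "$dir" && "$fc" -c -I"$inc" test_mpi.F &&
        "$fc" -o test_mpi test_mpi.o -L"$lib" -lmpich); then
        echo "link OK"
        status=0
    else
        echo "link FAILED"
        status=1
    fi
    rm -rf "$dir"
    return $status
}
```

On the cluster this would be run as "run_mpi_link_test" (or with, e.g., LIB_MPI set to another MPICH install). If it prints "link FAILED" with undefined references such as mpi_init_, the MPICH installation needs attention before CAM itself will link.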

Brian
 