Dear all,
We recently managed to compile CESM on a generic_IBM cluster called MareNostrum (Barcelona Supercomputing Center). To build the model successfully, I followed your advice to use MPI-2 libraries. Unfortunately, we are still having problems getting cesm-1.0.3 to run; this time we get a run-time error.
We use the XLF compiler and the 64-bit version of the 1.3.1..9 MPICH2 module. To get the code to compile, we removed the "disable-mpi2" flag from PIO_CONFIG_OPTS and commented out lines 112-114 of the Makefile in the $CASEROOT/Tools folder (the option SLIBS += -L$(LIB_MPI) -lmpi; that library is already added by the mpif90 wrapper).
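For reference, the edited block of the Tools Makefile looks roughly like this (a sketch from memory of our change, not a verbatim copy of the file):

# $CASEROOT/Tools/Makefile, around lines 112-114:
# the MPI library is already supplied by the mpif90 wrapper,
# so we commented this option out:
#   SLIBS += -L$(LIB_MPI) -lmpi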
Unfortunately, the resulting binary cannot be executed on the MareNostrum machine due to MPI-related errors. More specifically, the run script fails at the "mpiexec" command: after a few lines in the ccsm.log file, execution aborts with the following error:
(seq_comm_setcomm) initialize ID ( 7 GLOBAL ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 ATM ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 1 LND ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 4 ICE ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 5 GLC ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 3 OCN ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 6 CPL ) pelist = 0 63 1 ( npes = 64) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 8 CPLATM ) join IDs = 6 2 ( npes = 64) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 9 CPLLND ) join IDs = 6 1 ( npes = 64) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 10 CPLICE ) join IDs = 6 4 ( npes = 64) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 11 CPLOCN ) join IDs = 6 3 ( npes = 64) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 12 CPLGLC ) join IDs = 6 5 ( npes = 64) ( nthreads = 1)
[s16c5b01:5068] *** An error occurred in MPI_Gather
[s16c5b01:5068] *** on communicator MPI COMMUNICATOR 5 CREATE FROM 0
[s16c5b01:5068] *** MPI_ERR_TYPE: invalid datatype
[s16c5b01:5068] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
We also tried compiling the code with the newer OpenMPI libraries, but got a similar error at runtime.
At first sight, one could interpret the error as an inconsistency between MPI libraries, i.e. some routines compiled against one MPI library and others against MPI-2. However, a look at the respective logs shows that all modules are compiled consistently with the MPICH-2 libraries and with the same mpif90 wrapper (i.e. the full path is correct).
After an extensive review of the code, the BSC staff told me that there are a few conflicting directives in the mpeu source code. If we understand it correctly, in B1850WCN/mct/mpeu there is some sort of interface that should translate the MP_INTEGER, MP_REAL, ... variables into MPI_INTEGER, MPI_REAL, ...
It seems plausible that the error on this particular machine is due to a wrong translation of the above-mentioned variables into MPI datatypes, which could cause the invalid-datatype runtime error. I am currently running this model on Finisterrae, but there the code was compiled with ifort (not with xlf as on MareNostrum), and that machine is very different.
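To illustrate what we suspect (this is a standalone sketch written for this post, not code taken from mpeu or CESM): a Fortran program that passes an untranslated datatype handle to MPI_Gather aborts with exactly the message we see in ccsm.log, whereas the properly translated handle works fine.

! check_gather_type.F90 -- minimal sketch of the suspected failure mode.
! "bad_type" stands in for an MP_* parameter that was never translated
! into a valid MPI datatype handle; -1 is assumed to be invalid here.
program check_gather_type
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs
  integer :: sendbuf(1), recvbuf(64)   ! recvbuf sized for up to 64 ranks, as in our npes=64 runs
  integer :: good_type, bad_type

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  sendbuf(1) = rank
  good_type  = MPI_INTEGER   ! the handle a correct translation layer would produce
  bad_type   = -1            ! a garbage / untranslated handle

  ! This call works on any MPI implementation:
  call MPI_Gather(sendbuf, 1, good_type, recvbuf, 1, good_type, &
                  0, MPI_COMM_WORLD, ierr)

  ! This call reproduces "MPI_ERR_TYPE: invalid datatype"; with the default
  ! MPI_ERRORS_ARE_FATAL handler the job aborts, as in our ccsm.log:
  call MPI_Gather(sendbuf, 1, bad_type, recvbuf, 1, bad_type, &
                  0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program check_gather_type

Built and run with something like "mpif90 check_gather_type.F90 -o check_gather_type; mpiexec -n 4 ./check_gather_type", the second gather aborts while the first one completes.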
Did you encounter a similar runtime issue on your machine?
Is there perhaps an additional flag we could add to prevent this runtime issue from occurring?
The MPI environment variables are set as follows:
setenv MP_RC_USE_LMC yes
setenv LAPI_DEBUG_RC_WAIT_ON_QP_SETUP yes
setenv MP_INFOLEVEL 2
setenv MP_EUIDEVICE sn_all
setenv MP_SHARED_MEMORY yes
setenv LAPI_USE_SHM yes
setenv MP_EUILIB mx
#setenv MP_EAGER_LIMIT 32k
setenv MP_BULK_MIN_MSG_SIZE 64k
setenv MP_POLLING_INTERVAL 20000000
setenv MEMORY_AFFINITY MCM
setenv LAPI_DEBUG_ENABLE_AFFINITY YES
setenv LAPI_DEBUG_BINDPROC_AFFINITY YES
setenv MP_SYNC_QP YES
setenv MP_RFIFO_SIZE 16777216
setenv MP_SHM_ATTACH_THRESH 500000
setenv MP_EUIDEVELOP min
setenv MP_USE_BULK_XFER yes
setenv MP_BUFFER_MEM 64M
The compilation commands in the log files read as follows:
MCT:
mpif90 -c /gpfs/apps/OPENMPI/1.5.3/XLC/64/include/ -WF,-DSYSLINUX,-DCPRXLF -O2 -qarch=auto -qsuffix=f=F90:cpp=F90 m_stdio.F90
/gpfs/apps/OPENMPI/1.5.3/XLC/64/bin/mpif90 -q64 -c /gpfs/apps/OPENMPI/1.5.3/XLC/64/include/ -WF,-DSYSLINUX,-DCPRXLF -O2 -qarch=auto -qsuffix=f=F90:cpp=F90 m_stdio.F90
PIO:
/gpfs/apps/OPENMPI/1.5.3/XLC/64/bin/mpif90 -q64 -c -WF,-DMCT_INTERFACE -WF,-DHAVE_MPI -WF,-DCO2A -WF,-DAIX -WF,-DSEQ_ -WF,-DFORTRAN_SAME -O3 -qstrict -qarch=ppc970 -qtune=ppc970 -qcache=auto -q64 -g -O2 -qstrict -Q -qinitauto -WF,-DSYSLINUX,-DLINUX,-DCPRXLF -WF,-DSPMD,-DHAVE_MPI,-DUSEMPIIO,-D_NETCDF,-D_NOPNETCDF,-D_NOUSEMCT,-D_USEBOX,-DPIO_GPFS_HINTS -I/gpfs/apps/NETCDF/64/include pio_kinds.F90
CCSM:
/gpfs/apps/OPENMPI/1.5.3/XLC/64/bin/mpif90 -q64 -c -I. -I/usr/include -I/gpfs/apps/NETCDF/64/include -I/gpfs/apps/NETCDF/64/include -I/gpfs/apps/OPENMPI/1.5.3/XLC/64/include -I. -I/gpfs/projects/ucm18/ucm18119/B1850WCN/SourceMods/src.drv -I/home/ucm18/ucm18119/code/cesm1_0_3/models/drv/driver -I/gpfs/projects/ucm18/ucm18119/B1850WCN/lib/include -WF,-DMCT_INTERFACE -WF,-DHAVE_MPI -WF,-DCO2A -WF,-DAIX -WF,-DSEQ_ -WF,-DFORTRAN_SAME -O3 -qstrict -qarch=ppc970 -qtune=ppc970 -qcache=auto -q64 -g -O2 -qstrict -Q -qinitauto -qsuffix=f=f90:cpp=F90 -qfree=f90 /home/ucm18/ucm18119/code/cesm1_0_3/models/drv/driver/ccsm_driver.F90
/gpfs/apps/OPENMPI/1.5.3/XLC/64/bin/mpif90 -q64 -o /gpfs/projects/ucm18/ucm18119/B1850WCN/run/ccsm.exe ccsm_comp_mod.o ccsm_driver.o map_atmatm_mct.o map_atmice_mct.o map_atmlnd_mct.o map_atmocn_mct.o map_glcglc_mct.o map_iceice_mct.o map_iceocn_mct.o map_lndlnd_mct.o map_ocnocn_mct.o map_rofocn_mct.o map_rofrof_mct.o map_snoglc_mct.o map_snosno_mct.o mrg_x2a_mct.o mrg_x2g_mct.o mrg_x2i_mct.o mrg_x2l_mct.o mrg_x2o_mct.o mrg_x2s_mct.o seq_avdata_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_rearr_mod.o seq_rest_mod.o -L/gpfs/projects/ucm18/ucm18119/B1850WCN/lib -latm -llnd -lice -locn -lglc -L/gpfs/projects/ucm18/ucm18119/B1850WCN/lib -lcsm_share -lmct -lmpeu -lpio -L/opt/ibmcmp/xlmass/5.0/lib64/ -lmassvp6_64 -L/gpfs/apps/NETCDF/64/lib -lnetcdf
We run the executable with the following command:
/gpfs/apps/OPENMPI/1.5.3/bin/mpiexec ./ccsm.exe >&! ccsm.log.$LID
Should we change some flags and recompile the model? If so, which ones?
Thank you!