Ubuntu porting issue: cannot run with MAX_MPITASKS_PER_NODE > 1

Johnny Guo
New Member
Hello,
I have searched extensively and could not find an answer, so I am posting the issue I am facing here.
I ported CESM 2.1.4 to Ubuntu 22.04.3 LTS.
When I set the following in config_machines.xml:
<MAX_TASKS_PER_NODE>1</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>1</MAX_MPITASKS_PER_NODE>
<PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
<mpirun mpilib="default">
  <executable>mpiexec</executable>
  <arguments>
    <arg name="ntasks"> -np 1 </arg>
  </arguments>
</mpirun>

and then run

create_newcase --case /home/jguo/projects/cesm/scratch/testrun --compset QPC4 --res f45_f45_mg37 --run-unsupported
cd testrun
./xmlchange STOP_OPTION=ndays,STOP_N=3
./case.setup
./case.build --clean-all
./case.build
./case.submit

everything works fine.

However, once I set MAX_TASKS_PER_NODE and MAX_MPITASKS_PER_NODE above to 2, I get the error below (the full log is attached as QPC4.txt, and the CESM log as cesm.log.230917-110446.txt):

Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type =p2p
comm fcd =2denable
max pend req (comp2io) = 0
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
[Johnny:818539] *** An error occurred in MPI_Group_range_incl
[Johnny:818539] *** reported by process [4086169601,0]
[Johnny:818539] *** on communicator MPI_COMM_WORLD
[Johnny:818539] *** MPI_ERR_RANK: invalid rank
[Johnny:818539] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Johnny:818539] *** and potentially your MPI job)


Also, when I run with --compset I2000Clm50SpGs --res f09_g17, I get a similar error even with -np 1. The logs are in I2000Clm50SpGs.txt and cesm.log.230917-111344.txt.
I have also attached the output of pnetcdf-config and nc-config --all, along with my config_machines.xml and config_compilers.xml.
When I compiled HDF5, I did enable parallel support:
CC=mpicc CFLAGS=-w ./configure --prefix=/home/jguo/CESM/Library --with-zlib --enable-hl --enable-fortran --enable-parallel
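
One way to double-check that the installed libraries really were built with parallel support is sketched below. It assumes the HDF5 compiler wrapper (h5pcc or h5cc) and nc-config are on PATH, and that the HDF5 configuration summary contains a line of the form "Parallel HDF5: yes"; the helper name hdf5_summary_is_parallel is made up for illustration.

```shell
# Reads an HDF5 configuration summary on stdin and succeeds only if it
# reports a parallel build (a line such as "Parallel HDF5: yes").
hdf5_summary_is_parallel() {
    grep -i 'Parallel HDF5' | grep -Eqi ':[[:space:]]*(yes|on)'
}

# h5pcc is the usual wrapper name for a parallel build, h5cc for a serial one.
for wrapper in h5pcc h5cc; do
    if command -v "$wrapper" >/dev/null 2>&1; then
        if "$wrapper" -showconfig | hdf5_summary_is_parallel; then
            echo "$wrapper: parallel HDF5"
        else
            echo "$wrapper: serial HDF5 (or summary not recognized)"
        fi
    fi
done

# nc-config reports whether netCDF itself was built with parallel I/O.
if command -v nc-config >/dev/null 2>&1; then
    echo "netCDF parallel I/O: $(nc-config --has-parallel)"
fi
```

A "yes" from both checks only says the libraries are parallel-capable; it does not show which MPI implementation they were linked against.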

I have been stuck for a few days. Any help is greatly appreciated!
 


Johnny Guo
New Member
I finally figured it out! I had an old MPICH directory in my PATH.
Once I removed it and added the following environment variables, everything worked:


export MPI_HOME=/home/jguo/CESM/Library/.local/openmpi
export PATH=${MPI_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${MPI_HOME}/lib:$LD_LIBRARY_PATH
export MANPATH=${MPI_HOME}/share/man:$MANPATH
export MPICC=mpicc
export MPICXX=mpicxx
export MPIFC=mpif90

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

export HDF5=/home/jguo/CESM/Library
export LD_LIBRARY_PATH=/home/jguo/CESM/Library/lib:$LD_LIBRARY_PATH
export CPPFLAGS="-I/home/jguo/CESM/Library/include -I/home/jguo/CESM/Library/.local/openmpi/include"
export LDFLAGS="-L/home/jguo/CESM/Library/lib -llapack -lblas"
export INCLUDE=/home/jguo/CESM/Library/include:$INCLUDE
export HDF5_LIB_DIR=/home/jguo/CESM/Library/lib
export HDF5DIR=/home/jguo/CESM/Library/
export HDF5_DIR=/home/jguo/CESM/Library/
export NETCDF=/home/jguo/CESM/Library

export PATH=/home/jguo/CESM/Library/bin:$PATH
export NETCDF_PATH=/home/jguo/CESM/Library
export NETCDF_C_PATH=/home/jguo/CESM/Library
export NETCDF_FORTRAN_PATH=/home/jguo/CESM/Library
export NETCDF_FORTRAN=/home/jguo/CESM/Library
export NETCDF_FORTRAN_LIBRARY=/home/jguo/CESM/Library/lib
export NETCDF_FORTRAN_INCLUDE_DIR=/home/jguo/CESM/Library/include
export LIB_NETCDF=/home/jguo/CESM/Library/lib
export LIB_NETCDF_C=/home/jguo/CESM/Library/lib
export LIB_NETCDF_FORTRAN=/home/jguo/CESM/Library/lib
export NetCDF_C_PATH=/home/jguo/CESM/Library/
export NetCDF_Fortran_PATH=/home/jguo/CESM/Library
export NetCDF=/home/jguo/CESM/Library
export PnetCDF=/home/jguo/CESM/Library
export PNETCDF_PATH=/home/jguo/CESM/Library
export PNetCDF_PATH=/home/jguo/CESM/Library
export NetCDF_Fortran_LIBRARY=/home/jguo/CESM/Library/lib
export NetCDF_Fortran_INCLUDE_DIR=/home/jguo/CESM/Library/include
export LDFLAGS="-L/home/jguo/CESM/Library/lib"
export LIBS="-lnetcdf -lhdf5_hl -lhdf5 -lz"

export LD_LIBRARY_PATH=/home/jguo/CESM/Library/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/
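
For anyone hitting the same MPI_ERR_RANK failure: a stale MPI install earlier on PATH is easy to miss, because a plain lookup of mpiexec reports only the first match. The sketch below lists every match in lookup order. The helper name find_all_on_path is made up for illustration, and the code assumes a POSIX-style shell such as sh or bash.

```shell
# Print every executable called NAME found on PATH, in lookup order.
# More than one hit for mpiexec usually means two MPI installs are competing.
find_all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Check the launcher and the compiler wrappers used for the build.
for tool in mpiexec mpirun mpicc mpif90; do
    echo "== $tool"
    find_all_on_path "$tool"
done
```

If mpiexec shows up under more than one prefix, or mpiexec and mpif90 resolve to different prefixes, the model may be launched by a different MPI than the one it was linked against. Running mpiexec --version, and mpicc --showme (Open MPI) or mpicc -show (MPICH), can confirm which implementation each tool belongs to.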
 