
Ubuntu porting issue: cannot run with MAX_MPITASKS_PER_NODE > 1

Johnny Guo
New Member
Hello,
I have searched extensively and could not find an answer, so I am posting the issue I am facing here.
I ported CESM 2.1.4 to Ubuntu 22.04.3 LTS.
When I set the following in config_machines.xml:
<MAX_TASKS_PER_NODE>1</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>1</MAX_MPITASKS_PER_NODE>
<PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
<mpirun mpilib="default">
  <executable>mpiexec</executable>
  <arguments>
    <arg name="ntasks"> -np 1 </arg>
  </arguments>
</mpirun>

and then run

create_newcase --case /home/jguo/projects/cesm/scratch/testrun --compset QPC4 --res f45_f45_mg37 --run-unsupported
cd testrun
./xmlchange STOP_OPTION=ndays,STOP_N=3
./case.setup
./case.build --clean-all
./case.build
./case.submit

everything works fine.

However, once I set both MAX_TASKS_PER_NODE and MAX_MPITASKS_PER_NODE above to 2, I get the error below. (The full log is attached as QPC4.txt, and the CESM log as cesm.log.230917-110446.txt.)

Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type =p2p
comm fcd =2denable
max pend req (comp2io) = 0
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
[Johnny:818539] *** An error occurred in MPI_Group_range_incl
[Johnny:818539] *** reported by process [4086169601,0]
[Johnny:818539] *** on communicator MPI_COMM_WORLD
[Johnny:818539] *** MPI_ERR_RANK: invalid rank
[Johnny:818539] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Johnny:818539] *** and potentially your MPI job)


Also, when I run with --compset I2000Clm50SpGs --res f09_g17, I get a similar error even with -np 1. The logs are in I2000Clm50SpGs.txt and cesm.log.230917-111344.txt.
I have also attached my pnetcdf-config and nc-config --all output, config_machines.xml, and config_compilers.xml.
When I compiled HDF5, I enabled parallel support:
CC=mpicc CFLAGS=-w ./configure --prefix=/home/jguo/CESM/Library --with-zlib --enable-hl --enable-fortran --enable-parallel
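
One way to double-check that the installed HDF5 really picked up --enable-parallel is to look at the libhdf5.settings summary that an HDF5 install normally places under lib/. The following is only a sketch: the function name hdf5_is_parallel and the HDF5_PREFIX variable are made up for illustration, and the default prefix is simply the one used in the configure line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given libhdf5.settings file reports
# a parallel build; return 2 if the file cannot be read.
hdf5_is_parallel() {
    local settings="$1"
    [ -r "$settings" ] || return 2
    grep -Eiq 'Parallel HDF5:[[:space:]]*yes' "$settings"
}

# Assumption: the settings file lives at <prefix>/lib/libhdf5.settings.
# HDF5_PREFIX is an illustrative variable, not something CESM or HDF5 reads.
settings_file="${HDF5_PREFIX:-/home/jguo/CESM/Library}/lib/libhdf5.settings"

if hdf5_is_parallel "$settings_file"; then
    echo "parallel HDF5: yes"
else
    echo "parallel HDF5: not confirmed ($settings_file)"
fi
```

If the file is missing or reports "no", the library was probably configured without MPI support or installed to a different prefix than expected.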

I have been stuck for a few days. Any help is greatly appreciated!
 

Attachments

  • cesm.log.230917-110446.txt (1 KB)
  • QPC4.txt (9.8 KB)
  • I2000Clm50SpGs.txt (9.4 KB)
  • cesm.log.230917-111344.txt (1 KB)
  • config_compilers.xml.txt (3.5 KB)
  • config_machines.xml.txt (5.3 KB)
  • nc-config--all.txt (1.2 KB)
  • pnetcdf-config-dump.txt (1.6 KB)

Johnny Guo
New Member
I finally figured it out! I had an old MPICH directory in my PATH.
Once I removed it and added the following environment variables to my environment, everything worked!


export MPI_HOME=/home/jguo/CESM/Library/.local/openmpi
export PATH=${MPI_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${MPI_HOME}/lib:$LD_LIBRARY_PATH
export MANPATH=${MPI_HOME}/share/man:$MANPATH
export MPICC=mpicc
export MPICXX=mpicxx
export MPIFC=mpif90

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

export HDF5=/home/jguo/CESM/Library
export LD_LIBRARY_PATH=/home/jguo/CESM/Library/lib:$LD_LIBRARY_PATH
export CPPFLAGS="-I/home/jguo/CESM/Library/include -I/home/jguo/CESM/Library/.local/openmpi/include"
export LDFLAGS="-L/home/jguo/CESM/Library/lib -llapack -lblas"
export INCLUDE=/home/jguo/CESM/Library/include:$INCLUDE
export HDF5_LIB_DIR=/home/jguo/CESM/Library/lib
export HDF5DIR=/home/jguo/CESM/Library/
export HDF5_DIR=/home/jguo/CESM/Library/
export NETCDF=/home/jguo/CESM/Library

export PATH=/home/jguo/CESM/Library/bin:$PATH
export NETCDF_PATH=/home/jguo/CESM/Library
export NETCDF_C_PATH=/home/jguo/CESM/Library
export NETCDF_FORTRAN_PATH=/home/jguo/CESM/Library
export NETCDF_FORTRAN=/home/jguo/CESM/Library
export NETCDF_FORTRAN_LIBRARY=/home/jguo/CESM/Library/lib
export NETCDF_FORTRAN_INCLUDE_DIR=/home/jguo/CESM/Library/include
export LIB_NETCDF=/home/jguo/CESM/Library/lib
export LIB_NETCDF_C=/home/jguo/CESM/Library/lib
export LIB_NETCDF_FORTRAN=/home/jguo/CESM/Library/lib
export NetCDF_C_PATH=/home/jguo/CESM/Library/
export NetCDF_Fortran_PATH=/home/jguo/CESM/Library
export NetCDF=/home/jguo/CESM/Library
export PnetCDF=/home/jguo/CESM/Library
export PNETCDF_PATH=/home/jguo/CESM/Library
export PNetCDF_PATH=/home/jguo/CESM/Library
export NetCDF_Fortran_LIBRARY=/home/jguo/CESM/Library/lib
export NetCDF_Fortran_INCLUDE_DIR=/home/jguo/CESM/Library/include
export LDFLAGS="-L/home/jguo/CESM/Library/lib"
export LIBS="-lnetcdf -lhdf5_hl -lhdf5 -lz"

export LD_LIBRARY_PATH=/home/jguo/CESM/Library/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/
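
For anyone hitting the same MPI_ERR_RANK failure, a quick way to spot a stale MPI install like the old MPICH directory mentioned above is to list every copy of the MPI tools that PATH can see, in search order. This is a minimal sketch; list_on_path is a made-up helper name, and the tool names checked are just the ones used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable named "$1" found on PATH,
# in the order the shell would search. More than one hit for mpiexec or
# mpicc suggests that two MPI installations are being mixed.
list_on_path() {
    local cmd="$1" dir candidate
    local IFS=':'
    for dir in $PATH; do
        candidate="${dir:-.}/$cmd"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    return 0
}

for tool in mpiexec mpicc mpif90; do
    echo "== $tool =="
    list_on_path "$tool"
done
```

The first path printed for each tool is the one that actually gets used. If the launcher and the compiler wrappers resolve to different directories, the model may be built against one MPI library and launched with another. Running mpiexec --version on the first hit should also show which implementation it belongs to.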
 