
Error when running CESM2.2

Dear All,

I'm trying to run CESM2.2 on my own cluster, which has no batch system. The case was set up and built without error. I use the I2000Clm50Vic compset with f19_g17 resolution for a test run. When I submit the case, I get the following errors:

-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffe318e4460, new_group=0x7ffe318e4004) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=201953292:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7fff9764eb70, new_group=0x7fff9764e714) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=626700:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffd6b803ac0, new_group=0x7ffd6b803664) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=201953292:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffe83c4e920, new_group=0x7ffe83c4e4c4) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1007259660
.................
Invalid PIO rearranger comm max pend req (comp2io),            0
Resetting PIO rearranger comm max pend req (comp2io) to           64
PIO rearranger options:
  comm type     = p2p
  comm fcd      = 2denable
  max pend req (comp2io)  =            0
  enable_hs (comp2io)     = T
  enable_isend (comp2io)  = F
  max pend req (io2comp)  =           64
  enable_hs (io2comp)     = F
  enable_isend (io2comp)  = T
(seq_comm_setcomm)  init ID (  1 GLOBAL ) pelist =     0     0     1 ( npes =     1) ( nthreads =  1)( suffix =)
Invalid PIO rearranger comm max pend req (comp2io),            0
Resetting PIO rearranger comm max pend req (comp2io) to           64
PIO rearranger options:
  comm type     = p2p
  comm fcd      = 2denable
  max pend req (comp2io)  =            0
  enable_hs (comp2io)     = T
  enable_isend (comp2io)  = F
  max pend req (io2comp)  =           64
  enable_hs (io2comp)     = F
  enable_isend (io2comp)  = T
(seq_comm_setcomm)  init ID (  1 GLOBAL ) pelist =     0     0     1 ( npes =     1) ( nthreads =  1)( suffix =)
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffddc3b5600, new_group=0x7ffddc3b51a4) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=470388748:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffeddab1ac0, new_group=0x7ffeddab1664) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=470388748:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[64377,1],0]
  Exit code:    12
--------------------------------------------------------------------------
42 total processes failed to start

Unfortunately, I could not find the root of the problem. Any help will be really appreciated.
Kind regards

My xml codes for my machine are:

Linux 64bit
none
LINUX
gnu
mpich
/home/as2/CESM/projects/scratch
/mnt/FNas/CESM/projects/cesm-inputdata
/mnt/FNas/CESM/projects/cesm-inputdata/atm/datm7
/mnt/FNas/CESM/projects/scratch/archive/$CASE
/mnt/FNas/CESM/projects/baselines
$CIMEROOT/tools/cprnc/build/cprnc
8
none
asakalli
32
32
/usr/bin/mpirun
-np 64
--hostfile $ENV{HOME}/my_hosts_ip
/home/as2/local/netcdf461
/home/as2/local/netcdf461

My xml codes for my compiler are:

-std=gnu99
-fopenmp
-g -Wall -Og -fbacktrace -ffpe-trap=invalid,zero,overflow -fcheck=bounds
-O3
-DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU
FORTRAN
-fdefault-real-8
-fconvert=big-endian -ffree-line-length-none -ffixed-line-length-none
-fopenmp
-g -Wall -Og -fbacktrace -ffpe-trap=zero,overflow -fcheck=bounds
-O3
-ffixed-form
-ffree-form
FALSE
/usr/bin/mpicc
/usr/bin/mpicxx
/usr/bin/mpif90
/usr/bin/gcc
/usr/bin/g++
/usr/bin/gfortran
TRUE
-L/usr/lib -llapack -lblas -L/home/as2/local/netcdf461/lib/ -Wl,-Bsymbolic-functions -Wl,-z,relro -lnetcdf -lnetcdff

The output from pelayout is:

Comp  NTASKS  NTHRDS  ROOTPE
CPL :     64/     1;      0
ATM :     64/     1;      0
LND :     64/     1;      0
ICE :     64/     1;      0
OCN :     64/     1;      0
ROF :     64/     1;      0
GLC :     64/     1;      0
WAV :     64/     1;      0
ESP :      1/     1;      0

And the output from preview_run is:

CASE INFO:
  nodes: 2
  total tasks: 64
  tasks per node: 32
  thread count: 1

BATCH INFO:
  FOR JOB: case.run
    ENV:
      Setting Environment NETCDF_DIR=/home/as2/local/netcdf461
      Setting Environment NETCDF_PATH=/home/as2/local/netcdf461
      Setting Environment OMP_NUM_THREADS=1
    SUBMIT CMD:
      None

  FOR JOB: case.st_archive
    ENV:
      Setting Environment NETCDF_DIR=/home/as2/local/netcdf461
      Setting Environment NETCDF_PATH=/home/as2/local/netcdf461
      Setting Environment OMP_NUM_THREADS=1
    SUBMIT CMD:
      None

MPIRUN:
  /usr/bin/mpirun -np 64 --hostfile /home/as2/my_hosts_ip /home/as2/CESM/projects/scratch/denemeI2000Clm50VicSecond/bld/cesm.exe  >> cesm.log.$LID 2>&1
 

jedwards

CSEG and Liaisons
Staff member
This looks like an MPI configuration problem. Have you tried running something like hello-world? Often MPI will work on one node but fail when you try to use more than one, so make sure to try your hello-world on 64 tasks.
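A minimal sketch of such a hello-world test, assuming the gnu/mpich toolchain and the hostfile from the machine entry above (the file name hello_mpi.c is just a placeholder):

/* hello_mpi.c - minimal MPI launch check */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this task's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* should report 64 when launched with -np 64 */
    MPI_Get_processor_name(host, &len);    /* which node the task landed on */

    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

Built and launched the same way the case is, e.g. /usr/bin/mpicc hello_mpi.c -o hello_mpi followed by /usr/bin/mpirun -np 64 --hostfile /home/as2/my_hosts_ip ./hello_mpi. If every rank reports "of 1" rather than "of 64", the mpirun launcher and the MPI library the executable was built against are likely not the same installation, which would also explain the range-ends-at-63-but-must-be-less-than-1 errors above.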
 