Dear All,

I'm trying to run CESM2.2 on my own cluster, which has no batch system. The case was set up and built without errors. I use the I2000Clm50Vic compset at f19_g17 resolution for a test run. When I submit the case, I get the following errors:

-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffe318e4460, new_group=0x7ffe318e4004) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=201953292
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7fff9764eb70, new_group=0x7fff9764e714) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=626700
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffd6b803ac0, new_group=0x7ffd6b803664) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=201953292
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffe83c4e920, new_group=0x7ffe83c4e4c4) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1007259660
.................
 Invalid PIO rearranger comm max pend req (comp2io), 0
 Resetting PIO rearranger comm max pend req (comp2io) to 64
 PIO rearranger options:
   comm type = p2p
   comm fcd = 2denable
   max pend req (comp2io) = 0
   enable_hs (comp2io) = T
   enable_isend (comp2io) = F
   max pend req (io2comp) = 64
   enable_hs (io2comp) = F
   enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
 Invalid PIO rearranger comm max pend req (comp2io), 0
 Resetting PIO rearranger comm max pend req (comp2io) to 64
 PIO rearranger options:
   comm type = p2p
   comm fcd = 2denable
   max pend req (comp2io) = 0
   enable_hs (comp2io) = T
   enable_isend (comp2io) = F
   max pend req (io2comp) = 64
   enable_hs (io2comp) = F
   enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffddc3b5600, new_group=0x7ffddc3b51a4) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=470388748
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Group_range_incl: Invalid argument, error stack:
PMPI_Group_range_incl(195)........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x7ffeddab1ac0, new_group=0x7ffeddab1664) failed
MPIR_Group_check_valid_ranges(323): The 0th element of a range array ends at 63 but must be nonnegative and less than 1
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=470388748
:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[64377,1],0]
  Exit code: 12
--------------------------------------------------------------------------
42 total processes failed to start

Unfortunately, I could not find the root of the problem. Any help would be greatly appreciated.

Kind regards

The XML settings for my machine are:

Linux 64bit
none
LINUX
gnu
mpich
/home/as2/CESM/projects/scratch
/mnt/FNas/CESM/projects/cesm-inputdata
/mnt/FNas/CESM/projects/cesm-inputdata/atm/datm7
/mnt/FNas/CESM/projects/scratch/archive/$CASE
/mnt/FNas/CESM/projects/baselines
$CIMEROOT/tools/cprnc/build/cprnc
8
none
asakalli
32
32
/usr/bin/mpirun -np 64 --hostfile $ENV{HOME}/my_hosts_ip
/home/as2/local/netcdf461
/home/as2/local/netcdf461

The XML settings for my compiler are:

-std=gnu99
-fopenmp
-g -Wall -Og -fbacktrace -ffpe-trap=invalid,zero,overflow -fcheck=bounds
-O3
-DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU
FORTRAN
-fdefault-real-8
-fconvert=big-endian -ffree-line-length-none -ffixed-line-length-none
-fopenmp
-g -Wall -Og -fbacktrace -ffpe-trap=zero,overflow -fcheck=bounds
-O3
-ffixed-form
-ffree-form
FALSE
/usr/bin/mpicc
/usr/bin/mpicxx
/usr/bin/mpif90
/usr/bin/gcc
/usr/bin/g++
/usr/bin/gfortran
TRUE
-L/usr/lib -llapack -lblas -L/home/as2/local/netcdf461/lib/ -Wl,-Bsymbolic-functions -Wl,-z,relro -lnetcdf -lnetcdff

The output from pelayout is:

Comp  NTASKS  NTHRDS  ROOTPE
CPL :     64/     1;      0
ATM :     64/     1;      0
LND :     64/     1;      0
ICE :     64/     1;      0
OCN :     64/     1;      0
ROF :     64/     1;      0
GLC :     64/     1;      0
WAV :     64/     1;      0
ESP :      1/     1;      0

And the output from preview_run is:

CASE INFO:
  nodes: 2
  total tasks: 64
  tasks per node: 32
  thread count: 1

BATCH INFO:
  FOR JOB: case.run
    ENV:
      Setting Environment NETCDF_DIR=/home/as2/local/netcdf461
      Setting Environment NETCDF_PATH=/home/as2/local/netcdf461
      Setting Environment OMP_NUM_THREADS=1
    SUBMIT CMD:
      None

  FOR JOB: case.st_archive
    ENV:
      Setting Environment NETCDF_DIR=/home/as2/local/netcdf461
      Setting Environment NETCDF_PATH=/home/as2/local/netcdf461
      Setting Environment OMP_NUM_THREADS=1
    SUBMIT CMD:
      None

MPIRUN:
  /usr/bin/mpirun -np 64 --hostfile /home/as2/my_hosts_ip /home/as2/CESM/projects/scratch/denemeI2000Clm50VicSecond/bld/cesm.exe >> cesm.log.$LID 2>&1