We have successfully built CESM1.2.2 on a local machine using a test case, but we cannot get this case to run.
The test case was created with the following configuration:
./create_newcase -case test1_elvar -res f45_g37 -compset X -mach elja
At the end of the .out file (produced after submitting the .run file via sbatch), the following message appears:
Model did not complete - see /users/home/hera/cesm/test1_elvar/run/cesm.log.220711-172407
The content of the log file is:
[hera@elja-irhpc run]$ cat cesm.log.220711-172407
[compute-1:2804911] *** An error occurred in MPI_Group_range_incl
[compute-1:2804911] *** reported by process [299565057,0]
[compute-1:2804911] *** on communicator MPI_COMM_WORLD
[compute-1:2804911] *** MPI_ERR_RANK: invalid rank
[compute-1:2804911] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-1:2804911] *** and potentially your MPI job)
This appears to be a node problem, or at least something about the local machine that is not specified correctly in the .run file.
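In case it is relevant, my understanding is that MPI_ERR_RANK in MPI_Group_range_incl usually means the model is referring to ranks that were never launched, so below is how I compare the PE layout the case expects with the task count Slurm actually provides (a minimal sketch, assuming the standard CESM1.2 case directory layout and Slurm's usual job environment variables):

# In the case directory: the MPI task counts and root PEs the case was
# configured with (set in env_mach_pes.xml at setup/build time).
grep -E 'NTASKS_|ROOTPE_' env_mach_pes.xml

# Inside the Slurm job: the number of tasks the scheduler actually granted.
echo $SLURM_NTASKS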
The first lines of the .run script are as follows:
#! /bin/csh -f
#SBATCH --nodes=1 # one node
#SBATCH --job-name=CESM
#SBATCH --partition=24cpu_192mem # Each node has 2x24 processors and 192 GB RAM
#SBATCH --ntasks-per-node=48 # Use 48 processors
#SBATCH --time=1-00:00:00
#SBATCH --get-user-env
#SBATCH --mail-type=ALL
#SBATCH --mail-user=hera@hi.is
#SBATCH -e test1_elvar.err ## File to use for standard error
#SBATCH -o test1_elvar.out ## File to use for standard output
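To check whether Slurm can even launch 48 tasks on a single node of this partition (independently of CESM), I can run a trivial test like the one below; this is only a generic sanity check, not anything CESM-specific, using the partition name from the header above:

# Ask Slurm for one node of the partition and launch 48 copies of a
# trivial command; if this fails, the problem is not in CESM itself.
srun --partition=24cpu_192mem --nodes=1 --ntasks=48 hostname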
Your assistance is greatly appreciated.
Hera