
How to use mpirun to run CLM in parallel?

jack
Member
Hi,
I'm using a supercomputer to run CLM5.0. I have set ./xmlchange NTASKS=64 and
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16 (the maximum per node)

in a Slurm sbatch file.
However, I found that running with 16 tasks (i.e., nodes=1) is as fast as, or even faster than, 64 tasks (i.e., nodes=4). The MPI I'm using is MPICH 3.3.1; it was installed locally rather than loaded through the supercomputer's module system.
Does anybody know how to use more cores to run CLM5.0? Thanks.
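Before changing anything, it can help to confirm how many tasks the case will actually request. A minimal sketch using standard CIME case tools (the case directory path below is illustrative, not from the thread):

```shell
# Sketch: verify the processor layout CIME will use
# (run from the case directory; the path is illustrative).
cd ~/clm5.0/cime/scripts/my_case

./xmlquery NTASKS     # MPI tasks requested per component
./pelayout            # task/thread layout across all components
./preview_run         # the mpirun command that case.submit will issue
```

If ./preview_run reports only 16 tasks, the case's PE layout, rather than the sbatch header, is what needs adjusting.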
 

Erik Kluzek
CSEG and Liaisons
Staff member
Hmmm. That is how you set the number of tasks in the system. It doesn't make sense to me that a lower number of tasks would run faster at such a low processor count. Because of communication costs we do expect that eventually, but I'd only expect it at many thousands of processors.

There is always variability in systems when you run, though, so you might do several simulations at the same number of tasks to make sure you have a representative test.

Since this is a question about the overall system of running CESM (for the specific case of I compsets), I'm moving it to the general forum.
 

jack
Member
Thanks, Erik. The maximum number of processors I can use is 128 (I paid for it), but it seems like only 16 processors play a role. I suspect that my CESM (or mpirun) setup is wrong; do you have any suggestions?
 

jack
Member
I have set
<MAX_TASKS_PER_NODE>64</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>64</MAX_MPITASKS_PER_NODE>
in config_machines.xml
and

<batch_system type="slurm" MACH="jack">
  <batch_submit>sbatch</batch_submit>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
    <arg flag="-p" name="$JOB_QUEUE"/>
    <arg flag="--account" name="$PROJECT"/>
  </submit_args>
</batch_system>
in config_batch.xml

The final sbatch file is as follows:
#!/bin/bash

#SBATCH --partition=hpib
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --account=jack

module load scl/gcc8.3
module load mpich/3.4.1
echo myjob.sbatch start on $(date)
cd /home/jack/dat01/clm5.0/cime/scripts/mpi_test
./case.submit
echo myjob.sbatch end on $(date)
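One observation about the Slurm semantics here (this is an editorial note, not from the thread): with --ntasks-per-node=1 and --cpus-per-task=16, Slurm allocates 4 nodes but only 4 MPI ranks in total, one per node; --cpus-per-task reserves cores for threading within a rank, not additional MPI ranks. A sketch of a header that would match NTASKS=64 for a pure-MPI run, keeping the partition, account, paths, and modules from the original script:

```shell
#!/bin/bash
# Sketch: request 64 MPI ranks (4 nodes x 16 tasks per node) so the
# allocation matches NTASKS=64 in the case. Partition, account, and
# module names are taken from the original script above.
#SBATCH --partition=hpib
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16   # 16 MPI ranks per node, not 1
#SBATCH --cpus-per-task=1      # one core per rank for a pure-MPI run
#SBATCH --account=jack

module load scl/gcc8.3
module load mpich/3.4.1
cd /home/jack/dat01/clm5.0/cime/scripts/mpi_test
./case.submit
```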
 