
How to use mpirun to run CLM in parallel?

jack
Member
Hi,
I'm using a supercomputer to run CLM5.0. I have set ./xmlchange NTASKS=64 and
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16 (the maximum per node)

in a Slurm sbatch file.
However, I found that running with 16 tasks (i.e., nodes=1) is as fast as, or even faster than, 64 tasks (i.e., nodes=4). The MPI I'm using is MPICH 3.3.1; it was installed locally rather than loaded through the supercomputer's module system.
Does anybody know how to use more cores to run CLM5.0? Thanks.
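Before changing anything, it can help to confirm how many tasks the case will actually request. A minimal sketch using standard CIME case tools (the case directory path below is illustrative, not from the thread):

```shell
# Sketch: verify the processor layout CIME will use
# (run from the case directory; the path is illustrative).
cd ~/clm5.0/cime/scripts/my_case

./xmlquery NTASKS     # MPI tasks requested per component
./pelayout            # task/thread layout across all components
./preview_run         # the mpirun command that case.submit will issue
```

If ./preview_run reports only 16 tasks, the case's PE layout, rather than the sbatch header, is what needs adjusting.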
 

Erik Kluzek
CSEG and Liaisons
Staff member
Hmmm. That is how you set the number of tasks in the system. It doesn't make sense to me that a lower number of tasks would run faster at such a low processor count. Because of communication costs we do expect that eventually, but I'd only expect it at many thousands of processors.

There is always variability in systems when you run, though, so you might do several simulations at the same number of tasks to make sure you have a representative test.

Since this is a question about the overall system of running CESM (for the specific case of I compsets), I'm moving it to the general forum.
 

jack
Member
Thanks, Erik. The maximum number of processors I can use is 128 (I paid for it), but it seems like only 16 processors play a role. I suspect that my CESM (or mpirun) setup is wrong; do you have any suggestions?
 

jack
Member
I have set
<MAX_TASKS_PER_NODE>64</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>64</MAX_MPITASKS_PER_NODE>
in config_machines.xml
and

<batch_system type="slurm" MACH="jack">
  <batch_submit>sbatch</batch_submit>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
    <arg flag="-p" name="$JOB_QUEUE"/>
    <arg flag="--account" name="$PROJECT"/>
  </submit_args>
</batch_system>
in config_batch.xml

The final sbatch file is as follows:
#!/bin/bash

#SBATCH --partition=hpib
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --account=jack

module load scl/gcc8.3
module load mpich/3.4.1
echo myjob.sbatch start on $(date)
cd /home/jack/dat01/clm5.0/cime/scripts/mpi_test
./case.submit
echo myjob.sbatch end on $(date)
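One observation about the Slurm semantics here (this is an editorial note, not from the thread): with --ntasks-per-node=1 and --cpus-per-task=16, Slurm allocates 4 nodes but only 4 MPI ranks in total, one per node; --cpus-per-task reserves cores for threading within a rank, not additional MPI ranks. A sketch of a header that would match NTASKS=64 for a pure-MPI run, keeping the partition, account, paths, and modules from the original script:

```shell
#!/bin/bash
# Sketch: request 64 MPI ranks (4 nodes x 16 tasks per node) so the
# allocation matches NTASKS=64 in the case. Partition, account, and
# module names are taken from the original script above.
#SBATCH --partition=hpib
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16   # 16 MPI ranks per node, not 1
#SBATCH --cpus-per-task=1      # one core per rank for a pure-MPI run
#SBATCH --account=jack

module load scl/gcc8.3
module load mpich/3.4.1
cd /home/jack/dat01/clm5.0/cime/scripts/mpi_test
./case.submit
```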
 