Parallel Scalability of CESM

I am testing the parallel scalability of the CESM CAM4 standalone model on the quarter-degree cubed-sphere grid, on the HPC system installed at our institution.

422 nodes are available in total, but the model does not scale beyond 20 nodes.

Could you please help me troubleshoot this problem?





I can try. What version of the model do you have? What hardware are you using, including nodes and network? What model configuration?

I expect scaling for the ne120 cubed-sphere dycore to be close to linear out to 86400 MPI tasks.
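For reference, the 86400 figure matches the number of spectral elements on an ne120 cubed-sphere grid (6 faces, each with 120 x 120 elements), i.e. one element per MPI task at the limit. A minimal sketch of that arithmetic follows; the function names and the sample task counts are illustrative only and are not part of CESM.

```python
def se_element_count(ne: int) -> int:
    """Spectral elements on a cubed sphere: 6 faces, ne x ne elements per face."""
    return 6 * ne * ne


def elements_per_task(ne: int, ntasks: int) -> float:
    """Average number of elements each MPI task owns (illustrative helper)."""
    return se_element_count(ne) / ntasks


if __name__ == "__main__":
    ne = 120  # ne120, roughly quarter-degree resolution
    print(f"ne{ne}: {se_element_count(ne)} elements")
    # Sample task counts chosen only to illustrate the trend.
    for ntasks in (120, 480, 5400, 86400):
        print(f"{ntasks:6d} tasks -> {elements_per_task(ne, ntasks):7.1f} elements per task")
```

With only a few hundred tasks, each task still owns well over a hundred elements, so the dycore itself should have plenty of parallel work left at that scale.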

CESM Software Engineer


Hi Jedwards,

The hardware specification is:

  • Basic configuration: 
    GPU: 2x NVIDIA K40 (12GB, 2880 CUDA cores)
    Xeon Phi: 2x Intel Xeon Phi 7120P (16GB, 1.238 GHz, 61 cores)
    CPU: 2x E5-2680 v3 2.5GHz/12-Core 
    RAM: 62 GB 

  • 8 CPU, 8 GPU and 4 Xeon Phi nodes have 505 GB RAM each
  • Total number of compute nodes: 422 
    CPU nodes: 238
    GPU accelerated nodes: 161 
    Xeon Phi co-processor nodes: 23 

I am using CESM1.2.0





What model configuration and which of these nodes are you using? The CPU nodes are appropriate for CESM, but there is no support for GPU nodes and CESM has not performed well on Phi. Which nodes did you use for the scaling study? You also didn't say anything about the network.

CESM Software Engineer


Model configuration:

$CAMCFG/configure -fc_type pgi -fc mpif90 -cc mpicc -dyn se -hgrid ne120np4 -spmd -ntasks 120 -phys cam4 -chem none -nosmp -test

The CPU, MIC, and GPU nodes all have the same CPU architecture here, and I am running on all three node types. I am not using any of the GPU or MIC cards. I also tried running CESM on the CPU nodes only, and the scalability results are the same.


Regarding system and network:

HP Proliant XL230a Gen9 and XL250a Gen9 based cluster (Intel Xeon E5-2680v3 @ 2.5 GHz dual twelve-core CPU and dual 2880-core NVIDIA Kepler K40 GPU nodes) w/Infiniband


Rmax = 524.40 TFlops

Rpeak = 861.74 TFlops

