Parallel Scalability of CESM

veera.atmsc@...
Hello,

I am testing the parallel scalability of the standalone CAM4 model in CESM, using the cubed-sphere grid at quarter-degree resolution, on the HPC system installed at our institution.

The system has 422 nodes in total, but the model does not scale beyond 20 nodes.

Could you please help me troubleshoot this problem?

Thanks,

Veeramanikandan

 

jedwards

I can try. What version of the model do you have? What hardware are you running on, including node type and network? What model configuration?

I expect scaling for the ne120 cubed-sphere dycore to be close to linear out to 86400 MPI tasks.
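For context, the 86400 figure equals the number of spectral elements in an ne120 cubed-sphere grid: six cube faces, each with 120 x 120 elements, which leaves one element per MPI task at that count. A minimal sketch of the arithmetic follows; the function names are illustrative and not part of CESM.

```python
def se_element_count(ne: int) -> int:
    """Spectral elements on a cubed sphere with `ne` elements per cube-face edge."""
    # Six faces, each tiled with ne x ne elements.
    return 6 * ne * ne


def elements_per_task(ne: int, ntasks: int) -> float:
    """Average number of elements each MPI task owns (illustrative helper)."""
    return se_element_count(ne) / ntasks


if __name__ == "__main__":
    print(se_element_count(120))          # total elements for ne120
    print(elements_per_task(120, 86400))  # elements per task at 86400 tasks
```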

CESM Software Engineer

veera.atmsc@...

Hi Jedwards,

The hardware specification is:

  • Basic configuration: 
    GPU: 2x NVIDIA K40 (12GB, 2880 CUDA cores)
    Xeon Phi: 2x Intel Xeon Phi 7120P (16GB, 1.238 GHz, 61 cores)
    CPU: 2x E5-2680 v3 2.5GHz/12-Core 
    RAM: 62 GB 

  • 8 CPU, 8 GPU and 4 Xeon Phi nodes have 505 GB RAM each
  • Total number of compute nodes: 422 
    CPU nodes: 238
    GPU accelerated nodes: 161 
    Xeon Phi co-processor nodes: 23 
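As a rough sketch of what these numbers mean for an ne120 run: assuming one MPI task per physical core, no threading, and 24 cores per node (two 12-core sockets, as listed above), the task counts and elements per task work out as below. The constants and function names are illustrative assumptions, not CESM settings.

```python
CORES_PER_NODE = 2 * 12          # two 12-core E5-2680 v3 sockets per node
CPU_NODES = 238                  # CPU-only nodes from the list above
NE120_ELEMENTS = 6 * 120 * 120   # spectral elements in an ne120 cubed sphere


def mpi_tasks(nodes: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """MPI task count, assuming one task per physical core (an assumption)."""
    return nodes * cores_per_node


def elements_per_task(nodes: int) -> float:
    """Average ne120 elements per MPI task for a given node count."""
    return NE120_ELEMENTS / mpi_tasks(nodes)


if __name__ == "__main__":
    for nodes in (20, CPU_NODES):
        print(nodes, mpi_tasks(nodes), round(elements_per_task(nodes), 1))
```

Under these assumptions, 20 nodes give 480 tasks with 180 elements each, so the dycore decomposition still has work to distribute at the point where scaling was reported to stop.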

I am using CESM1.2.0

 

Thanks,

Veeramanikandan

jedwards

What model configuration are you using, and which of these nodes? The CPU nodes are appropriate for CESM, but there is no support for GPU nodes, and CESM has not performed well on Phi. Which nodes did you use for the scaling study? You also didn't say anything about the network.

CESM Software Engineer

veera.atmsc@...

Model configuration:

$CAMCFG/configure -fc_type pgi -fc mpif90 -cc mpicc -dyn se -hgrid ne120np4 -spmd -ntasks 120 -phys cam4 -chem none -nosmp -test


The CPU, MIC and GPU nodes all have the same base architecture here, and I am running on all three node types without using any of the GPU or MIC cards. I also tried running CESM on the CPU nodes only, and the scalability results are the same.

 

Regarding system and network:

HP Proliant XL230a Gen9 and XL250a Gen9 based cluster (Intel Xeon E5-2680v3 @ 2.5 GHz dual twelve-core CPU and dual 2880-core NVIDIA Kepler K40 GPU nodes) w/Infiniband

 

Rmax = 524.40 TFlops

Rpeak = 861.74 TFlops
