CCSM3 on bluefire: Help with load balance

gus@...

Dear CCSM3 experts

I set up CCSM3 to run on bluefire by modifying the original scripts from bluevista.
I need your help and advice on how to produce a load-balanced and efficient setup,
to save GAUs on my runs.

I am mostly interested in T85_gx1v3 resolution.
The original (TASK,THREAD) pairs (i.e. MPI tasks and OpenMP threads)
on bluevista for T85_gx1v3 resolution are listed below.
They are available in the latest downloadable version of CCSM3 (3.0.1_beta14),
in the file "scripts/ccsm_utils/Machines/env.ibm.bluevista".
They seem to be the same for other older NCAR IBM machines (bluesky, blackforest,
eagle, and even "generic_ibm"):

Component   TASK   THREAD
atm           16        4
lnd           28        1
ocn           20        1
ice            8        1
cpl            8        1

This requires 128 processors in total
(16x4 + 28x1 + 20x1 + 8x1 + 8x1 = 128).
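
As a quick check of the processor count, here is a minimal Python sketch that sums tasks x threads over the components. The names `layout` and `total_processors` are only illustrative and are not part of the CCSM3 scripts; the numbers are the bluevista values listed above.

```python
# (MPI tasks, OpenMP threads) per component, as listed for T85_gx1v3
# in env.ibm.bluevista. The dict itself is just an illustration.
layout = {
    "atm": (16, 4),
    "lnd": (28, 1),
    "ocn": (20, 1),
    "ice": (8, 1),
    "cpl": (8, 1),
}


def total_processors(layout):
    """Sum tasks * threads over all components."""
    return sum(tasks * threads for tasks, threads in layout.values())


if __name__ == "__main__":
    for name, (tasks, threads) in layout.items():
        print(f"{name}: {tasks} x {threads} = {tasks * threads}")
    print("total:", total_processors(layout))  # total: 128
```

Changing a pair in `layout` shows right away whether a candidate configuration still fits the number of processors requested.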
I ran this setup on bluefire using two nodes with SMT.
It took about 38 minutes of wall time to produce 5 days of simulation.
This is very slow, and most likely not load balanced.
At this rate, it would take about 46 hours to simulate one year!
The ocean seems not to have enough processors, whereas the atmosphere
and land have perhaps too many.

I experimented with several different "(TASK,THREAD)" sets,
using 1, 2, and 4 nodes, both with SMT and with processor binding.
In my experience, using one node or four nodes makes things worse,
and so does processor binding.
However, with two nodes and different "(TASK,THREAD)" pairs,
I could reduce the wall time for 5 simulation days to about 8.5 minutes.
That is much better than before,
but it is still slow, and would take about 10 hours to simulate one year.
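
The wall-time estimates quoted in this post can be reproduced with a few lines of Python. This is only a sketch: it assumes that wall time scales linearly with simulated time (ignoring initialization and I/O costs) and that the model year has 365 days. The function names are made up for illustration.

```python
def hours_per_simulated_year(wall_minutes, simulated_days, days_per_year=365):
    """Extrapolate a short benchmark linearly to one simulated year.

    Assumes a 365-day model year and no fixed startup cost.
    """
    return wall_minutes * (days_per_year / simulated_days) / 60.0


def simulated_years_per_day(wall_minutes, simulated_days, days_per_year=365):
    """Throughput in simulated years per wall-clock day."""
    return 24.0 / hours_per_simulated_year(
        wall_minutes, simulated_days, days_per_year
    )


if __name__ == "__main__":
    # Original bluevista layout on two bluefire nodes: 38 min for 5 days
    print(f"{hours_per_simulated_year(38, 5):.1f} h/year")   # 46.2 h/year
    # Best layout found so far: 8.5 min for 5 days
    print(f"{hours_per_simulated_year(8.5, 5):.1f} h/year")  # 10.3 h/year
    print(f"{simulated_years_per_day(8.5, 5):.2f} years/day")
```

Throughput in simulated years per wall-clock day is convenient for comparing benchmarks that were run for different lengths.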

Nevertheless, there are so many possible combinations of "(TASK,THREAD)"
pairs, total number of processors, SMT, and processor binding,
that I cannot possibly try them all to find the sweet spot.

Hence my question:

Can anybody there tell me the optimal setup for CCSM3 T85_gx1v3 on bluefire?

Even better:

Can anybody post a load balanced env.ibm.bluefire file on this bulletin board?

Any suggestions, or previous experience with load balancing T85_gx1v3 (or other resolutions),
would also be greatly appreciated.

Thank you very much.
Gus Correa

Gus Correa Lamont-Doherty Earth Observatory of Columbia University

lmurphy@...

Did you figure this out? I just did some benchmarking for a T85_gx1v3 run and got around 10 simulated years per wall clock day. I can send you my configuration if you still need it.

tjive@...

I would love to know what people found was a good combo of nodes and tasks per node!

-Tianna
