gus@ldeo_columbia_edu
Dear CCSM3 experts
I set up CCSM3 to run on bluefire by modifying the original bluevista scripts.
I would like your help and advice on producing a load-balanced, efficient setup,
to save GAUs on my runs.
I am mostly interested in the T85_gx1v3 resolution.
The original (TASK,THREAD) pairs (i.e., MPI tasks and OpenMP threads)
for the T85_gx1v3 resolution on bluevista are listed below.
They come from the latest downloadable version of CCSM3 (3.0.1_beta14),
in the file "scripts/ccsm_utils/Machines/env.ibm.bluevista".
They seem to be the same for the older NCAR IBM machines (bluesky, blackforest,
eagle) and even for "generic_ibm":
Component    TASK    THREAD
atm            16         4
lnd            28         1
ocn            20         1
ice             8         1
cpl             8         1
Total: 16x4 + 28x1 + 20x1 + 8x1 + 8x1 = 128 processors.
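Since each MPI task runs THREAD OpenMP threads, the processor count is the sum of TASK x THREAD over the components. A minimal Python check of that total (the dictionary is only a convenient way to hold the table above; it is not a CCSM3 file format):

```python
# (TASK, THREAD) pairs for T85_gx1v3, copied from the bluevista table above.
# This dict is only an illustration, not a CCSM3 configuration format.
BLUEVISTA_T85 = {
    "atm": (16, 4),
    "lnd": (28, 1),
    "ocn": (20, 1),
    "ice": (8, 1),
    "cpl": (8, 1),
}


def total_processors(layout):
    """Each MPI task runs `threads` OpenMP threads, so it needs that many processors."""
    return sum(tasks * threads for tasks, threads in layout.values())


print(total_processors(BLUEVISTA_T85))  # 128
```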
I ran this setup on bluefire using two nodes with SMT.
It took about 38 minutes of wall time to simulate 5 days.
This is very slow, and most likely not load balanced:
at this rate it would take about 46 hours to simulate one year!
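The yearly figure is a linear extrapolation of the 5-day benchmark. A small sketch, assuming a 365-day year and ignoring that a short run also pays for initialization (so the extrapolation is probably somewhat pessimistic):

```python
MINUTES_PER_HOUR = 60.0


def hours_per_simulated_year(wall_minutes, sim_days, days_per_year=365):
    """Linearly extrapolate a short benchmark to one simulated year (wall-clock hours)."""
    return wall_minutes / sim_days * days_per_year / MINUTES_PER_HOUR


def simulated_years_per_day(wall_minutes, sim_days, days_per_year=365):
    """Throughput: simulated years per 24 hours of wall-clock time."""
    return 24.0 / hours_per_simulated_year(wall_minutes, sim_days, days_per_year)


print(round(hours_per_simulated_year(38, 5), 1))  # 46.2
print(round(simulated_years_per_day(38, 5), 2))   # 0.52
```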
The ocean seems to have too few processors, whereas the atmosphere
and land perhaps have too many.
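One way to make "too few" and "too many" concrete: assuming the components run concurrently on separate processor sets and wait for each other at the coupler, the slowest component sets the pace and the processors of the others sit idle while they wait. The hypothetical helper below takes per-component wall time per simulated day (numbers you would have to read from your own timing output; none are given here) and reports the bottleneck plus the fraction of reserved processor time spent waiting:

```python
def load_balance_report(times, procs):
    """Return (slowest component, idle fraction of reserved processor time).

    times: wall-clock seconds per simulated day for each component (hypothetical input)
    procs: processors (TASK * THREAD) assigned to each component
    Assumes all components run concurrently and wait for the slowest one.
    """
    slowest = max(times, key=times.get)
    t_max = times[slowest]
    # Processor-time actually spent computing vs. processor-time reserved for the run.
    busy = sum(procs[name] * times[name] for name in times)
    reserved = sum(procs[name] for name in times) * t_max
    return slowest, 1.0 - busy / reserved
```

Under that assumption, moving processors from components with short times toward the slowest one pushes the idle fraction toward zero.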
I experimented with several different (TASK,THREAD) sets
on 1, 2, and 4 nodes, both with SMT and with processor binding.
In my experience, using one node or four nodes made things worse,
and so did processor binding.
However, with two nodes and different (TASK,THREAD) pairs,
I reduced the wall time for 5 simulated days to about 8.5 minutes.
This is much better than before, but still slow:
it would take about 10 hours to simulate one year.
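Because both runs used two nodes, the node-hours per simulated year drop by the same factor as the wall time. If, as an assumption, the GAU charge scales with nodes x wall-clock hours (the actual GAU formula may include other machine or queue factors), the savings are proportional too:

```python
def node_hours_per_simulated_year(nodes, wall_minutes, sim_days, days_per_year=365):
    """Nodes x extrapolated wall-clock hours for one simulated year."""
    return nodes * wall_minutes / sim_days * days_per_year / 60.0


original = node_hours_per_simulated_year(2, 38.0, 5)  # bluevista layout
tuned = node_hours_per_simulated_year(2, 8.5, 5)      # best layout found so far
print(round(original, 1), round(tuned, 1), round(original / tuned, 2))  # 92.5 20.7 4.47
```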
Unfortunately, there are so many possible combinations of (TASK,THREAD)
pairs, total processor counts, SMT, and processor binding
that I cannot possibly try them all to find the sweet spot.
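To put a number on "so many combinations": even under a deliberately narrow, hypothetical restriction (exactly 128 processors, only atm threaded with 1, 2, or 4 threads, every other component single-threaded with at least one task, SMT and binding ignored), the layouts can be counted with a stars-and-bars argument:

```python
from math import comb


def count_layouts(total_procs, atm_thread_options=(1, 2, 4), n_other=4):
    """Count (TASK, THREAD) layouts that use exactly total_procs processors.

    Hypothetical restriction: only atm is threaded; the n_other remaining
    components (lnd, ocn, ice, cpl) each get >= 1 single-threaded task.
    """
    count = 0
    for threads in atm_thread_options:
        atm_tasks = 1
        # Leave at least one processor for each of the other components.
        while atm_tasks * threads <= total_procs - n_other:
            remaining = total_procs - atm_tasks * threads
            # Stars and bars: ways to split `remaining` into n_other positive parts.
            count += comb(remaining - 1, n_other - 1)
            atm_tasks += 1
    return count


print(count_layouts(128))
```

The single-threaded-atm case alone contributes C(127, 4) = 10,334,625 layouts, so brute-force benchmarking is out of the question; measuring per-component timings and shifting processors toward the bottleneck is a far smaller search.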
Hence my question:
Can anyone tell me the optimal setup for CCSM3 T85_gx1v3 on bluefire?
Even better:
Can anybody post a load balanced env.ibm.bluefire file on this bulletin board?
Any suggestions, or previous experience with load balancing T85_gx1v3 (or other resolutions),
would also be greatly appreciated.
Thank you very much.
Gus Correa