How many CPUs should I use to run T85 CAM3?

Dear CAM users,
I am a novice CAM3 user.
I have some questions about running it on IBM AIX. The script run-ibm.csh, which ships with the CAM3 source code in the directory ./models/atm/cam/bld, says 'The number of nodes should be a power of 2, up to max of 16 for T42'. Why is the maximum 16 nodes for T42, and does this mean the total number of CPUs is limited to 16 or to 16 x tasks_per_node? (On the IBM system I use, tasks_per_node is 8.) And if so, why does a one-month run take longer (17 minutes) on 32 CPUs (4 nodes with tasks_per_node=8) than on 16 CPUs (2 nodes with tasks_per_node=8)?
I would like to know how many CPUs I should use, and how they should be allocated, to run T42 CAM3 most efficiently.
Also, what about T85?



Any tips would be very helpful!
Thanks in advance!
Yours!
dfeijat@126.com

cams.cma
 

eaton

CSEG and Liaisons
The comments in the run-ibm.csh script are misleading. They assume a cluster with 4 cpus per node, so the recommended maximum of 16 nodes was meant as a recommended maximum of 64 cpus. At T42 that's one cpu per latitude. More cpus can be used, but the extras will be idle during the dynamics calculation. So while the overall throughput can be increased by using more than 64 cpus (because the physics calculations can make use of more than 64 cpus), the scaling will not be good. If you want to maximize efficiency rather than total throughput, then using fewer than 64 cpus will probably help. How much it helps depends on the network performance.

Similarly, for T85 use a maximum of 128 cpus.
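
To make the one-cpu-per-latitude limit concrete, here is a minimal Python sketch. It is not part of CAM3 or run-ibm.csh; the function name dynamics_usage and the table NLAT are made up for illustration, and the model assumes the simple picture above in which the dynamics can keep at most one cpu busy per latitude (64 latitudes at T42, 128 at T85).

```python
# Illustrative only: not CAM3 code. Latitude counts for the two
# resolutions discussed in this thread.
NLAT = {"T42": 64, "T85": 128}


def dynamics_usage(resolution, ncpus):
    """Return (busy, idle) cpu counts during the dynamics calculation.

    Assumes at most one cpu can be kept busy per latitude, so any
    cpus beyond the latitude count sit idle in the dynamics.
    """
    nlat = NLAT[resolution]
    busy = min(ncpus, nlat)
    idle = ncpus - busy
    return busy, idle


if __name__ == "__main__":
    for res, ncpus in [("T42", 32), ("T42", 64), ("T42", 128), ("T85", 128)]:
        busy, idle = dynamics_usage(res, ncpus)
        print(f"{res} on {ncpus} cpus: {busy} busy, {idle} idle in dynamics")
```

Under this assumption, running T42 on 128 cpus leaves half of them idle during the dynamics, which is why the scaling past 64 cpus is poor even though the physics can still use them.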

Generally we find that using 4 or 8 threads per MPI task works well on the IBM SP clusters. And the total number of threads should equal the total number of cpus. So for T42 on 8-way nodes try using 8 nodes with 1 MPI task per node and 8 threads per task.
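
The layout arithmetic in that suggestion can be sketched as a small Python helper. This is only an illustration under the stated rule that total threads equal total cpus; the function hybrid_layout and its parameter names are hypothetical and do not correspond to any setting in run-ibm.csh.

```python
# Illustrative only: hybrid_layout and its parameter names are invented
# for this example and are not CAM3 or LoadLeveler settings.
def hybrid_layout(total_cpus, cpus_per_node, threads_per_task):
    """Split total_cpus into nodes, MPI tasks, and threads per task.

    Follows the rule that the total number of threads should equal
    the total number of cpus (one thread per cpu).
    """
    if total_cpus % cpus_per_node != 0:
        raise ValueError("total_cpus must be a multiple of cpus_per_node")
    if cpus_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cpus_per_node")
    nodes = total_cpus // cpus_per_node
    tasks_per_node = cpus_per_node // threads_per_task
    return {
        "nodes": nodes,
        "tasks_per_node": tasks_per_node,
        "mpi_tasks": nodes * tasks_per_node,
        "threads_per_task": threads_per_task,
    }


if __name__ == "__main__":
    # T42 on 8-way nodes with 8 threads per task, as suggested above.
    print(hybrid_layout(64, 8, 8))
    # The same 64 cpus with 4 threads per task instead.
    print(hybrid_layout(64, 8, 4))
```

For 64 cpus on 8-way nodes with 8 threads per task, this gives 8 nodes with 1 MPI task per node, matching the suggestion above; with 4 threads per task it gives 2 MPI tasks per node instead.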
 
Dear eaton,

Thank you for your answer; it will help me greatly with my work.
This problem confused me a lot, and I spent a great deal of time trying to find the rule for best performance that you know so well.
Thanks a lot.
I will try it as soon as possible.

---------------------------------------------------------

Yours!

dfeijat@126.com

cams.cma