thompscs@purdue_edu
While attempting to run CESM on the Purdue cluster Steele using Open MPI (1.5) as the MPI library and the Intel compiler (11.1.072), we encountered a strange situation in which adding more nodes to a job actually slowed it down dramatically.
Running a case on 8 nodes completed in 128 minutes. Running the same job on 16 nodes, performance slowed dramatically to 230 minutes (and two earlier invocations actually hit the 4-hour walltime limit, causing the job to be evicted from the cluster). This was repeatable. On our other cluster, Coates, using the same compiler and libraries, we did not see this behavior: the 8-node run time was comparable, and the job sped up on 16 nodes as expected.
What kind of bottleneck could be causing this, and is there a good way to debug a situation like this?
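For what it's worth, one check we have been considering running ourselves is a minimal MPI ping-pong between ranks placed on different nodes, to see whether inter-node latency/bandwidth on Steele looks different from Coates. The sketch below is only hypothetical (the file name, message size, and launch flags are placeholders, not part of our CESM case):

/*
 * Minimal MPI ping-pong sketch: measures round-trip time between rank 0
 * and the highest rank. With one rank per node, the two endpoints sit on
 * different nodes, so the numbers reflect the interconnect.
 *
 * Build/run (assuming the Open MPI wrappers):
 *   mpicc -O2 pingpong.c -o pingpong
 *   mpirun -np 16 --npernode 1 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int msg_bytes = 1 << 20;   /* 1 MiB payload */
    const int iters = 100;

    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {                  /* need at least two ranks */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *buf = malloc(msg_bytes);
    int peer = size - 1;             /* rank 0 talks to the last rank */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == peer) {
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        double elapsed = MPI_Wtime() - t0;
        double rtt_ms  = 1000.0 * elapsed / iters;
        double bw_MBs  = 2.0 * msg_bytes * iters / elapsed / 1.0e6;
        printf("avg round trip: %.3f ms, effective bandwidth: %.1f MB/s\n",
               rtt_ms, bw_MBs);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Running this (or an existing benchmark suite) on both clusters with the same node counts would at least tell us whether the slowdown tracks the interconnect rather than CESM itself.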