CAM hangs when run with more than one node

Hi all,

I'm trying to run CAM 3.1 on a Sun Opteron Linux cluster, but I'm having problems using more than one node. I can successfully build and run CAM on multiple processors, but only if I stay within a single node. As soon as I try to run on more than one node, CAM hangs: no further output is produced, and the job eventually times out. The last line in my output is something like this:

nlong( 64 )= 128 wnummax(64 ) = 42

The system I am using runs SuSE with the PathScale 2.5 compiler and Voltaire InfiniBand. Also, when I run configure with the -test option, MPI (and everything else) seems to check out just fine.

Has anyone experienced something like this before? Any thoughts you may have would be greatly appreciated.

Thanks,

Matt Higgins
 
These questions might help:
--> Did you configure with SPMD?
--> How many processors are you trying to use, and at what resolution? With the EUL and SLD dycores in CAM 3.1, you cannot run on more processors than nlat (the number of latitudes).
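
To make the second question concrete, here is a minimal sketch of the check, not anything taken from the CAM scripts. The function name `task_count_ok` is made up for illustration, and `nlat = 64` is an assumption read off the `nlong( 64 )= 128` line in the original post; substitute the latitude count for your own resolution.

```python
def task_count_ok(ntasks: int, nlat: int) -> bool:
    """Return True if the MPI task count does not exceed the latitude count.

    Hypothetical helper: it only encodes the rule of thumb stated above
    (EUL/SLD in CAM 3.1: tasks <= nlat). It does not inspect a CAM build.
    """
    if ntasks < 1 or nlat < 1:
        raise ValueError("ntasks and nlat must both be positive")
    return ntasks <= nlat


# Assumed value: 64 latitudes, inferred from the log line in the first post.
nlat = 64

for ntasks in (16, 64, 128):
    status = "ok" if task_count_ok(ntasks, nlat) else "too many tasks"
    print(f"ntasks={ntasks:4d} nlat={nlat}: {status}")
```

If the task count passes this check and the run still hangs across nodes, the cause is probably elsewhere (for example, in the MPI or interconnect setup) rather than in the decomposition limit.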
 