
MPI run issues

We are trying to set up CCSM3 on a Beowulf cluster using the Portland Group (PGI) compilers. We have successfully configured and built the model, but we cannot run it because of MPI version issues. The first problem was that the latest version of mpirun does not accept either the -pg or the -p4pg option, so we had to install an earlier version (mpich-ethernet-pgi-1.2.7p1-1). When we run with that mpirun, we still get errors like:
----------------------------------------------------------------------------------
ssh: connect to host 1 port 22: Invalid argument
p0_7788: p4_error: Child process exited while making connection to remote process on 1: 0
----------------------------------------------------------------------------------
or
----------------------------------------------------------------------------------
rm_6235: p4_error: rm_start: net_conn_to_listener failed: 42573
p0_10456: p4_error: Child process exited while making connection to remote process on compute-0-1.local: 0
p0_10456: (116.613281) net_send: could not write to fd=4, errno = 32
----------------------------------------------------------------------------------
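
Note that in the first error ssh is given "1" as the host name, which may mean a line in the procgroup file is being parsed with its fields out of order. Below is a minimal Python sketch of a sanity check, assuming the ch_p4 procgroup layout of a "local N" first line followed by "hostname nprocs /full/path/to/executable" lines. The function name, the sample hosts, and the paths are hypothetical, and this has not been confirmed as the cause of the errors above.

```python
def check_procgroup(text):
    """Return (line_number, message) pairs for suspicious procgroup lines.

    Assumed layout (ch_p4 procgroup file):
        local <nprocs>
        <hostname> <nprocs> </full/path/to/executable> [login]
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        # Skip blank lines; treating '#' lines as comments is an assumption.
        if not fields or fields[0].startswith("#"):
            continue
        host = fields[0]
        # A purely numeric host is what "ssh: connect to host 1" points to.
        if host.isdigit():
            problems.append((lineno, f"host field {host!r} is a bare number"))
            continue
        if host == "local":
            if len(fields) < 2 or not fields[1].isdigit():
                problems.append((lineno, "expected 'local <nprocs>'"))
            continue
        if len(fields) < 3:
            problems.append((lineno, "expected 'host nprocs /path/to/executable'"))
        elif not fields[1].isdigit():
            problems.append((lineno, f"process count {fields[1]!r} is not a number"))
        elif not fields[2].startswith("/"):
            problems.append((lineno, f"executable {fields[2]!r} is not a full path"))
    return problems


# Hypothetical procgroup contents; the third line has its fields swapped.
SAMPLE = """local 0
compute-0-1.local 1 /path/to/ccsm.exe
1 compute-0-2.local /path/to/ccsm.exe
"""

if __name__ == "__main__":
    for lineno, message in check_procgroup(SAMPLE):
        print(f"line {lineno}: {message}")
```

Running the check on the real procgroup file passed to "mpirun -p4pg" would show whether any host field is being read as a number.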

Searching the web, we found that the "Child process exited while..." error is commonly reported, and some people suggested compiling the code with "-static", which is not supported by our current mpif90 compiler.
So it would seem that we need an MPICH version whose mpirun accepts -p4pg and whose mpif90 accepts -static.
Can someone who has successfully run CCSM3 (on whatever system) please tell us:
1) Exact MPI version you used
2) Exact PGI Compiler version you used
3) Exact compiler FLAGS you used

Thanks.
Ben
 

jfarran@uci_edu

Hello.

We are in the same situation.

We have PGI compilers on a Linux cluster running Rocks+ from clustercorp.com, and our mpirun does not accept a process group file through either the "-pg" or the "-p4pg" flag.

I compiled mpich-1.2.7 and that version accepts the "-p4pg" option, but when it runs, I get the same "net_send: could not write to fd=4, errno = 32" error listed above.

Has anyone been able to get CCSM3 to compile and run with mpich-1.2.7? If so, could you please tell us what options you used to compile mpich and CCSM3?

Thank you,
Joseph
 