MPI run issues

bsf208@...

We are trying to set up CCSM3 on a Beowulf cluster using the Portland Group (PGI) compilers. We have successfully configured and built the model, but are running into problems running it because of MPI version issues. The first problem was that the latest version of mpirun does not accept either the -pg or the -p4pg option, so we had to install an older version (mpich-ethernet-pgi-1.2.7p1-1). When we run with that version's mpirun, we still get errors like:
----------------------------------------------------------------------------------
ssh: connect to host 1 port 22: Invalid argument
p0_7788: p4_error: Child process exited while making connection to remote process on 1: 0
----------------------------------------------------------------------------------
or
----------------------------------------------------------------------------------
rm_6235: p4_error: rm_start: net_conn_to_listener failed: 42573
p0_10456: p4_error: Child process exited while making connection to remote process on compute-0-1.local: 0
p0_10456: (116.613281) net_send: could not write to fd=4, errno = 32
----------------------------------------------------------------------------------

Searching the web, we found that the "Child process exited while..." error is common, and some people suggest compiling the code with "-static", which is not supported by our current mpif90 compiler.
So it would seem that we need an MPI installation whose mpirun accepts -p4pg and whose mpif90 accepts -static.
Can someone who has successfully run CCSM3 (on whatever system) please tell us:
1) Exact MPI version you used
2) Exact PGI Compiler version you used
3) Exact compiler FLAGS you used

Thanks.
Ben
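
One thing that may be worth checking, going only by the error text: the first error shows ssh trying to reach a host literally named "1", which is what one might see if a line in the p4 procgroup file is missing its hostname, so that the process count is read as the host. As far as the MPICH 1.2.x ch_p4 layout goes, the file starts with a line such as "local 0", followed by lines of the form "hostname nprocs /full/path/to/executable". The Python sketch below checks a file against that layout. The function name check_procgroup, the sample paths, and the handling of "#" lines as comments are assumptions made for illustration; none of them come from CCSM3 or MPICH.

```python
def check_procgroup(text):
    """Return (line_number, message) pairs for suspicious procgroup lines.

    Assumed layout (MPICH 1.2.x ch_p4):
        local 0
        hostname nprocs /full/path/to/executable [login]
    """
    problems = []
    seen_first_entry = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' lines carry no entry.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        host = fields[0]
        is_first = not seen_first_entry
        seen_first_entry = True
        if host.isdigit():
            # A bare number here would make ssh try to connect to host "1".
            problems.append((lineno, f"host field {host!r} is a bare number; "
                                     "the hostname may be missing"))
            continue
        if len(fields) < 2 or not fields[1].isdigit():
            problems.append((lineno, "second field should be a process count"))
            continue
        # Only the first entry (usually 'local 0') may omit the program path.
        if not is_first and len(fields) < 3:
            problems.append((lineno, "missing path to the executable"))
    return problems


# Hypothetical procgroup contents; line 3 has lost its hostname.
sample = """local 0
compute-0-1.local 1 /home/user/ccsm3/cam
1 /home/user/ccsm3/clm
"""

for lineno, message in check_procgroup(sample):
    print(f"line {lineno}: {message}")
```

If the file passes a check like this, the "host 1" error more likely comes from how the run script builds the mpirun command line than from the procgroup file itself.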

jfarran@...

Hello.

We are in the same situation.

We have PGI compilers on a Linux cluster running Rocks+ from clustercorp.com, and our mpirun accepts neither the "-pg" nor the "-p4pg" flag for passing a process group file.

I compiled mpich-1.2.7 and that version accepts the "-p4pg" option, but when it runs, I get the same "net_send: could not write to fd=4, errno = 32" error listed above.

Has anyone been able to get CCSM3 to compile and run with mpich-1.2.7? If so, can you please tell us what options you used to compile mpich and CCSM3?

Thank you,
Joseph
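
For reading the "errno = 32" in the net_send message, a small Python sketch that maps an errno value to its symbolic name and system message may help. On Linux, 32 is EPIPE (broken pipe), which would suggest that the process on the other end of the socket had already exited or closed the connection before the write. That reading is an interpretation of the message, not a confirmed diagnosis. The helper name describe_errno is made up for this example.

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for a numeric errno on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# 32 is the value reported by net_send in the log above.
print(describe_errno(32))
```

If this is a broken pipe, the write failure is a symptom rather than the cause, and the earlier "Child process exited while making connection" line is the one to chase.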
