lillyw@gmail_com
New Member
Hi,
I configured MPICH 1.2.7 with ch_p4 and with-comm=shared on a Linux cluster. The MPI test program runs fine on a single SMP machine, but it fails when I try to run it on a remote machine. machines.LINUX includes the remote machine's name:
>
>****************************
>breeze:~/mpich-1.2.7/bin/mpirun -np 4 -v -machinefile machines.LINUX ~/mpich-1.2.7/examples/basic/cpi
>running /home/joy/mpich-1.2.7/examples/basic/cpi on 4 LINUX ch_p4 processors
>Created /home/joy/mpich-1.2.7/examples/basic/PI20515
>rm_7455: p4_error: rm_start: net_conn_to_listener failed: 45830
>p0_20647: p4_error: Child process exited while making connection to remote process on haze: 0
>p0_20647: (14.683511) net_send: could not write to fd=7, errno = 32
Is a common filesystem required on all machines in the machine list? Does that mean I have to mount a shared filesystem on every machine, or will copying the files to each machine work?
Thanks,
Joy