MPICH on Linux cluster

Hi,
I configured MPICH 1.2.7 with ch_p4 and --with-comm=shared on a Linux cluster. The MPI test program runs fine on a single SMP machine, but it fails when I try to include a remote machine. machines.LINUX includes the remote machine name:
>
>****************************
>breeze:~/mpich-1.2.7/bin/mpirun -np 4 -v -machinefile machines.LINUX ~/mpich-1.2.7/examples/basic/cpi
>running /home/joy/mpich-1.2.7/examples/basic/cpi on 4 LINUX ch_p4 processors
>Created /home/joy/mpich-1.2.7/examples/basic/PI20515
>rm_7455: p4_error: rm_start: net_conn_to_listener failed: 45830
>p0_20647: p4_error: Child process exited while making connection to remote process on haze: 0
>p0_20647: (14.683511) net_send: could not write to fd=7, errno = 32

Is a common filesystem required on all machines in the machine list? Does that mean I have to mount a shared filesystem on every machine, or will copying the executable to each machine work?

Thanks,
Joy
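
For anyone trying the copy approach, here is a minimal sketch of how one might check that the executable sits at the same path on every host in the machines file. It rests on several assumptions: the file has one entry per line in the form `host` or `host:n` with `#` comments, `ssh` is the remote shell (an MPICH 1.2.7 install may be set up for `rsh` instead), and the helper names `parse_machinefile` and `build_check_commands` are made up for illustration. It only builds the commands and is not a diagnosis of the p4_error above.

```python
import shlex


def parse_machinefile(text):
    """Return (host, nprocs) pairs from machines-file text.

    Assumed format: one entry per line, 'host' or 'host:n', '#' starts a comment.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, _, count = line.partition(":")
        entries.append((host, int(count) if count else 1))
    return entries


def build_check_commands(entries, exe_path, remote_shell="ssh"):
    """Build one command per distinct host that tests for an executable file.

    remote_shell is an assumption; substitute whatever the cluster uses.
    """
    hosts = []
    for host, _ in entries:
        if host not in hosts:  # keep first-seen order, skip duplicates
            hosts.append(host)
    remote_cmd = "test -x " + shlex.quote(exe_path)
    return [[remote_shell, host, remote_cmd] for host in hosts]
```

Each returned command can be passed to `subprocess.run(cmd)`; a return code of 0 would mean the file exists and is executable on that host.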
 