cathryn_meyer@yale_edu
I am running CAM on a Linux cluster using MPICH version 1.2.6. The model configures and compiles fine; however, when I try to run it, it crashes. The end of the output file looks like this:
-----------------------------------------
Number of lats passed north & south = 3
Node Partition Extended Partition
-----------------------------------------
0 1- 32 -2- 35
1 33- 64 30- 67
procid 0 assigned 473 spectral coefficients and
21 m values: 1 5 9 13
17 21 25 29 33 37
41 4 8 12 16 20
24 28 32 36 40
procid 1 assigned 473 spectral coefficients and
22 m values: 2 6 10 14
18 22 26 30 34 38
42 3 7 11 15 19
23 27 31 35 39 43
SPMDBUF: Allocating SPMD buffers of size 2387984
**** Summary of Logical Unit assignments ****
Restart pointer unit (nsds) = 1
Master restart unit (nrg) = 2
Abs/ems unit for restart (nrg2) = 3
History restart unit (luhrest) = 4
p0_24535: (0.189976) net_send: could not write to fd=4, errno = 32
p4_error: latest msg from perror: Broken pipe
p0_24535: p4_error: net_send write: -1
p0_24535: (2.195281) net_send: could not write to fd=4, errno = 32
-------------------------------------------------------------------------------------
Has anybody seen these errors before? If so, what do they mean, and can they be fixed?
Thanks,
Cathy