cathryn_meyer@yale_edu
Member
Hello,
I recently installed CCSM on a Gigabit Ethernet Linux cluster using mpich-1.2.5.2 and pgi 6.0-5. The model builds fine, but when I run it, I get the following output (each error is repeated many times; I've pasted only a few here):
-----------------------------------------------------------------
Terminated
recv-atm
p1_23528: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p3_23574: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p4_23597: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p5_23620: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p8_23689: p4_error: net_recv read: probable EOF on socket: 1
(cpl_comm_init) setting up communicators, name = ocn
===================================
mph attempted to call MPI_INIT
(cpl_comm_init) cpl_comm_comp, size: 137 24
(cpl_comm_init) comm world : comm,npe,pid 133 56 37
(cpl_comm_init) comm component: comm,npe,pid 137 24 21
(cpl_comm_init) comm world pe0: atm,ice,lnd,ocn,cpl,me 40 2 10 16 0 16
(cpl_comm_init) mph cid : atm,ice,lnd,ocn,cpl,me 1 2 3 4 5 4
(cpl_contract_init) ocn-send-cpl
p37_17996: p4_error: net_recv read: probable EOF on socket: 1
(cpl_comm_init) setting up communicators, name = ocn
===================================
...
p19_17600: p4_error: net_recv read: probable EOF on socket: 1
p41_18123: p4_error: net_recv read: probable EOF on socket: 1
p16_17534: p4_error: net_recv read: probable EOF on socket: 1
p45_18211: p4_error: net_recv read: probable EOF on socket: 1
p10_17402: p4_error: net_recv read: probable EOF on socket: 1
p12_17446: p4_error: net_recv read: probable EOF on socket: 1
p11_17424: p4_error: net_recv read: probable EOF on socket: 1
bm_list_23506: (97.578573) wakeup_slave: unable to interrupt slave 0 pid 23505
bm_list_23506: (97.579286) wakeup_slave: unable to interrupt slave 0 pid 23505
p2_23551: p4_error: net_recv read: probable EOF on socket: 1
Broken pipe
Broken pipe
Killing MPICH slave process, PID 23505
Killing MPICH slave process, PID 23506
--------------------------------------------------------------------------
It seems like the different model components are unable to communicate with the coupler.
Does anybody know what any of these errors mean? Do they indicate a problem in the model source code? An MPICH problem? Something else?
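To narrow this down, the failing process numbers can be mapped back to model components. The sketch below is only a rough aid and rests on assumptions not confirmed by the log: that the N in `pN_<pid>` is the MPI world rank, that the `comm world pe0` line lists the first world rank of each component, and that each component occupies a contiguous block of ranks. The names `PE0`, `NPE`, `component_of` and `tally_p4_errors` are made up for illustration.

```python
import re
from collections import Counter

# First world rank of each component, copied from the "comm world pe0" line
# in the output above (assumed meaning: starting rank of each component).
PE0 = {"atm": 40, "ice": 2, "lnd": 10, "ocn": 16, "cpl": 0}
NPE = 56  # world size, from the "comm world : comm,npe,pid" line


def component_of(rank, pe0=PE0, npe=NPE):
    """Return the component whose (assumed contiguous) rank block holds rank."""
    if not 0 <= rank < npe:
        raise ValueError(f"rank {rank} is outside 0..{npe - 1}")
    # The owner is the component with the largest starting rank <= rank.
    return max((start, name) for name, start in pe0.items() if start <= rank)[1]


def tally_p4_errors(log_text):
    """Count p4_error lines per component; other lines are ignored."""
    ranks = re.findall(r"^p(\d+)_\d+: p4_error", log_text, flags=re.MULTILINE)
    return Counter(component_of(int(r)) for r in ranks)


# The p4_error lines from the excerpt above, plus one bm_list line that
# should be skipped by the pattern.
sample = "\n".join(
    f"p{rank}_{pid}: p4_error: net_recv read: probable EOF on socket: 1"
    for rank, pid in [
        (1, 23528), (3, 23574), (4, 23597), (5, 23620), (8, 23689),
        (37, 17996), (19, 17600), (41, 18123), (16, 17534), (45, 18211),
        (10, 17402), (12, 17446), (11, 17424), (2, 23551),
    ]
) + "\nbm_list_23506: (97.578573) wakeup_slave: unable to interrupt slave 0 pid 23505"

counts = tally_p4_errors(sample)
for name in sorted(counts):
    print(name, counts[name])
```

If those assumptions hold, the pasted lines include failures in all five components (atm, cpl, ice, lnd, ocn), so the excerpt alone does not single out one component. Running the same tally over the full log might show whether one component fails first.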
Thanks,
Cathy