errors when running CCSM on linux cluster

cathryn.meyer@...
Hello,

I recently installed CCSM on a Gigabit Ethernet Linux cluster using mpich-1.2.5.2 and PGI 6.0-5. The model builds fine, but when I run it, I get the following output (each error is repeated many times; I've pasted only a few here):

-----------------------------------------------------------------
Terminated
recv-atm
p1_23528: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p3_23574: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p4_23597: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p5_23620: p4_error: net_recv read: probable EOF on socket: 1
(cpl_contract_init) ice-send-cpl
p8_23689: p4_error: net_recv read: probable EOF on socket: 1
(cpl_comm_init) setting up communicators, name = ocn
===================================
mph attempted to call MPI_INIT
(cpl_comm_init) cpl_comm_comp, size: 137 24
(cpl_comm_init) comm world : comm,npe,pid 133 56 37
(cpl_comm_init) comm component: comm,npe,pid 137 24 21
(cpl_comm_init) comm world pe0: atm,ice,lnd,ocn,cpl,me 40 2 10 16 0 16
(cpl_comm_init) mph cid : atm,ice,lnd,ocn,cpl,me 1 2 3 4 5 4
(cpl_contract_init) ocn-send-cpl
p37_17996: p4_error: net_recv read: probable EOF on socket: 1
(cpl_comm_init) setting up communicators, name = ocn
===================================
...
p19_17600: p4_error: net_recv read: probable EOF on socket: 1
p41_18123: p4_error: net_recv read: probable EOF on socket: 1
p16_17534: p4_error: net_recv read: probable EOF on socket: 1
p45_18211: p4_error: net_recv read: probable EOF on socket: 1
p10_17402: p4_error: net_recv read: probable EOF on socket: 1
p12_17446: p4_error: net_recv read: probable EOF on socket: 1
p11_17424: p4_error: net_recv read: probable EOF on socket: 1

bm_list_23506: (97.578573) wakeup_slave: unable to interrupt slave 0 pid 23505
bm_list_23506: (97.579286) wakeup_slave: unable to interrupt slave 0 pid 23505
p2_23551: p4_error: net_recv read: probable EOF on socket: 1
Broken pipe
Broken pipe
Killing MPICH slave process, PID 23505
Killing MPICH slave process, PID 23506
--------------------------------------------------------------------------

It seems like the different model components are unable to communicate with the coupler.
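
With this many repeated lines, it can help to collapse the p4_error messages and see which ranks reported them. The following is only a minimal sketch, not part of CCSM or MPICH: the function name `summarize_p4_errors` is hypothetical, and the line format it expects is assumed from the output pasted above. As far as I understand, "probable EOF on socket" generally means the other end of a connection closed, so these lines are often a symptom of some other process dying first rather than the root cause.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the pasted output:
#   p<rank>_<pid>: p4_error: <message>
P4_ERROR = re.compile(r"^p(\d+)_(\d+):\s*p4_error:\s*(.+?)\s*$")


def summarize_p4_errors(lines):
    """Group p4_error lines by message and return {message: sorted list of ranks}."""
    ranks_by_message = defaultdict(set)
    for line in lines:
        match = P4_ERROR.match(line.strip())
        if match:
            rank, _pid, message = match.groups()
            ranks_by_message[message].add(int(rank))
    return {msg: sorted(ranks) for msg, ranks in ranks_by_message.items()}


# A few lines copied from the output above.
sample = [
    "recv-atm",
    "p1_23528: p4_error: net_recv read: probable EOF on socket: 1",
    "(cpl_contract_init) ice-send-cpl",
    "p3_23574: p4_error: net_recv read: probable EOF on socket: 1",
    "p37_17996: p4_error: net_recv read: probable EOF on socket: 1",
    "Broken pipe",
]

summary = summarize_p4_errors(sample)
for message, ranks in summary.items():
    print(f"{len(ranks)} rank(s) reported '{message}': {ranks}")
```

If nearly every rank reports the same EOF message, the more useful information is probably in the per-component log files rather than in this output.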

Does anybody know what any of these errors mean? Do they indicate a problem in the model source code? An MPICH problem? Something else?

Thanks,
Cathy

cathryn.meyer@...

I looked in the cpl.log file and found that the actual error CCSM reports is:

---------------------------------------------------------------------------
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(frac_set) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
(cpl_map_bun) WARNING: bundle aoflux_o has accum count = 0
MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:

(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_
MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1
--------------------------------------------------------------------

I'm sure nobody has seen errors like these before, but does anybody know what "afrac" is? Apparently CCSM cannot locate this attribute, but I can't figure out where it's coming from.
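
To see which routines trigger the missing-attribute error, the log can be tallied by attribute name and by the routine named on the first traceback line. This is a rough sketch under stated assumptions: the helper `count_missing_attributes` is hypothetical, and the log layout (error line, blank line, then "(routine) ->..." line) is assumed from the excerpt above.

```python
import re
from collections import Counter

# Patterns assumed from the cpl.log excerpt above.
ATTR_ERROR = re.compile(r'ERROR--attribute not found:\s*"([^"]+)"')
CALLER = re.compile(r"^\((\w+)\)\s*->")


def count_missing_attributes(lines):
    """Count (attribute, calling routine) pairs for 'attribute not found' errors."""
    counts = Counter()
    pending = None  # attribute name still waiting for its traceback line
    for line in lines:
        text = line.strip()
        found = ATTR_ERROR.search(text)
        if found:
            if pending is not None:
                # The previous error had no traceback line before this one.
                counts[(pending, "unknown")] += 1
            pending = found.group(1)
        elif pending is not None and text:
            caller = CALLER.match(text)
            counts[(pending, caller.group(1) if caller else "unknown")] += 1
            pending = None
    if pending is not None:
        counts[(pending, "unknown")] += 1
    return counts


# Lines copied from the cpl.log excerpt above.
sample = [
    'MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:',
    "",
    "(frac_set) ->MCT::m_AttrVect::indexRA_",
    "MCT(MPEU)::m_List::clean_: deallocate(aList%...) error, stat =1",
    'MCT::m_AttrVect::indexRA_:: ERROR--attribute not found: "afrac" Traceback:',
    "",
    "(cpl_bundle_mult) ->MCT::m_AttrVect::indexRA_",
]

for (attribute, routine), n in count_missing_attributes(sample).items():
    print(f"{attribute!r} not found in {routine}: {n} time(s)")
```

In the excerpt above, the first failure comes from frac_set and the later ones from cpl_bundle_mult, so the earliest routine in the list is probably the place to start looking.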

Thanks,
Cathy
