xlong@ncsu_edu
Member
I use PGI 10.5 and MPICH2, with netCDF 3.6.3.
The runtime error occurs while initializing the lnd component.
The cpl log file stops at: (seq_mct_drv) : Initialize lnd component
The lnd log file stops at:
Attempting to read surface boundary data .....
(GETFIL): attempting to find local file surfdata_4x5_simyr2000_c090928.nc
(GETFIL): using
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/surfdata/surfdata_4x5_simyr2000_c090928.nc
The ccsm log file gives some error information:
==========================================
16 pes participating in computation for CLM
-----------------------------------
NODE# NAME
( 0) n3m5-9
( 1) n3m5-9
( 2) n3m5-9
( 3) n3m5-9
( 4) n3m5-9
( 5) n3m5-9
( 6) n3m5-9
( 7) n3m5-9
( 8) n3m5-8
( 9) n3m5-8
( 10) n3m5-8
( 11) n3m5-8
( 12) n3m5-8
( 13) n3m5-8
( 14) n3m5-8
( 15) n3m5-8
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/griddata/griddata_4x5_060404.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/griddata/fracdata_4x5_gx3v7_c091231.nc
32
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/surfdata/surfdata_4x5_simyr2000_c090928.nc
31
proc= 1 beg atmcell = 93 end atmcell = 184
total atmcells per proc = 92
proc= 1 atm ngseg = 417 atm nlseg = 26
proc= 1 nclumps = 1
proc= 7 beg atmcell = 643 end atmcell = 734
total atmcells per proc = 92
proc= 7 atm ngseg = 417 atm nlseg = 26
proc= 7 nclumps = 1
proc= 14 beg atmcell = 1285 end atmcell = 1376
total atmcells per proc = 92
proc= 14 atm ngseg = 417 atm nlseg = 24
proc= 14 nclumps = 1
proc= 15 beg atmcell = 1377 end atmcell = 1466
total atmcells per proc = 90
proc= 15 atm ngseg = 417 atm nlseg = 26
proc= 15 nclumps = 1
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/griddata/griddata_4x5_060404.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/griddata/fracdata_4x5_gx3v7_c091231.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/surfdata/surfdata_4x5_simyr2000_c090928.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/surfdata/surfdata_4x5_simyr2000_c090928.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/pftdata/pft-physiology.c110425.nc
31
Opened existing file
/stormtrack_data/xlong/cesmdata/input/lnd/clm2/surfdata/surfdata_4x5_simyr2000_c090928.nc
31
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=0x143cae0, rbuf=0x143b420, count=1, MPI_LOGICAL, MPI_LOR, comm=0x84000002) failed
MPIR_Allreduce(289)...............:
MPIC_Sendrecv(164)................:
MPIC_Wait(519)....................:
MPIDI_CH3I_Progress(165)..........:
MPID_nem_mpich2_blocking_recv(895):
MPID_nem_tcp_connpoll(1725).......:
state_commrdy_handler(1555).......:
MPID_nem_tcp_recv_handler(1445)...: socket closed
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=0x143cae0, rbuf=0x143b420, count=1, MPI_LOGICAL, MPI_LOR, comm=0x84000002) failed
MPIR_Allreduce(289)...............:
MPIC_Sendrecv(164)................:
MPIC_Wait(519)....................:
MPIDI_CH3I_Progress(165)..........:
MPID_nem_mpich2_blocking_recv(895):
MPID_nem_tcp_connpoll(1725).......:
state_commrdy_handler(1555).......:
MPID_nem_tcp_recv_handler(1445)...: socket closed
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=0x143cae0, rbuf=0x143b420, count=1, MPI_LOGICAL, MPI_LOR, comm=0x84000002) failed
MPIR_Allreduce(289)...............:
MPIC_Sendrecv(164)................:
MPIC_Wait(519)....................:
MPIDI_CH3I_Progress(165)..........:
MPID_nem_mpich2_blocking_recv(895):
MPID_nem_tcp_connpoll(1725).......:
state_commrdy_handler(1555).......:
MPID_nem_tcp_recv_handler(1445)...: socket closed
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=0x143cae0, rbuf=0x143b420, count=1, MPI_LOGICAL, MPI_LOR, comm=0x84000002) failed
MPIR_Allreduce(289)...............:
MPIC_Sendrecv(164)................:
MPIC_Wait(519)....................:
MPIDI_CH3I_Progress(165)..........:
MPID_nem_mpich2_blocking_recv(895):
MPID_nem_tcp_connpoll(1725).......:
state_commrdy_handler(1555).......:
MPID_nem_tcp_recv_handler(1445)...: socket closed
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
Job mpichhydra_wrapper /usr/local/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter -p n3m5-9:41620 -c /usr/local/lsf/conf -s /usr/local/lsf/7.0/linux2.6-glibc2.3-x86_64/etc -a LINUX86 /stormtrack_data/xlong/b0206.5/run/ccsm.exe
TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
00000 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00001 n3m5-8 /stormtrack_data Exit (1) 02/06/2013 16:02:02
00002 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00003 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00004 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00005 n3m5-8 /stormtrack_data Exit (1) 02/06/2013 16:02:02
00006 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00007 n3m5-8 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00008 n3m5-9 /stormtrack_data Exit (1) 02/06/2013 16:02:02
00009 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00010 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00011 n3m5-9 /stormtrack_data Exit (1) 02/06/2013 16:02:02
00012 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00013 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00014 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
00015 n3m5-9 /stormtrack_data Signaled (SIGSEGV) 02/06/2013 16:02:02
==========================================================================
What is strange is that when I turn on the DEBUG option in env_build.xml,
the model runs successfully, at least for the first 5 days.
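For reference, this is roughly how I switch DEBUG on and off from the case directory (CESM1-style xmlchange; the exact syntax and rebuild script names may differ with your version, so treat this as a sketch):

    ./xmlchange -file env_build.xml -id DEBUG -val TRUE    # or FALSE to go back to the optimized build
    # then run the case clean-build and build scripts before resubmitting the run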
Is this problem caused by the compiler?
Someone suggested that the optimization flags may be what makes the code fail; since they are turned off in a DEBUG build, the model can run with DEBUG enabled.
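If that is the case, one way I might test it (a rough sketch; the Macros file name and the -O2 flag are my guesses for a PGI machine port, so please correct me) is to keep DEBUG FALSE but lower the PGI optimization level and rebuild:

    # find where the non-DEBUG PGI Fortran optimization flag is set in the case directory
    grep -n -- "-O2" Macros.*       # Macros.<machine>; the file name and flag are assumptions
    # change -O2 to -O0 (or -O1) there, clean-build, rebuild, and rerun,
    # to see whether the optimizer alone reproduces the SIGSEGV / MPI_Allreduce failure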
I also tried to use the Intel compiler, but the model fails at the build stage (see my other post).
Thank you for any suggestions.