m_shivi84@gmail_com
Hi,
I am trying to run the TER.01a.T31_gx3v5.B.generic_linux test of the CCSM3 Beta version on an Intel Linux cluster with PGI 5.2 and MPICH-1.2.6.
I have set NTASKS_CPL = 2.
Shortly after the main integration loop starts (right after the first coupling step, model date 0001-01-01 03600s), the run terminates with SIGSEGV (signal 11):
:
(main) -------------------------------------------------------------------------
(main) start of main integration loop
(main) -------------------------------------------------------------------------
(tStamp_write) cpl model date 0001-01-01 00000s wall clock 2006-07-17 22:28:41 avg dt 0s dt 0s
(cpl_map_npFixNew3) compute bilinear weights & indicies for NP region.
(cpl_bundle_copy) WARNING: bundle aoflux_o has accum count = 0
(flux_atmOcn) FYI: this routine is not threaded
print_memusage iam 0 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 62193 57314 8440 8040 0
(tStamp_write) cpl model date 0001-01-01 03600s wall clock 2006-07-17 22:28:58 avg dt 17s dt 17s
p1_29512: p4_error: interrupt SIGSEGV: 11
Broken pipe
P4 procgroup file is mpirun.pgfile.
Mon Jul 17 21:59:39 IST 2006 -- CSM EXECUTION HAS FINISHED
Model did not complete - see cpl.log.060717-215431
From the coupler log:
:
(main) -------------------------------------------------------------------------
(main) start of main integration loop
(main) -------------------------------------------------------------------------
(tStamp_write) cpl model date 0001-01-01 00000s wall clock 2006-07-17 21:59:21 avg dt 0s dt 0s
comm_diag xxx sorr 1 1.2593097356701725000E+16 send lnd Sa_z
comm_diag xxx sorr 2-4.2681094806087188000E+13 send lnd Sa_u
comm_diag xxx sorr 3 2.0554894552372100000E+14 send lnd Sa_v
:
comm_diag xxx sorr 18 3.4011649223548775000E+15 send lnd Faxa_swvdf
(cpl_map_npFixNew3) compute bilinear weights & indicies for NP region.
comm_diag xxx sorr 1 1.0481525588612560000E+17 send ice So_t
:
comm_diag xxx sorr 20 1.5556313824627987000E+10 send ice Faxc_rain
comm_diag xxx sorr 21 1.9856500405355048000E+08 send ice Faxc_snow
(cpl_bundle_copy) WARNING: bundle aoflux_o has accum count = 0
(flux_atmOcn) FYI: this routine is not threaded
comm_diag xxx sorr 1 1.3546253911681220000E+16 recv ice Si_t
:
comm_diag xxx sorr 22-1.9481521504421387000E+11 recv ice Fioi_tauy
(frac_set) WARNING: global max ifrac = 1.000000000000000
comm_diag xxx sorr 1 5.5564148202253860000E+16 recv lnd Sl_t
:
comm_diag xxx sorr 15-1.4296651656678892000E+16 recv lnd Fall_swnet
comm_diag xxx sorr 1 0.0000000000000000000E+00 recv lnd Forr_roff
comm_diag xxx sorr 1 1.4599170314316630000E+17 send atm Sx_tref
:
comm_diag xxx sorr 16-1.9928358794721107000E+17 send atm Faxx_lwup
comm_diag xxx sorr 17-1.9187264102173386000E+10 send atm Faxx_evap
comm_diag xxx sorr 1 3.2447054044955785000E+16 recv atm Sa_z
comm_diag xxx sorr 2-1.0231794316330169000E+14 recv atm Sa_u
:
comm_diag xxx sorr 18 1.9169915865209935000E+16 recv atm Faxa_swvdf
comm_diag xxx sorr 19 9.5742463005565320000E+16 recv atm Faxa_swnet
(tStamp_write) cpl model date 0001-01-01 03600s wall clock 2006-07-17 21:59:38 avg dt 17s dt 17s
comm_diag xxx sorr 1 1.2565853834216740000E+16 send lnd Sa_z
comm_diag xxx sorr 2-5.3130266987493594000E+13 send lnd Sa_u
:
comm_diag xxx sorr 17 2.5987897925793370000E+15 send lnd Faxa_swndf
comm_diag xxx sorr 18 4.6425651371298120000E+15 send lnd Faxa_swvdf
comm_diag xxx sorr 1 1.0481525588612560000E+17 send ice So_t
comm_diag xxx sorr 2 1.2473386243449007000E+16 send ice So_s
:
comm_diag xxx sorr 20 1.6756996660249876000E+10 send ice Faxc_rain
comm_diag xxx sorr 21 2.5689317946379995000E+08 send ice Faxc_snow
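In the excerpt above, the last coupler entry is a send to the ice model at the second coupling step (model date 0001-01-01 03600s). To pull that last exchange out of a full cpl.log without scrolling through it, something like the script below could be used. It is only a sketch. It assumes nothing beyond the comm_diag and tStamp_write line layout visible in the excerpt, and the `last_exchange` function and `Exchange` record are hypothetical names, not part of CCSM:

```python
import re
import sys
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the cpl.log excerpt: "comm_diag xxx sorr <idx> <value> <send|recv> <comp> <field>".
# A negative value can run straight into the index (e.g. "sorr 2-4.26...E+13"),
# so the whitespace between index and value is optional in the pattern.
COMM_DIAG = re.compile(
    r"comm_diag\s+xxx\s+sorr\s*(\d+)\s*([-+]?\d\.\d+E[-+]\d+)\s+(send|recv)\s+(\w+)\s+(\w+)"
)
# Coupler time stamp line, e.g. "(tStamp_write) cpl model date 0001-01-01 03600s ..."
TSTAMP = re.compile(r"\(tStamp_write\) cpl model date (\S+) (\d+)s")


class Exchange(NamedTuple):
    model_date: str  # model date of the most recent tStamp_write line, e.g. "0001-01-01 03600s"
    index: int       # field index printed after "sorr"
    value: float     # checksum value printed by comm_diag
    direction: str   # "send" or "recv", as seen from the coupler
    component: str   # "atm", "lnd", "ice", ...
    field: str       # field name, e.g. "Faxc_snow"


def last_exchange(lines: Iterable[str]) -> Optional[Exchange]:
    """Return the last comm_diag entry in a coupler log, tagged with its model date."""
    date = "unknown"
    last = None
    for line in lines:
        stamp = TSTAMP.search(line)
        if stamp:
            date = f"{stamp.group(1)} {stamp.group(2)}s"
            continue
        m = COMM_DIAG.search(line)
        if m:
            last = Exchange(date, int(m.group(1)), float(m.group(2)),
                            m.group(3), m.group(4), m.group(5))
    return last


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python last_exchange.py cpl.log.060717-215431
    with open(sys.argv[1]) as log:
        print(last_exchange(log))
```

Run on the log named in the error message, this should report the ice send at 03600s if the excerpt above really is the end of the file.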
Please help. Thank you,