scs_wy@yahoo_cn

New Member
Hi everyone,
I have run CCSM3 a number of times, and I recently replaced my cluster. Since then, some strange problems have appeared.
I can run the T85_gx1v3, T42_gx1v3, and T31_gx3v5 configurations smoothly, but I cannot run T42_gx3v5. The output log shows that the run reaches date 00010101 and then stops.
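Since the crash happens at the very first model date, and only on the new cluster, I wonder whether it is an environment problem rather than a model bug. For example, CAM keeps large temporary arrays on the stack, so could a small inherited stack limit be to blame? A minimal sketch of the check I have in mind (sh syntax is assumed here; the stock csh run scripts would use "limit stacksize unlimited" instead):

ulimit -s            # print the current per-process stack limit
ulimit -s unlimited  # raise it for this shell and its children
ulimit -c unlimited  # enable core dumps for post-mortem debugging
# ...then launch the model as usual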

---------------------------------------------------------------------------------------------
(tStamp_write) cpl model date 0001-01-01 00000s wall clock 2009-02-04 20:49:36 avg dt 0s dt
0s
(cpl_map_npFixNew3) compute bilinear weights & indicies for NP region.
(cpl_bundle_copy) WARNING: bundle aoflux_o has accum count = 0
(flux_atmOcn) FYI: this routine is not threaded
print_memusage iam 0 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 63097 33892 1577 1175 0
print_memusage iam 1 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54770 25481 1524 1175 0
print_memusage iam 12 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54197 24577 1539 1175 0
print_memusage iam 2 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53863 24586 1541 1175 0
print_memusage iam 3 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54472 25090 1541 1175 0
print_memusage iam 4 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53774 24380 1541 1175 0
print_memusage iam 5 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 56325 24827 1541 1175 0
print_memusage iam 6 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53466 24019 1541 1175 0
print_memusage iam 7 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53160 23663 1541 1175 0
print_memusage iam 8 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54064 24494 1541 1175 0
print_memusage iam 9 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53600 24039 1541 1175 0
print_memusage iam 10 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54198 24656 1541 1175 0
print_memusage iam 11 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54146 24552 1539 1175 0
print_memusage iam 13 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 53497 23715 1539 1175 0
print_memusage iam 14 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54205 23433 1500 1175 0
print_memusage iam 15 stepon after dynpkg. -1 in the next line means unavailable
print_memusage: size, rss, share, text, datastack= 54072 23269 1500 1175 0
[node21:06615] MPI_ABORT invoked on rank 40 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26437] *** Process received signal ***
[node20:26437] Signal: Segmentation fault (11)
[node20:26437] Signal code: Address not mapped (1)
[node20:26437] Failing at address: 0x16819fa8
[node20:26435] *** Process received signal ***
[node20:26435] Signal: Segmentation fault (11)
[node20:26435] Signal code: Address not mapped (1)
[node20:26435] Failing at address: 0x1684e528
[node20:26438] *** Process received signal ***
[node20:26436] *** Process received signal ***
[node20:26438] Signal: Segmentation fault (11)
[node20:26438] Signal code: Address not mapped (1)
[node20:26438] Failing at address: 0x1680ff68
[node20:26436] Signal: Segmentation fault (11)
[node20:26436] Signal code: Address not mapped (1)
[node20:26436] Failing at address: 0x1621b068
[node20:26437] [ 0] /lib64/libpthread.so.0 [0x2b65704a4c00]
[node20:26437] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26437] *** End of error message ***
[node20:26434] *** Process received signal ***
[node20:26439] *** Process received signal ***
[node20:26434] Signal: Segmentation fault (11)
[node20:26434] Signal code: Address not mapped (1)
[node20:26434] Failing at address: 0x16844528
[node20:26439] Signal: Segmentation fault (11)
[node20:26439] Signal code: Address not mapped (1)
[node20:26439] Failing at address: 0x1621af68
[node20:26435] [ 0] /lib64/libpthread.so.0 [0x2b70d554bc00]
[node20:26435] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26435] *** End of error message ***
[node20:26438] [ 0] /lib64/libpthread.so.0 [0x2b0c9a68bc00]
[node20:26438] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26438] *** End of error message ***
[node20:26436] [ 0] /lib64/libpthread.so.0 [0x2b3214a43c00]
[node20:26436] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26436] *** End of error message ***
[node20:26434] [ 0] /lib64/libpthread.so.0 [0x2b70d554bc00]
[node20:26434] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26434] *** End of error message ***
[node20:26439] [ 0] /lib64/libpthread.so.0 [0x2b8b9c06fc00]
[node20:26439] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26439] *** End of error message ***
[node20:26440] *** Process received signal ***
[node20:26440] Signal: Segmentation fault (11)
[node20:26440] Signal code: Address not mapped (1)
[node20:26440] Failing at address: 0x16842ad8
[node20:26440] [ 0] /lib64/libpthread.so.0 [0x2b0f16cb7c00]
[node20:26440] [ 1] /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam(sphdep_+0xc14) [0x6e7f14]
[node20:26440] *** End of error message ***
[node21:06616] MPI_ABORT invoked on rank 41 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26428] MPI_ABORT invoked on rank 43 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26429] MPI_ABORT invoked on rank 44 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26430] MPI_ABORT invoked on rank 45 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26431] MPI_ABORT invoked on rank 46 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26432] MPI_ABORT invoked on rank 47 in communicator MPI_COMM_WORLD with errorcode 1
[node20:26433] MPI_ABORT invoked on rank 48 in communicator MPI_COMM_WORLD with errorcode 1
[node23:25877] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[node23:25877] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/pls/tm/pls_tm_module.c at line 572
[node23:25877] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
mpirun noticed that job rank 0 with PID 25879 on node node23 exited on signal 15 (Terminated).
[node23:25877] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[node23:25877] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/pls/tm/pls_tm_module.c at line 603
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
34 additional processes aborted (not shown)
[node20:26426] OOB: Connection to HNP lost
[node21:06602] OOB: Connection to HNP lost
Wed Feb 4 20:49:57 CST 2009 -- CSM EXECUTION HAS FINISHED
Model did not complete - see cpl.log.090204-133502

-----------------------------------------------------------------------------------
It looks like a segmentation fault. Can anyone give me some advice?
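Every backtrace frame points at sphdep_ in the cam executable, so I assume I could narrow the fault down to a source line with addr2line (the binary path below is copied from the log above, and I understand the output is only meaningful if cam was built with -g):

# Sketch: map the absolute address from the backtrace to a source file and line
addr2line -f -e /dcfs2/users/wy/case_0204_35/exe/case_0204_35/all/cam 0x6e7f14

Does that approach make sense, or is there a better way to debug this?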
Thanks!
 