crowe1@unl_edu
I'm trying to run a paleoclimate simulation with CCSM3 and am getting a segmentation fault and core dump caused by CAM. I have already run a one-year simulation with present-day (default) datasets and the same model configuration I'm using now (in fact, the build and run scripts differ only in $CASE and $CASEROOT).
Here are the errors grepped from poe.stderr:
ERROR: 0031-250 task 17: Segmentation fault
ERROR: 0031-250 task 7: Terminated
ERROR: 0031-250 task 6: Terminated
ERROR: 0031-250 task 12: Terminated
ERROR: 0031-250 task 13: Terminated
ERROR: 0031-250 task 4: Terminated
ERROR: 0031-250 task 5: Terminated
ERROR: 0031-250 task 3: Terminated
ERROR: 0031-250 task 2: Terminated
ERROR: 0031-250 task 14: Terminated
ERROR: 0031-250 task 15: Terminated
ERROR: 0031-250 task 1: Terminated
ERROR: 0031-250 task 11: Terminated
ERROR: 0031-250 task 0: Terminated
ERROR: 0031-250 task 8: Terminated
ERROR: 0031-250 task 9: Terminated
ERROR: 0031-250 task 10: Terminated
ERROR: 0031-250 task 22: Segmentation fault
ERROR: 0031-250 task 20: Segmentation fault
ERROR: 0031-250 task 18: Segmentation fault
ERROR: 0031-250 task 26: Segmentation fault
ERROR: 0031-250 task 24: Segmentation fault
ERROR: 0031-250 task 27: Segmentation fault
ERROR: 0031-250 task 16: Segmentation fault
ERROR: 0031-250 task 19: Segmentation fault
ERROR: 0031-250 task 23: Segmentation fault
ERROR: 0031-250 task 21: Segmentation fault
ERROR: 0031-250 task 25: Segmentation fault
Tasks 16-27 are mapped to CAM, and there are core dump directories in $EXEROOT/atm, so that appears to be where the problem lies.
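One quick way to confirm that every segfault comes from a CAM task is to group the POE messages by failure type and compare the faulting task ids against the CAM range. The sketch below assumes the `ERROR: 0031-250 task N: message` format quoted above and takes the 16-27 CAM mapping from this post; `group_tasks`, `faults_confined_to_cam`, and `CAM_TASKS` are hypothetical helper names, not part of CCSM or POE.

```python
import re
from collections import defaultdict

# Matches POE lines such as "ERROR: 0031-250 task 17: Segmentation fault"
POE_LINE = re.compile(r"ERROR: 0031-250 task (\d+): (.+)")

# Assumption taken from the post: MPI tasks 16-27 run CAM.
CAM_TASKS = set(range(16, 28))


def group_tasks(lines):
    """Map each POE failure message to the sorted list of task ids that reported it."""
    groups = defaultdict(list)
    for line in lines:
        m = POE_LINE.search(line)
        if m:
            groups[m.group(2).strip()].append(int(m.group(1)))
    return {msg: sorted(ids) for msg, ids in groups.items()}


def faults_confined_to_cam(groups):
    """True if every task that segfaulted lies inside the (assumed) CAM task range."""
    return set(groups.get("Segmentation fault", [])) <= CAM_TASKS


if __name__ == "__main__":
    # A few lines from the poe.stderr excerpt above; in practice, pass the open file instead.
    sample = [
        "ERROR: 0031-250 task 17: Segmentation fault",
        "ERROR: 0031-250 task 7: Terminated",
        "ERROR: 0031-250 task 22: Segmentation fault",
        "ERROR: 0031-250 task 0: Terminated",
    ]
    groups = group_tasks(sample)
    print(groups)
    print("segfaults confined to CAM tasks:", faults_confined_to_cam(groups))
```

Run against the full poe.stderr above, the segfaulting tasks are exactly 16-27 and the terminated tasks are exactly 0-15, which is consistent with CAM crashing and POE then killing the remaining components.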
I've used the same initial datasets to run standalone CAM without error, so I can't see how changing the input data could be the problem. The segmentation fault seems to occur just as the integration is about to start (i.e., after initialization of all the components has completed), which again suggests that all the necessary input files were found and read successfully. For example, here are the ends of the cpl and atm logs (the ocn, ice, and lnd logs similarly show that those components made it through initialization):
cpl.log.050715-013524
(main) process IC data
(main) -------------------------------------------------------------------------
(cpl_map_bun) WARNING: bundle aoflux_o has accum count = 0
(main) -------------------------------------------------------------------------
(main) optional atm initialization: send albedos & recv new solar?
(main) -------------------------------------------------------------------------
(main) * atm component requests recalculation of initial solar
(main) send albedos to atm, recv new atm IC's
(main) -------------------------------------------------------------------------
(main) create data as necessary for 1st iteration of main event loop
(main) -------------------------------------------------------------------------
(main) -------------------------------------------------------------------------
(main) start of main integration loop
(main) -------------------------------------------------------------------------
(tStamp_write) cpl model date 0001-01-01 00000s wall clock 2005-07-15 01:39:55 avg dt 0s dt 0s
(cpl_map_npFixNew3) compute bilinear weights & indicies for NP region.
(cpl_bundle_copy) WARNING: bundle aoflux_o has accum count = 0
(flux_atmOcn) FYI: this routine is not threaded
D1: In pm_child_sig_handler, signal=15, task=0
atm.log.050715-013524
(CCSMINI): get orbital parameters from coupler
(CCSM_MSG_GET_ORB): recd d->a initial ibuf
(CCSM_MSG_GET_ORB): eccen: 0.167077197992806584E-01
(CCSM_MSG_GET_ORB): obliqr: 0.409123824657880164
(CCSM_MSG_GET_ORB): lambm0: -0.325036358785193782E-01
(CCSM_MSG_GET_ORB): mvelpp: 4.93446790898673182
(CCSM_MSG_GET_ORB): recd d->a initial real buf
(CCSMINI): is_first_step is true; get albedos from coupler
(CCSM_MSG_GETALB) recd d->a surface state
(CCSMINI): CCSM initialization complete!
D1: In pm_child_sig_handler, signal=11, task=16
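The two `pm_child_sig_handler` lines at the ends of the logs carry the actual signal numbers. On the usual Unix numbering, signal 11 is SIGSEGV and signal 15 is SIGTERM, which suggests task 16 (the first CAM task) faulted on its own while task 0 was merely told to shut down. The sketch below decodes such lines with Python's standard `signal` module; `decode_handler_line` is a hypothetical helper, and the name lookup uses the signal numbering of the machine running the script, which is assumed to match the one that ran the job.

```python
import re
import signal

# Matches POE debug lines such as "D1: In pm_child_sig_handler, signal=11, task=16"
HANDLER_LINE = re.compile(r"pm_child_sig_handler, signal=(\d+), task=(\d+)")


def decode_handler_line(line):
    """Return (task, signal name) for a pm_child_sig_handler line, or None if it doesn't match."""
    m = HANDLER_LINE.search(line)
    if m is None:
        return None
    signum, task = int(m.group(1)), int(m.group(2))
    try:
        # Names come from the local platform's signal table.
        name = signal.Signals(signum).name
    except ValueError:
        name = f"unknown signal {signum}"
    return task, name


if __name__ == "__main__":
    for line in (
        "D1: In pm_child_sig_handler, signal=15, task=0",   # end of cpl.log
        "D1: In pm_child_sig_handler, signal=11, task=16",  # end of atm.log
    ):
        print(decode_handler_line(line))
```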
Does anyone have any ideas about where I should start looking for problems?