thompscs@purdue_edu
New Member
While attempting to upgrade from CESM 1.0 to CESM 1.0.3, I have run into problems running the CAM model. The model appears to build correctly and runs through all of the data points in my test cases, but it segfaults at the end while writing history/restart files.
Neither the individual log files for each model nor the coupler log file contain any error output. The overall ccsm.exe log files for my test cases are the only ones that give a clue as to what is happening. For each CAM process that was running, there is a block like this one showing a segfault inside cam_history.F90:
Code:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image      PC                Routine            Line     Source
ccsm.exe   000000000047A71B  cam_history_mp_h_  3229     cam_history.F90
ccsm.exe   0000000000477ABC  cam_history_mp_ws  4535     cam_history.F90
ccsm.exe   0000000000475A64  cam_history_mp_wr  866      cam_history.F90
ccsm.exe   00000000004A2002  cam_restart_mp_ca  251      cam_restart.F90
ccsm.exe   000000000046D8DD  cam_comp_mp_cam_r  390      cam_comp.F90
ccsm.exe   000000000045F879  atm_comp_mct_mp_a  536      atm_comp_mct.F90
ccsm.exe   0000000000406E15  ccsm_comp_mod_mp_  2166     ccsm_comp_mod.F90
ccsm.exe   000000000041732D  MAIN__             91       ccsm_driver.F90
ccsm.exe   000000000040557C  Unknown            Unknown  Unknown
libc.so.6  000000396C21D994  Unknown            Unknown  Unknown
ccsm.exe   0000000000405489  Unknown            Unknown  Unknown
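Since the same block is repeated once per CAM process, it can help to confirm that every rank dies at the same place. The following is a minimal Python sketch for tallying the innermost frame of each traceback in a ccsm.exe log. It is not part of CESM; the function names `parse_traceback` and `crash_sites` are hypothetical, and the pattern assumes the five-column Image/PC/Routine/Line/Source layout shown above.

```python
import re
from collections import Counter

# One row of the traceback table: image, hex PC, routine, line, source.
# The line column is either a number or the literal word "Unknown".
FRAME_RE = re.compile(
    r"^(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{8,16})\s+"
    r"(?P<routine>\S+)\s+(?P<line>\d+|Unknown)\s+(?P<source>\S+)\s*$"
)


def parse_traceback(text):
    """Return every traceback row in `text` as a dict of its five columns."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def crash_sites(text):
    """Count the innermost (first) frame of each traceback block.

    A block is assumed to start at the 'Image PC Routine Line Source'
    header row; only the first frame after each header is counted.
    """
    counts = Counter()
    expect_top = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Image") and "Routine" in line:
            expect_top = True
            continue
        match = FRAME_RE.match(line)
        if match and expect_top:
            key = (match["routine"], match["line"], match["source"])
            counts[key] += 1
            expect_top = False
    return counts
```

Passing the contents of the ccsm.exe log to `crash_sites` gives one counter entry per distinct crash location. A single entry whose count equals the number of CAM processes would suggest that all ranks fail at the same line of cam_history.F90.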
This is being built on an x86_64 system running RHEL5 with the Intel 11.1.038 compilers, MPICH2 1.2, and netCDF 4.0.1. My execution nodes also report no limit on stack or memory for the jobs I am submitting:
Code:
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 0 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  20480
memorylocked unlimited
maxproc      unlimited
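The limits a process actually runs under are inherited from whatever launched it, so they may differ from what a shell on the node reports. One way to cross-check is to print them from inside the batch job itself. Below is a minimal Python sketch using the standard `resource` module; the `LIMITS` mapping from csh `limit` names to `RLIMIT_*` constants and the function name `report_limits` are my own choices for illustration, not anything provided by CESM.

```python
import resource

# Assumed mapping from csh `limit` names to POSIX resource constants.
LIMITS = {
    "stacksize": resource.RLIMIT_STACK,
    "datasize": resource.RLIMIT_DATA,
    "coredumpsize": resource.RLIMIT_CORE,
    "vmemoryuse": resource.RLIMIT_AS,
    "descriptors": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render a limit value; size limits are in bytes, not kbytes as in csh."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return {name: (soft, hard)} for the current process, as strings."""
    report = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (fmt(soft), fmt(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in report_limits().items():
        print(f"{name:<14} soft={soft:<12} hard={hard}")
```

Running this from the same job script that launches ccsm.exe shows the soft and hard limits in effect in that environment. Note also that with `coredumpsize` at 0, as in the output above, no core file will be written when the segfault occurs.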
Has anyone else experienced something like this with the CAM model or had a similar issue when porting their cluster's CESM 1.0 to 1.0.3?