Segfault during CAM history/restart writing

While upgrading from CESM 1.0 to CESM 1.0.3, I have run into a problem running CAM. The model builds and appears to run correctly through all of the data points in my test cases, but it segfaults at the end of the run while writing the history/restart files.

Neither the individual component log files nor the coupler log file contains any error output. The only clue is in the overall ccsm.exe log files for my test cases: for each CAM process that was running, there is a block like the following, showing a segfault inside cam_history.F90:



Code:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
ccsm.exe           000000000047A71B  cam_history_mp_h_        3229  cam_history.F90
ccsm.exe           0000000000477ABC  cam_history_mp_ws        4535  cam_history.F90
ccsm.exe           0000000000475A64  cam_history_mp_wr         866  cam_history.F90
ccsm.exe           00000000004A2002  cam_restart_mp_ca         251  cam_restart.F90
ccsm.exe           000000000046D8DD  cam_comp_mp_cam_r         390  cam_comp.F90
ccsm.exe           000000000045F879  atm_comp_mct_mp_a         536  atm_comp_mct.F90
ccsm.exe           0000000000406E15  ccsm_comp_mod_mp_        2166  ccsm_comp_mod.F90
ccsm.exe           000000000041732D  MAIN__                     91  ccsm_driver.F90
ccsm.exe           000000000040557C  Unknown               Unknown  Unknown
libc.so.6          000000396C21D994  Unknown               Unknown  Unknown
ccsm.exe           0000000000405489  Unknown               Unknown  Unknown
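
Since the same block shows up once per CAM process, it can help to confirm that every process fails at the same frame. Below is a minimal Python sketch for pulling the frames out of a log like the one above. The function names and the idea of passing the log path on the command line are my own assumptions for illustration, and it assumes the traceback lines from different processes are not interleaved in the log.

```python
import re
import sys
from collections import Counter

# One frame of an Intel Fortran (forrtl) traceback as printed above:
# image, hex PC, routine, line, source. Routine names are truncated by
# the runtime, so they are kept exactly as printed.
FRAME_RE = re.compile(
    r"^(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{8,16})\s+(?P<routine>\S+)\s+"
    r"(?P<line>\d+|Unknown)\s+(?P<source>\S+)\s*$"
)


def parse_tracebacks(text):
    """Return a list of tracebacks; each one is a list of frame dicts."""
    tracebacks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("forrtl: severe"):
            # A new traceback starts; frames that follow belong to it.
            current = []
            tracebacks.append(current)
            continue
        match = FRAME_RE.match(line)
        if match and current is not None:
            current.append(match.groupdict())
    return tracebacks


def top_frames(tracebacks):
    """Count the innermost (source, line) frame across all tracebacks."""
    return Counter((tb[0]["source"], tb[0]["line"]) for tb in tracebacks if tb)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path to the ccsm.exe log file.
    with open(sys.argv[1]) as handle:
        found = parse_tracebacks(handle.read())
    for (source, line), count in top_frames(found).items():
        print(f"{count} traceback(s) with innermost frame at {source}:{line}")
```

If all processes report the same innermost frame (here cam_history.F90 line 3229), that points at one code path rather than a per-node problem.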



I am building on an x86_64 system running RHEL5 with the Intel 11.1.038 compilers, MPICH2 1.2, and netCDF 4.0.1. My execution nodes report no limit on stack or memory for the jobs I am submitting:



Code:
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 0 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  20480
memorylocked unlimited
maxproc      unlimited
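
The limits seen by processes started through the MPI launcher can differ from those shown in a login or batch shell, so it may be worth checking them from inside a launched process as well. A minimal sketch, assuming Python 3 is available on the compute nodes; the script name check_limits.py and the function names are hypothetical. Note that getrlimit reports bytes, whereas the listing above is in the shell's own units.

```python
import resource
import socket

# Limits to report, keyed by the names used in the listing above.
LIMITS = {
    "stacksize": resource.RLIMIT_STACK,
    "datasize": resource.RLIMIT_DATA,
    "coredumpsize": resource.RLIMIT_CORE,
    "descriptors": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the current process, as strings."""
    limits = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    host = socket.gethostname()
    for name, (soft, hard) in current_limits().items():
        # Byte counts for sizes; a plain count for descriptors.
        print(f"{host} {name:<12} soft={soft} hard={hard}")
```

Run under the same launcher and batch settings as the model, for example `mpiexec -n 4 python3 check_limits.py`, this prints one set of lines per process, so any node with a smaller stack limit stands out.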



Has anyone else seen something like this with CAM, or run into a similar issue when moving their cluster from CESM 1.0 to 1.0.3?
 