
new machine build, run error

I am working to get a new build (cesm1_2_1) running on a Linux cluster. The build appears to finish successfully, as I get the message:

- Locking file env_build.xml

after building the test case. However, after submitting the job to our queue (we use the PBS queuing system), I get a fairly quick error and the expected run files do not appear:

wythersk@node1084 [~/cases/testcase] $ ls
env_derived user_nl_cam
Buildconf env_mach_pes.xml README.science_support user_nl_cice
CaseDocs env_mach_specific run user_nl_clm
CaseStatus #env_run.xml# SourceMods user_nl_cpl
cesm_setup env_run.xml user_nl_pop2
check_case exedir testcase.clean_build user_nl_rtm
check_input_data inputdata testcase.o1084694 xmlchange
create_production_test LockedFiles xmlquery logs testcase.submit
env_build.xml Macros Tools
env_case.xml preview_namelists

I think I’ve tracked it down to a problem opening one of the files. Here is the relevant section of the cesm.log file. Does this look familiar to anyone, or am I chasing the wrong thing here? Thank you in advance.

8 pes participating in computation
0 node0316
1 node0316
2 node0316
3 node0316
4 node0316
5 node0316
6 node0316
7 node0316
Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/inic/fv/cami_0001-01-01 65536
Opened existing file
/home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/topo/USGS-gtopo30_4x5_r 131072
forrtl: severe (174): SIGSEGV, segmentation fault occurred



First, please consider cesm1.2.2 instead of 1.2.1. It's new and improved. :-) When porting to a new machine, the problem is often in your user environment settings, in particular the user stack size limit and data limit. We recommend setting both to unlimited. Use the limit command in csh or the ulimit command in bash to check the limit settings.

CESM Software Engineer
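
The limit check suggested above could be sketched in bash roughly as follows. This is only an illustration: the helper names `show_limits` and `raise_limit` are hypothetical, and the fallback to the hard limit is an assumption about what is useful when the system does not allow `unlimited`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CESM) for inspecting and raising
# the soft limits mentioned in the post above.

# Print the current soft limits for stack, data segment, and core file size.
show_limits() {
    echo "stack: $(ulimit -S -s)"
    echo "data:  $(ulimit -S -d)"
    echo "core:  $(ulimit -S -c)"
}

# Try to set the soft limit for the given ulimit flag (for example -s or -d)
# to unlimited. If that is refused, fall back to the hard limit, which is
# the highest value an unprivileged user may choose.
raise_limit() {
    ulimit -S "$1" unlimited 2>/dev/null || ulimit -S "$1" "$(ulimit -H "$1")"
}

raise_limit -s   # stack size
raise_limit -d   # data segment size
show_limits
```

Limits are inherited per process, so values set in an interactive login shell do not necessarily apply inside a batch job. It may be worth running the same check from within the job script itself.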


Here are the results from ulimit. Disk was unlimited. I changed stack to unlimited.

wythersk@node1084 [~/cases/testcase/run] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 191956
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
wythersk@node1084 [~/cases/testcase/run] $


However, I get the same error on the USGS netCDF file:

 Opened existing file
 /home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/inic/fv/cami_0001-01-01       65536
 Opened existing file
 /home/reichpb/wythersk/cases/testcase/inputdata/atm/cam/topo/USGS-gtopo30_4x5_r      131072
forrtl: severe (174): SIGSEGV, segmentation fault occurred


Other ideas?


Can you dump the file using ncdump?  

Check that the md5sum matches the expected value:

78bff47e307c5fb2395204c9f833a480  /glade/p/cesmdata/cseg/inputdata/atm/cam/topo/

You may also get more information by compiling with DEBUG=TRUE and setting core file size to a non-zero value.  


CESM Software Engineer
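
A minimal sketch of the checksum comparison suggested above, assuming GNU `md5sum` and `awk` are available on the cluster. The function name `check_md5` is hypothetical, and the file path and expected digest must be supplied by the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Usage: check_md5 <file> <expected_md5>
check_md5() {
    md5_file="$1"
    md5_expected="$2"
    # md5sum prints "<digest>  <file>"; keep only the digest.
    md5_actual=$(md5sum "$md5_file" | awk '{print $1}')
    if [ "$md5_actual" = "$md5_expected" ]; then
        echo "OK   $md5_file"
    else
        echo "FAIL $md5_file (got $md5_actual, expected $md5_expected)"
        return 1
    fi
}

# To inspect a netCDF file without printing its data, ncdump -h shows
# only the header (dimensions, variables, attributes):
#   ncdump -h path/to/file.nc
```

If `ncdump -h` prints a clean header and the digest matches, the input file itself is probably not the cause of the crash.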


My md5sums look right:




Here is the dump file:


This all looks fine; you are going to need to dig deeper:

You may also get more information by compiling with DEBUG=TRUE and setting core file size to a non-zero value. 

CESM Software Engineer


Confirming that you mean line 133 in env_run.xml: change value="0" to value="TRUE"? In addition, I'm not sure where the "core file size" option is changed to a "non-zero" value.

wythersk@node1082 [~/cases/testcase] $ grep -n DEBUG env_run.xml
133:<entry id="PIO_DEBUG_LEVEL"   value="0"  />


Core file size is one of the limits in your environment; you printed it out a few posts ago. DEBUG is set in env_build.xml, and you should change its value using the xmlchange utility:

./xmlchange DEBUG=TRUE



CESM Software Engineer
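
Because a plain `grep DEBUG` also matches `PIO_DEBUG_LEVEL`, a small helper like the one below may help show which env_*.xml file actually holds a given entry. It is a sketch only: `find_xml_entry` is a hypothetical name, and it assumes entries are written as `<entry id="NAME" ...` on one line, as in the grep output quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the env_*.xml file and line that define an
# entry whose id matches exactly (so DEBUG does not match PIO_DEBUG_LEVEL).
# Usage: find_xml_entry <id> [case_directory]
find_xml_entry() {
    entry_id="$1"
    case_dir="${2:-.}"
    # -H always prints the file name, -n the line number.
    grep -H -n "<entry id=\"${entry_id}\"" "$case_dir"/env_*.xml
}

# Example, run from the case directory:
#   find_xml_entry DEBUG
# Raising the core file size limit is a shell setting, not an XML entry:
#   ulimit -c unlimited
```

After `./xmlchange DEBUG=TRUE`, the case needs to be rebuilt for the change to take effect, since DEBUG is a build-time setting.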


Now (with DEBUG set to TRUE and core file size set to 1024), I am having trouble with the build process. From:


more exedir/atm.bldlog.141007-095202


catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.
compilation aborted for /home/reichpb/wythersk/cesm/dev/1.2.1/models/atm/cam/src/dynamics/fv/sw_core.F90 (code 1)
gmake: *** [sw_core.o] Error 1
gmake: *** Waiting for unfinished jobs....


wythersk@node1084 [~/cases/f45g37_B1850CN] $ 


Any chance this is related to my original issue?


You failed to report what compiler you are using and you failed to update to cesm1.2.2 as requested.  

Please update to 1.2.2 and let us know what compiler you are using.

CESM Software Engineer
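
One way to gather the compiler information requested above is sketched below. The helper name `report_versions` and the example tool names are assumptions. The `forrtl` prefix in the earlier log is the Intel Fortran runtime's, which suggests `ifort`, but the thread does not confirm the compiler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the location and first version line of each
# named tool, or note that it is not on PATH.
# Usage: report_versions <tool> [<tool> ...]
report_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool: $(command -v "$tool")"
            "$tool" --version 2>&1 | head -n 1
        else
            echo "== $tool: not found on PATH"
        fi
    done
}

# Example (tool names are guesses for this cluster):
#   report_versions ifort mpif90 nc-config
```

Running this in the same environment that the build uses (same modules loaded) and pasting the output into a reply would give the information asked for.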
