Segmentation fault error running CESM 1.2.2 on a new Mac port

dossa013@...

Hi everyone,

 

I am trying to run a simple CESM case on a Mac Pro running OS X Yosemite. The machine has a hyper-threaded quad-core processor (eight logical cores) and 12 GB of RAM.

 

So far I have managed to build the case, but I get an error when running it. I am using gcc and gfortran, with which I compiled MPICH, HDF5 (with parallel I/O), and the NetCDF C and Fortran libraries (also with parallel I/O).

 

The case is a simple CLM case taken from the CLM user's guide:

 

./create_newcase -case /Users/dossa013/data/cesm-cases/testSPDATASET -res 1x1_brazil -compset I -mach userdefined

 

This is my Macros file:

---------------------------------------------------------------------------------

#

# Makefile Macros generated from /Users/dossa013/models/cesm1_2_2/scripts/ccsm_utils/Machines/config_compilers.xml using

# COMPILER=gnu

# OS=generic_darwin_gnu

# MACH=userdefined

#

CPPDEFS+= -DFORTRANUNDERSCORE -DNO_R16 -DDarwin -DCPRGNU

SLIBS+= -L/Users/dossa013/software/cesm-software/lib -lnetcdff -lnetcdf

CONFIG_ARGS:=

CXX_LINKER:=FORTRAN

ESMF_LIBDIR:=

FC_AUTO_R8:= -fdefault-real-8

FFLAGS:= -O -ffree-line-length-none -ffixed-line-length-none -fno-range-check

FFLAGS_NOOPT:= -O0

FIXEDFLAGS:=  -ffixed-form

FREEFLAGS:= -ffree-form

MPICC:= mpicc

MPICXX:= mpicxx

MPIFC:= mpif90

MPI_LIB_NAME:= mpich

MPI_PATH:= /Users/dossa013/software/cesm-software

NETCDF_PATH:= /Users/dossa013/software/cesm-software

PNETCDF_PATH:= /Users/dossa013/software/cesm-software

SCC:= gcc

SCXX:= g++

SFC:= gfortran

SUPPORTS_CXX:=TRUE

ifeq ($(DEBUG), TRUE)

   FFLAGS += -g -Wall

endif

ifeq ($(compile_threaded), true)

   LDFLAGS += -fopenmp

   CFLAGS += -fopenmp

   FFLAGS += -fopenmp

endif

ifeq ($(MODEL), cism)

   CMAKE_OPTS += -D CISM_GNU=ON

endif

ifeq ($(MODEL), pop2)

   CPPDEFS += -D_USE_FLOW_CONTROL

endif

---------------------------------------------------------------------------------

 

This is my run script:

 

---------------------------------------------------------------------------------

#!/bin/csh -f

#===============================================================================

# USERDEFINED

# This is where the batch submission is set.  The above code computes

# the total number of tasks, nodes, and other things that can be useful

# here.  Use PBS, BSUB, or whatever the local environment supports.

#===============================================================================

 

##PBS -N testSPDATASET

##PBS -q batch

##PBS -l nodes=1:ppn=8

##PBS -l walltime=00:59:00

##PBS -r n

##PBS -j oe

##PBS -S /bin/csh -V

 

##BSUB -l nodes=1:ppn=8:walltime=00:59:00

##BSUB -q batch

###BSUB -k eo

###BSUB -J testSPDATASET

###BSUB -W 00:59:00

 

#limit coredumpsize 1000000

#limit stacksize unlimited

 

 

# ----------------------------------------

# PE LAYOUT:

#   total number of tasks  = 1

#   maximum threads per task = 1

#   cpl ntasks=1  nthreads=1 rootpe=0 ninst=1

#   datm ntasks=1  nthreads=1 rootpe=0 ninst=1

#   clm ntasks=1  nthreads=1 rootpe=0 ninst=1

#   sice ntasks=1  nthreads=1 rootpe=0 ninst=1

#   socn ntasks=1  nthreads=1 rootpe=0 ninst=1

#   sglc ntasks=1  nthreads=1 rootpe=0 ninst=1

#   swav ntasks=1  nthreads=1 rootpe=0 ninst=1

#   rtm ntasks=1  nthreads=1 rootpe=0 ninst=1

#

#   total number of hw pes = 1

#     cpl hw pe range ~ from 0 to 0

#     datm hw pe range ~ from 0 to 0

#     clm hw pe range ~ from 0 to 0

#     sice hw pe range ~ from 0 to 0

#     socn hw pe range ~ from 0 to 0

#     sglc hw pe range ~ from 0 to 0

#     swav hw pe range ~ from 0 to 0

#     rtm hw pe range ~ from 0 to 0

# ----------------------------------------

cd /Users/dossa013/data/cesm-cases/testSPDATASET

 

./Tools/ccsm_check_lockedfiles || exit -1

source ./Tools/ccsm_getenv     || exit -2

 

if ($BUILD_COMPLETE != "TRUE") then

  echo "BUILD_COMPLETE is not TRUE"

  echo "Please rebuild the model interactively"

  exit -2

endif

 

# BATCHQUERY is in env_run.xml

setenv LBQUERY "TRUE"

if !($?BATCHQUERY) then

  setenv LBQUERY "FALSE"

  setenv BATCHQUERY "undefined"

else if ( "$BATCHQUERY" == 'UNSET' ) then

  setenv LBQUERY "FALSE"

  setenv BATCHQUERY "undefined"

endif

 

# BATCHSUBMIT is in env_run.xml

setenv LBSUBMIT "TRUE"

if !($?BATCHSUBMIT) then

  setenv LBSUBMIT "FALSE"

  setenv BATCHSUBMIT "undefined"

else if ( "$BATCHSUBMIT" == 'UNSET' ) then

  setenv LBSUBMIT "FALSE"

  setenv BATCHSUBMIT "undefined"

endif

 

# --- Create and cleanup the timing directories---

 

if !(-d $RUNDIR) mkdir -p $RUNDIR || "cannot make $RUNDIR" && exit -1

if (-d $RUNDIR/timing) rm -r -f $RUNDIR/timing

mkdir $RUNDIR/timing

mkdir $RUNDIR/timing/checkpoints

 

# --- Determine time-stamp/file-ID string ---

setenv LID "`date +%y%m%d-%H%M%S`"

 

set sdate = `date +"%Y-%m-%d %H:%M:%S"`

echo "run started $sdate" >>& $CASEROOT/CaseStatus

 

echo "-------------------------------------------------------------------------"

echo " CESM BUILDNML SCRIPT STARTING"

echo " - To prestage restarts, untar a restart.tar file into $RUNDIR"

 

./preview_namelists

if ($status != 0) then

   echo "ERROR from preview namelist - EXITING"

   exit -1

endif

 

echo " CESM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY"

echo "-------------------------------------------------------------------------"

 

echo "-------------------------------------------------------------------------"

echo " CESM PRESTAGE SCRIPT STARTING"

echo " - Case input data directory, DIN_LOC_ROOT, is $DIN_LOC_ROOT"

echo " - Checking the existence of input datasets in DIN_LOC_ROOT"

 

# This script prestages as follows

# - DIN_LOC_ROOT is the local inputdata area, check it exists

# - check whether all the data is in DIN_LOC_ROOT

# - prestage the REFCASE data if needed

 

cd $CASEROOT

 

if !(-d $DIN_LOC_ROOT) then

  echo " "

  echo "  ERROR DIN_LOC_ROOT $DIN_LOC_ROOT does not exist"

  echo " "

  exit -20

endif

 

if (`./check_input_data -inputdata $DIN_LOC_ROOT -check | grep "unknown" | wc -l` > 0) then

   echo " "

   echo 'Any files with "status unknown" below were not found in the'

   echo 'expected location, and are not from the input data repository.'

   echo 'This is informational only; this script will not attempt to'

   echo 'find these files. If CESM can find (or does not need) these files'

   echo 'at run time, no error will result.'

   ./check_input_data -inputdata $DIN_LOC_ROOT -check

   echo " "

endif

 

if (`./check_input_data -inputdata $DIN_LOC_ROOT -check | grep "missing" | wc -l` > 0) then

   echo "Attempting to download missing data:"

   ./check_input_data -inputdata $DIN_LOC_ROOT -export

endif

 

if (`./check_input_data -inputdata $DIN_LOC_ROOT -check | grep "missing" | wc -l` > 0) then

   echo " "

   echo "The following files were not found, they are required"

   ./check_input_data -inputdata $DIN_LOC_ROOT -check

   echo "Invoke the following command to obtain them"

   echo "   ./check_input_data -inputdata $DIN_LOC_ROOT -export"

   echo " "

   exit -30

endif

 

if (($GET_REFCASE == 'TRUE') && ($RUN_TYPE != 'startup') && ($CONTINUE_RUN == 'FALSE')) then

  set refdir = "ccsm4_init/$RUN_REFCASE/$RUN_REFDATE"

 

  if !(-d $DIN_LOC_ROOT/$refdir) then

    echo "*****************************************************************"

    echo "ccsm_prestage ERROR: $DIN_LOC_ROOT/$refdir is not on local disk"

    echo "obtain this data from the svn input data repository:"

    echo "  > mkdir -p $DIN_LOC_ROOT/$refdir"

    echo "  > cd $DIN_LOC_ROOT/$refdir"

    echo "  > cd .."

    echo "  > svn export --force https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/$refdir"

    echo "or set GET_REFCASE to FALSE in env_run.xml, "

    echo "   and prestage the restart data to $RUNDIR manually"

    echo "*****************************************************************"

    exit -1

  endif

 

  echo " - Prestaging REFCASE ($refdir) to $RUNDIR"

  if !(-d $RUNDIR) mkdir -p $RUNDIR || "cannot make $RUNDIR" && exit -1

  foreach file ($DIN_LOC_ROOT/$refdir/*${RUN_REFCASE}*)

     if !(-f $RUNDIR/$file:t) then

        ln -s $file $RUNDIR || "cannot prestage $DIN_LOC_ROOT/$refdir data to $RUNDIR" && exit -1

     endif

  end

  cp $DIN_LOC_ROOT/$refdir/*rpointer* $RUNDIR || "cannot prestage $DIN_LOC_ROOT/$refdir rpointers to $RUNDIR" && exit -1

 

  cd $RUNDIR

  set cam2_list = `sh -c 'ls *.cam2.* 2>/dev/null'`

  foreach cam2_file ($cam2_list)

    set cam_file = `echo $cam2_file | sed -e 's/cam2/cam/'`

    ln -fs $cam2_file $cam_file

  end

 

  chmod u+w $RUNDIR/* >& /dev/null

endif

 

echo " CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY"

echo "-------------------------------------------------------------------------"

 

sleep 25

cd $RUNDIR

echo "`date` -- CSM EXECUTION BEGINS HERE"

 

setenv OMP_NUM_THREADS 1

 

#===============================================================================

# USERDEFINED

# edit job launching

#===============================================================================

 

mpiexec -n 1 $EXEROOT/cesm.exe >&! cesm.log.$LID

#mpirun -np 1 $EXEROOT/cesm.exe >&! cesm.log.$LID

 

wait

echo "`date` -- CSM EXECUTION HAS FINISHED"

 

# -------------------------------------------------------------------------

# Update env variables in case user changed them during run

# -------------------------------------------------------------------------

 

cd $CASEROOT

source ./Tools/ccsm_getenv

 

# -------------------------------------------------------------------------

# Check for successful run

# -------------------------------------------------------------------------

 

set sdate = `date +"%Y-%m-%d %H:%M:%S"`

cd $RUNDIR

set CESMLogFile = `ls -1t cesm.log* | head -1`

if ($CESMLogFile == "") then

  echo "Model did not complete - no cesm.log file present - exiting"

  exit -1

endif

set CPLLogFile = `echo $CESMLogFile | sed -e 's/cesm/cpl/'`

if ($CPLLogFile == "") then

  echo "Model did not complete - no cpl.log file corresponding to most recent CESM log ($RUNDIR/$CESMLogFile)"

  exit -1

endif

grep 'SUCCESSFUL TERMINATION' $CPLLogFile  || echo "Model did not complete - see $RUNDIR/$CESMLogFile" && echo "run FAILED $sdate" >>& $CASEROOT/CaseStatus && exit -1

 

echo "run SUCCESSFUL $sdate" >>& $CASEROOT/CaseStatus

 

 

# -------------------------------------------------------------------------

# Save model output logs

# -------------------------------------------------------------------------

 

gzip *.$LID

if ($LOGDIR != "") then

  if (! -d $LOGDIR/bld) mkdir -p $LOGDIR/bld || echo " problem in creating $LOGDIR/bld"

  cp -p $RUNDIR/*log.$LID.*   $LOGDIR/

endif

 

# -------------------------------------------------------------------------

# Perform short term archiving of output

# -------------------------------------------------------------------------

cd $CASEROOT

 

if ($DOUT_S == 'TRUE') then

  echo "Archiving cesm output to $DOUT_S_ROOT"

  echo "Calling the short-term archiving script st_archive.sh"

  cd $RUNDIR; $CASETOOLS/st_archive.sh

endif

 

# -------------------------------------------------------------------------

# Submit longer term archiver if appropriate

# -------------------------------------------------------------------------

 

 

if ($DOUT_L_MS == 'TRUE' && $DOUT_S == 'TRUE') then

  echo "Long term archiving ccsm output using the script $CASE.l_archive"

  set num = 0

  if ($LBQUERY == "TRUE") then

     set num = `$BATCHQUERY | grep $CASE.l_archive | wc -l`

  endif

  if ($LBSUBMIT == "TRUE" && $num < 1) then

cat > templar <<EOF

    $BATCHSUBMIT ./$CASE.l_archive

EOF

    source templar

    if ($status != 0) then

      echo "ccsm_postrun error: problem sourcing templar "

    endif

    rm templar

  endif

endif

 

# -------------------------------------------------------------------------

# Resubmit another run script

# -------------------------------------------------------------------------

 

if ($RESUBMIT > 0) then

    @ RESUBMIT = $RESUBMIT - 1

    echo RESUBMIT is now $RESUBMIT

 

    #tcraig: reset CONTINUE_RUN on RESUBMIT if NOT doing timing runs

    #use COMP_RUN_BARRIERS as surrogate for timing run logical

    if ($?COMP_RUN_BARRIERS) then

      if (${COMP_RUN_BARRIERS} == "FALSE") then

         ./xmlchange -file env_run.xml -id CONTINUE_RUN -val TRUE

      endif

    else

      ./xmlchange -file env_run.xml -id CONTINUE_RUN -val TRUE

    endif

    ./xmlchange -file env_run.xml -id RESUBMIT     -val $RESUBMIT

 

    if ($LBSUBMIT == "TRUE") then

cat > tempres <<EOF

   $BATCHSUBMIT ./$CASE.run

EOF

     source tempres

     if ($status != 0) then

       echo "ccsm_postrun error: problem sourcing tempres "

     endif

     rm tempres

   endif

endif

 

if ($CHECK_TIMING == 'TRUE') then

  if !(-d timing) mkdir timing

  $CASETOOLS/getTiming.csh -lid $LID

  gzip timing/ccsm_timing_stats.$LID

endif

 

if ($SAVE_TIMING == 'TRUE') then

  mv $RUNDIR/timing $RUNDIR/timing.$LID

endif

---------------------------------------------------------------------------------

 

This is the output of my ulimit -a:

 

$ ulimit -a

core file size          (blocks, -c) unlimited

data seg size           (kbytes, -d) unlimited

file size               (blocks, -f) unlimited

max locked memory       (kbytes, -l) unlimited

max memory size         (kbytes, -m) unlimited

open files                      (-n) 2560

pipe size            (512 bytes, -p) 1

stack size              (kbytes, -s) 65532

cpu time               (seconds, -t) unlimited

max user processes              (-u) 709

virtual memory          (kbytes, -v) unlimited

---------------------------------------------------------------------------------

The error log is attached to this message. Does anyone have any idea how I can fix this segmentation fault?
jedwards

First, try setting the stack size to unlimited. If it still fails, try rebuilding and running in DEBUG mode.

CESM Software Engineer
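
For the DEBUG rebuild, a minimal sketch under some assumptions: DEBUG is set in env_build.xml, the xmlchange syntax matches the calls in the run script posted above, and the case directory provides `<case>.clean_build` and `<case>.build` scripts. The helper name `rebuild_debug` and the `DRY_RUN` switch are hypothetical. According to the Macros file above, DEBUG=TRUE adds `-g -Wall` to FFLAGS.

```sh
# Hypothetical helper: rebuild a case with DEBUG=TRUE.
# Run it from inside the case directory.
# Set DRY_RUN=1 to print the commands instead of executing them.
rebuild_debug() {
    case_name=$1
    for cmd in \
        "./xmlchange -file env_build.xml -id DEBUG -val TRUE" \
        "./${case_name}.clean_build" \
        "./${case_name}.build"
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into command + arguments.
            $cmd || return 1
        fi
    done
}

# Usage (assumed script names):
#   rebuild_debug testSPDATASET
```

Running with `DRY_RUN=1` first is a cheap way to check the commands against the scripts that actually exist in the case directory.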

dossa013@...

Thanks for the quick reply, Jim.

 

It turns out that on OS X the kernel caps the stack size at 64 MB, so this command doesn't work:

 

sudo ulimit -s unlimited

/usr/bin/ulimit: line 4: ulimit: stack size: cannot modify limit: Invalid argument
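
A side note on the error itself: `ulimit` is a shell builtin, so `sudo ulimit` runs the separate `/usr/bin/ulimit` script in a child process (hence the "line 4" in the message) and cannot change the limits of the calling shell. A process can raise its own soft limit only up to the hard limit. A small sketch for inspecting both values; the variable names are illustrative:

```sh
# Print the soft and hard stack limits (in kbytes) of the current shell.
# The soft limit is what "ulimit -a" reports; the hard limit is the ceiling
# that an unprivileged process may raise the soft limit to.
soft_stack=$(ulimit -S -s)
hard_stack=$(ulimit -H -s)
echo "stack size: soft=${soft_stack} hard=${hard_stack}"
```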

 

Apparently, the only way to increase it is at build time, by passing the linker argument -Wl,-stack_size,0x10000000,-stack_addr,0xc0000000.

 

Please take a look at this link: http://linuxtoosx.blogspot.com/2010/10/stack-overflow-increasing-stack-l...

 

I guess the question now is: is there any way to pass this argument to both gcc and gfortran when compiling CESM?

jedwards

You can add it to the FFLAGS and CFLAGS in the Macros file. 

CESM Software Engineer
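
A minimal sketch of one way to add the flag to the Macros file. Since `-Wl,...` options are only consumed at the link step, this version appends to LDFLAGS rather than FFLAGS/CFLAGS. That rests on the assumption that the case's Makefile passes LDFLAGS to the linker, which the `LDFLAGS += -fopenmp` line in the posted Macros file suggests but this thread does not confirm. The function name `add_stack_flag` is hypothetical; the flag value is copied verbatim from the post above.

```sh
# Hypothetical helper: append the linker stack-size flag to a Macros file.
# Appends only once, so it is safe to call repeatedly.
add_stack_flag() {
    macros_file=$1
    # Flag value taken as-is from the post above (0x10000000 bytes = 256 MB).
    stack_flag='-Wl,-stack_size,0x10000000,-stack_addr,0xc0000000'

    if [ ! -f "$macros_file" ]; then
        echo "no such file: $macros_file" >&2
        return 1
    fi

    # Skip the append if a stack_size flag is already present.
    if ! grep -q -e '-stack_size' "$macros_file"; then
        printf 'LDFLAGS += %s\n' "$stack_flag" >> "$macros_file"
    fi
}

# Usage, from the case directory:
#   add_stack_flag ./Macros
```

The case would then need to be rebuilt for the new flag to reach the link step.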

dossa013@...

Jim, 

Thanks a lot for your responsiveness. My CESM port to a Mac running Yosemite with the GNU compilers now builds and runs, at least for these three cases:


./create_newcase -case ~/data/test1 -res f45_g37 -compset X -mach userdefined

./create_newcase -case ~/data/testBRAMAZON -res 5x5_amazon -compset I -mach userdefined

./create_newcase -case ~/data/testSPDATASET -res 1x1_brazil -compset I -mach userdefined

  

I managed to solve the segmentation fault by changing the compiler's stack-variable limit (-fmax-stack-var-size) and by raising the stack size and the number of open files in the system. Because I applied all three changes at once, I'm not sure whether one of them or the combination fixed the error. The settings I used are:

 
FFLAGS:= -O -ffree-line-length-none -ffixed-line-length-none -fno-range-check -fmax-stack-var-size=1048576


ulimit -s hard ; ulimit -n 4096


Regards,
Thiago.
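
Limits set with `ulimit` apply only to the current shell and the processes it starts, so they have to be raised in the same shell session that launches the run script. A minimal sketch of a wrapper, assuming the run is started interactively from sh or bash; the function name `raise_limits` is hypothetical and the 4096 value is the one quoted above:

```sh
# Hypothetical wrapper: raise soft limits in this shell before launching a run.
raise_limits() {
    # Raise the soft stack limit to whatever the hard limit allows.
    ulimit -S -s "$(ulimit -H -s)"

    # Raise the soft open-files limit to 4096 only if it is currently lower.
    cur_files=$(ulimit -S -n)
    if [ "$cur_files" != "unlimited" ] && [ "$cur_files" -lt 4096 ]; then
        ulimit -S -n 4096 2>/dev/null ||
            echo "warning: could not raise open-files limit to 4096" >&2
    fi
}

raise_limits

# Launch the case from this same shell so it inherits the limits, e.g.:
#   ./testSPDATASET.run
```

This variant changes only the soft limits, whereas `ulimit -s hard ; ulimit -n 4096` without `-S` sets both soft and hard values.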

 
