
Running CAM on Jaguar, ORNL

sfeng2@unl_edu

New Member
I usually run the CAM model on NCAR computers. Recently I have been trying to run CAM on Jaguar at ORNL. The procedure for setting up CAM runs on Jaguar is quite different from the one on the NCAR computers, and I am currently having trouble compiling the model and building the executable.

I first load the NetCDF library using 'module load netcdf'.
Then I use the following short script to configure and build the code:

#! /usr/bin/csh -f

set camroot = /ccs/home/sfeng/CAM/cam1

setenv CSMDATA /ccs/home/sfeng/CAM

## LOGNAME - used in default settings, must be set if not available
## setenv LOGNAME
if !($?LOGNAME) then
echo "LOGNAME not available for setting of defaults - setting must be added to this script"
exit 1
endif

## Default namelist settings:
## $case is the case identifier for this run. It will be placed in the namelist.
## $runtype is the run type: startup, continue, or branch.
## $stop_n is the number of days to integrate (units depend on stop_option)
set case = camrun
set runtype = startup
set stop_n = 1

## $wrkdir is a working directory where the model will be built and run.
## $blddir is the directory where model will be compiled.
## $rundir is the directory where the model will be run.
## $cfgdir is the directory containing the CAM configuration scripts.
set wrkdir = /tmp/work/$LOGNAME
set blddir = $wrkdir/$case/bld
set rundir = $wrkdir/$case
set cfgdir = $camroot/models/atm/cam/bld

## Ensure that run and build directories exist
mkdir -p $rundir || echo "cannot create $rundir" && exit 1
mkdir -p $blddir || echo "cannot create $blddir" && exit 1
####echo " OK so far !!!!"

## If an executable doesn't exist, build one.
if ( ! -x $blddir/cam ) then
cd $blddir || echo "cd $blddir failed" && exit 1
$cfgdir/configure || echo "configure failed" && exit 1
echo "building CAM in $blddir ..."
rm -f Depends
gmake -j8 >&! MAKE.out || echo "CAM build failed: see $blddir/MAKE.out" && exit 1
endif

## Create the namelist
cd $blddir || echo "cd $blddir failed" && exit 1
$cfgdir/build-namelist -s -case $case -runtype $runtype \
-namelist "&camexp stop_option='ndays', stop_n=$stop_n /" || echo "build-namelist failed" && exit 1

exit 0



The following is the error I got:

** Cannot find netcdf.inc in specified directory: /usr/local/include
**
** The NetCDF include directory is determined from the following set of options listed
** from highest to lowest precedence:
** * interactively, enabled by command-line option -i
** * by the command-line option -nc_inc
** * by a default configuration file, specified by -defaults
** * by the environment variable INC_NETCDF
** * by the default value /usr/local/include
configure failed


There is no NetCDF library in /usr/local/include, but I loaded it using 'module load netcdf'. How is 'netcdf' used after it is loaded on Jaguar?


It seems that a lot of groups are running CAM or CCSM on Jaguar. Can anybody provide a sample 'build' and 'run' script that you use to compile and run the model?

Best

Song Feng
email: sfeng2@unl.edu
 

sfeng2@unl_edu

New Member
I added the following two lines to the build script:

setenv INC_NETCDF /opt/cray/netcdf/4.0.0.3/netcdf-pgi/include
setenv LIB_NETCDF /opt/cray/netcdf/4.0.0.3/netcdf-pgi/lib


And then I added the option '-fc ftn' to the configure command:

$cfgdir/configure -fc ftn
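
For reference, the two changes above can be sketched together as follows. The paths and the '-fc ftn' option are the ones quoted in this post; the check for netcdf.inc is an illustrative addition that is not part of the original script, and $cfgdir is assumed to be set as in the build script from the first post.

```csh
## Point configure at the NetCDF installation provided by the module
## (paths as quoted in this post; adjust to the version actually loaded).
setenv INC_NETCDF /opt/cray/netcdf/4.0.0.3/netcdf-pgi/include
setenv LIB_NETCDF /opt/cray/netcdf/4.0.0.3/netcdf-pgi/lib

## Illustrative sanity check (not in the original script): stop early
## if netcdf.inc is not where INC_NETCDF says it is.
if ( ! -e $INC_NETCDF/netcdf.inc ) then
    echo "netcdf.inc not found in $INC_NETCDF"
    exit 1
endif

## Run configure with the Cray compiler wrapper, as described above.
$cfgdir/configure -fc ftn || echo "configure failed" && exit 1
```

According to the configure message quoted in the first post, INC_NETCDF is consulted only when neither -i, -nc_inc, nor a -defaults file supplies the include directory, so this works as long as none of those is given.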

The NetCDF problem was solved, but I got other errors:

creating /lustre/scr72a/sfeng/camrun/bld/Filepath
creating /lustre/scr72a/sfeng/camrun/bld/params.h
creating /lustre/scr72a/sfeng/camrun/bld/misc.h
creating /lustre/scr72a/sfeng/camrun/bld/preproc.h
creating /lustre/scr72a/sfeng/camrun/bld/Makefile
creating /lustre/scr72a/sfeng/camrun/bld/config_cache.xml
configure done.
building CAM in /tmp/work/sfeng/camrun/bld ...
CAM build failed: see /tmp/work/sfeng/camrun/bld/MAKE.out

The error messages I got are as follows:

PGF90-F-0226-Can't find include file misc.h (/ccs/home/sfeng/CAM/cam1/models/atm/cam/src/dynamics/eul/pmgrid.F90: 1)
PGF90/x86-64 Linux 8.0-3: compilation aborted
gmake: *** [pmgrid.o] Error 2
gmake: *** [mpishorthand.o] Error 2
gmake: *** [clm_varpar.o] Error 2
gmake: *** [QSatMod.o] Error 2
nux 8.0-3: compilation aborted
PGF90/x86-64 Linux 8.0-3: compilation aborted
PGF90-F-0226-Can't find include file misc.h (/ccs/home/sfeng/CAM/cam1/models/lnd/clm2/src/biogeophys/QSatMod.F90: 1)


I believe I have 'misc.h' in the bld directory. Can anybody help me?
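
One way to check this is sketched below. It is only a diagnostic under assumptions, not a fix: it assumes the build directory is /tmp/work/$LOGNAME/camrun/bld, as in the build script, and that /bin/pwd accepts the -P option (it does in GNU coreutils).

```csh
## Build directory as defined in the build script (wrkdir/case/bld).
set blddir = /tmp/work/$LOGNAME/camrun/bld

## Confirm that the headers written by configure are really present.
ls -l $blddir/misc.h $blddir/params.h $blddir/preproc.h

## Print the physical path of the build directory. configure reported
## /lustre/scr72a/... while the build ran in /tmp/work/...; if both
## names refer to the same directory (for example through a symbolic
## link), this prints the /lustre path.
cd $blddir && /bin/pwd -P
```

If the headers are there and both paths resolve to the same place, the next thing to inspect would be the include path on the compile lines recorded in MAKE.out.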
 
Hello

Everything you're using in the script looks OK except the "rm -f Depends" line.
Perhaps that is the problem.

One additional thing: your CSMDATA location needs to be on Lustre. You're currently pointing to /ccs/home/sfeng/CAM. If I'm not mistaken, neither of the XT systems can see the filesystem where /home is mounted. Copy the directory to your work directory and point the build script there. Don't mv it: data in the work directory will be erased after 14 days if it's not accessed, so keep the original.
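
A minimal sketch of that step, assuming the work directory is /tmp/work/$LOGNAME as in the build script and that it lives on Lustre; the destination name "CAM" is chosen here for illustration only:

```csh
## Work directory as defined in the build script; assumed to be on Lustre.
set wrkdir = /tmp/work/$LOGNAME

## Copy (do not move) the input data, so the original in /ccs/home
## survives if the work area is purged.
mkdir -p $wrkdir/CAM || echo "cannot create $wrkdir/CAM" && exit 1
cp -rp /ccs/home/sfeng/CAM/. $wrkdir/CAM/ || echo "copy failed" && exit 1

## Point the model at the copy.
setenv CSMDATA $wrkdir/CAM
```

In the script above, /ccs/home/sfeng/CAM also contains the source tree (cam1), so it may be preferable to copy only the input data subdirectories.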

Hope that helps.
ML
 

sfeng2@unl_edu

New Member
I can now compile the code, but I cannot run the model. The model stops when it tries to read the namelist. All the input files listed in the namelist are the default files and are saved in the specified directories. I also tried a namelist used by others, but got the same error message.

The error message I got is listed below:

DYCORE is EUL
READ_NAMELIST: Namelist read returns -1
ENDRUN: called without a message string
READ_NAMELIST: Namelist read returns -1
READ_NAMELIST: Namelist read returns -1
ENDRUN: called without a message string
ENDRUN: called without a message string
READ_NAMELIST: Namelist read returns -1
ENDRUN: called without a message string
[NID 13296]Apid 2905055: initiated application termination
Application 2905055 exit signals: Aborted
Application 2905055 resources: utime 0, stime 2
CAM run failed
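
A negative status from a Fortran read normally signals an end-of-file condition, so one thing worth ruling out is that the executable sees an empty or missing namelist. The checks below are a sketch under assumptions: the file is assumed to be called "namelist" and to sit in the run directory, neither of which is stated in this post, and '&camexp' is the group name used in the build script.

```csh
## Run directory as defined in the build script (wrkdir/case).
set rundir = /tmp/work/$LOGNAME/camrun

cd $rundir || echo "cd $rundir failed" && exit 1

## Assumed file name "namelist": it must exist, be readable, and be non-empty.
if ( ! -r namelist || -z namelist ) then
    echo "namelist is missing, unreadable, or empty in $rundir"
    exit 1
endif

## Count the lines that open the camexp group; 0 means the group is absent.
grep -c '&camexp' namelist
```

If the file looks fine, the remaining question is whether it actually reaches the executable on the compute nodes, for example when it is passed through standard input.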


Because other scientists run CAM successfully on the XT4, my problems may be caused by incorrect modules. The following modules were loaded before compiling and running the model:

module load ncl
module load nco
module load subversion
module load PrgEnv-pgi Base-opts
module load torque moab
module load netcdf/3.6.2


Any suggestions?
 

hannay

Cecile Hannay
AMWG Liaison
Staff member
Could you please give read permissions to your run directory, so I can check whether there is a problem with the namelist?
 