Failed CAM4 standalone on Yellowstone

paekh@...
Hi,

 

I am trying to build standalone CAM4 (from CCSM4.0) on Yellowstone, but the build fails. It might be related to a missing header (sys/resource.h), but I could not find it in the subfolders of PGI 12.5.

The error message is:

pgcc -mp -lpsm_infinipath -Minfo=all -c -I. -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/chemistry/bulk_aero -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/chemistry/utils -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/physics/cam -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/dynamics/eul -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/control -I/glade/u/home/yzou/ccsm4_0/models/csm_share/shr -I/glade/u/home/yzou/ccsm4_0/models/csm_share/dshr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils -I/glade/u/home/yzou/ccsm4_0/models/utils/timing -I/glade/u/home/yzou/ccsm4_0/models/utils/pio -I/glade/u/home/yzou/ccsm4_0/models/utils/mct/mpeu -I/glade/u/home/yzou/ccsm4_0/models/utils/mct/mct -I/glade/u/home/yzou/ccsm4_0/models/utils/esmf_wrf_timemgr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/advection/slt -I/glade/u/home/yzou/ccsm4_0/models/drv/driver -I/glade/u/home/yzou/ccsm4_0/models/drv/shr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils/cam_dom -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils/cam_dom/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/main/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/main -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeochem -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/riverroute -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/drivers/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/drivers/cpl_share -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/mpi -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/source -I/glade/u/home/yzou/ccsm4_0/models/glc/sglc/cpl_mct -I/glade/apps/opt/netcdf/4.2/pgi/12.5/include -I/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/include -DNO_SHR_VMATH -DSEQ_MCT -DFORTRANUNDERSCORE -DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DCOUP_DOM -DPLON=128 -DPLAT=64 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=42 -DPTRN=42 -DPTRK=42 -DCCSMCOUPLED 
-Dcoupled -Dncdf -DNCAT=1 -DNXGLOB=128 -DNYGLOB=64 -DNTR_AERO=0 -DBLCKX=2 -DBLCKY=64 -DMXBLCKS=1 -D_USEBOX -D_NETCDF -DNO_MPI2 -DSPMD -DLINUX -DNO_R16 -fast -L/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/lib -I/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/include /glade/u/home/yzou/ccsm4_0/models/utils/timing/GPTLget_memusage.c

PGC-F-0206-Can't find include file sys/resource.h (/glade/u/home/yzou/ccsm4_0/models/utils/timing/GPTLget_memusage.c: 13)
PGC/x86-64 Linux 12.5-0: compilation aborted
gmake: *** [GPTLget_memusage.o] Error 2

 

I would greatly appreciate any suggestions.

Best regards,

Danny

mai

Are you submitting a batch job to build the executable? If so, try running the build script interactively from the command line on one of the Yellowstone front-end nodes (ysloginN, N = 1 to 6).
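Note that sys/resource.h is a system (C library) header rather than part of the PGI installation, which would explain why it does not show up under the PGI 12.5 directories. One plausible reading of the suggestion above is that the node running the batch job lacks the system development headers while the login nodes have them. A minimal sketch for checking this on whichever host runs the build; the helper name `find_header` and the list of include directories are assumptions made for illustration, and the sketch is written in POSIX sh rather than the csh used elsewhere in the thread:

```shell
#!/bin/sh
# find_header NAME [DIR...]: print the first DIR/NAME that exists; return 1 if none does.
find_header() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Assumed system include directories; adjust for the host being checked.
if found=$(find_header sys/resource.h /usr/include /usr/include/x86_64-linux-gnu); then
    echo "$(uname -n): found $found"
else
    echo "$(uname -n): sys/resource.h not found in the directories checked"
fi
```

Running the same script once on a login node and once inside a batch job would show whether the two environments differ.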

paekh@...

Hi Mai,

 

Thanks so much for your comments.

I had been submitting a batch job; building interactively as you suggested solved the problem.

However, I then ran into another issue, this time with netCDF:

........................
CICE_InitMod.o: In function `.C2_3416':
CICE_InitMod.F90:(.data+0x134): undefined reference to `typesizes_'
CICE_InitMod.F90:(.data+0x13c): undefined reference to `netcdf_'
CICE_RunMod.o: In function `.C4_292':
CICE_RunMod.F90:(.data+0x240): undefined reference to `typesizes_'
CICE_RunMod.F90:(.data+0x248): undefined reference to `netcdf_'
SNICARMod.o: In function `snicarmod_snowoptics_init_':
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_open_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_inq_varid_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_get_var_double_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_inq_varid_'
...........................

 

The makefile is attached.


Danny

erik

It looks to me like you need to change your build so that -DFORTRANUNDERSCORE is removed from USER_CPPDEFS in the Makefile. This is using the CAM standalone makefile, so I'll ask some CAM folks to look at this.

Erik Kluzek
CESM Land Model (CLM) Software Liaison
CESM Software Engineering Group, NCAR

paekh@...

Hi Erik,

Thanks so much for your suggestion.

The problem was solved by adding -lnetcdff, as suggested in other posts. I also tried removing -DFORTRANUNDERSCORE, but that led to another error:

PGC-F-0249-#error --  "Unrecognized Fortran-mangle type" (/glade/u/home/yzou/ccsm4_0/models/csm_share/shr/shr_isnan.c: 21)

PGC/x86-64 Linux 12.5-0: compilation aborted

gmake: *** [shr_isnan.o] Error 2


Danny
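On the -lnetcdff fix: the unresolved symbols in the earlier link errors (`nf_open_`, `netcdf_`, `typesizes_`) belong to netCDF's Fortran interface, which recent netCDF versions build as a separate library, libnetcdff, alongside the C library libnetcdf. A link line that names only -lnetcdf therefore leaves them undefined. The sketch below assembles the link flags from whatever is present in the library directory; the helper name `netcdf_link_flags` is made up for illustration, and `LIB_NETCDF` is assumed to point at the netCDF lib directory. Installations that ship `nf-config` or `nc-config` can usually report the same information with their `--flibs` option.

```shell
#!/bin/sh
# netcdf_link_flags LIBDIR: print linker flags for the netCDF libraries found in LIBDIR.
netcdf_link_flags() {
    libdir=$1
    flags="-L$libdir"
    # List the Fortran interface first so its references into the C library resolve.
    for f in "$libdir"/libnetcdff.*; do
        if [ -e "$f" ]; then
            flags="$flags -lnetcdff"
            break
        fi
    done
    printf '%s\n' "$flags -lnetcdf"
}

# The fallback directory is only a placeholder for hosts where LIB_NETCDF is unset.
netcdf_link_flags "${LIB_NETCDF:-/usr/lib}"
```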

paekh@...

Now I have another issue, which might be related to MPI:

 

CalcWorkPerBlock: Total blocks:    64 Ice blocks:    64 IceFree blocks:     0 Land blocks:     0

  Processors (X x Y) =    1 x    1

 Active processors:             1

(shr_sys_abort) ERROR:  ice: no. blocks exceed max: increase max to           64

(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping

p0_14382:  p4_error: interrupt SIGSEGV: 11

/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/bin/mpirun.ch_p4: line 243: 14382 Segmentation fault      (core dumped) /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1/bld/cam -p4pg /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1/PI14325 -p4wd /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1

 Thanks in advance for your support.

 

Danny

eaton

My guess is that the -ntasks argument to CAM's configure and the number of tasks used in the run are inconsistent.  These numbers should be the same.
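The consistency check described above can be scripted before submitting. This is a rough sketch under stated assumptions: the function names are invented for illustration, the configure call is assumed to live in a script with `-ntasks N` on one line, and the job script is assumed to use an LSF `#BSUB -n N` directive. In this thread both values turn out to be 64, so the check would pass; it only covers the mismatch case described here.

```shell
#!/bin/sh
# bsub_tasks FILE: print N from the first "#BSUB -n N" line of an LSF job script.
bsub_tasks() {
    sed -n 's/^#BSUB[[:space:]]\{1,\}-n[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# configure_ntasks FILE: print N from the first "-ntasks N" found in a configure script.
configure_ntasks() {
    sed -n 's/.*-ntasks[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# check_tasks CONFIG_SCRIPT JOB_SCRIPT: succeed only if both counts exist and match.
check_tasks() {
    built=$(configure_ntasks "$1")
    run=$(bsub_tasks "$2")
    if [ -n "$built" ] && [ "$built" = "$run" ]; then
        echo "ok: built and submitted with $built tasks"
    else
        echo "mismatch: configure -ntasks='$built', #BSUB -n='$run'" >&2
        return 1
    fi
}
```

A call such as `check_tasks build.csh run.lsf` (both file names hypothetical) returns non-zero when the two counts differ or when either one is missing.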

 

paekh@...

Hi eaton,

 

Thanks for your reply.

configure is run with -ntasks 64, as follows:

setenv INC_NETCDF /glade/apps/opt/netcdf/4.2/pgi/12.5/include
setenv LIB_NETCDF /glade/apps/opt/netcdf/4.2/pgi/12.5/lib
setenv PGI /ncar/opt/pgi/12.5.0
setenv INC_MPI ${PGI}/linux86-64/2012/mpi/mpich/include
setenv LIB_MPI ${PGI}/linux86-64/2012/mpi/mpich/lib

$cfgdir/configure -v -dyn eul -res 64x128 -spmd -nosmp -ntasks 64 -nc_inc $INC_NETCDF  -nc_lib $LIB_NETCDF -mpi_inc $INC_MPI -mpi_lib $LIB_MPI  -fc "pgf90 -mp -Minfo=all -lnetcdff" -cflags "-L$LIB_MPI -I$INC_MPI" -cc "pgcc  -mp -Minfo=all"  -fflags "-L$LIB_MPI -I$INC_MPI"   || exit 1

 

On our local machine, we can set the number of tasks with something like "mpirun -np 64 ...".

However, I have no idea how to set the number of tasks on Yellowstone. My script just includes:

#BSUB -n 64                  # number of tasks in job         

#BSUB -R "span[ptile=16]"    # run 16 MPI tasks per node

...

mpirun.lsf $blddir/cam < namelist >&! $logfile

 

The log file is attached.

Danny

paekh@...

I added -nthreads 1 to the configure command:

$cfgdir/configure -v -dyn eul -res 64x128 -spmd -nosmp -ntasks 64 -nthreads 1 -nc_inc $INC_NETCDF  -nc_lib $LIB_NETCDF -mpi_inc $INC_MPI -mpi_lib $LIB_MPI  -fc "pgf90 -mp -Minfo=all -lnetcdff" -cflags "-L$LIB_MPI -I$INC_MPI" -cc "pgcc  -mp -Minfo=all"  -fflags "-L$LIB_MPI -I$INC_MPI" -test

 

That solved the problem above, but now I get another error:

  16:PGFIO-F-209/OPEN/unit=10/'OLD' specified for file which does not exist.

  16: File name = drv_in

  16: In source file /glade/p/work/yzou/ccsm4_0/models/drv/driver/seq_io_mod.F90, at line number 164

 

drv_in is in the run directory.

A log file is attached.

Danny

eaton

You can see from the cam_log.txt file that multiple copies of the job are running, using 1 task each. I don't know how that would happen. The BSUB commands and the use of mpirun.lsf as the job launcher look correct.

The main problem I see is that you are using the pgcc and pgf90 compilers directly and supplying the MPI and netCDF paths manually. This procedure is not robust. On Yellowstone you should use the mpif90 and mpicc compiler wrappers and not specify anything about MPI or netCDF yourself; the wrappers know how to link with both, and that is the only robust way to do it. To use the PGI compilers you need to use the module mechanism on Yellowstone. Using such an old CAM version complicates the build, because that old build didn't have the options to make use of the compiler wrappers. So my first recommendation is to use the latest CESM release, which supports the cam4 physics package as an option. If there are other reasons you need to stick with CCSM4, then you'll need to modify the Makefile produced by CAM's configure to get it to use the compiler wrappers.
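Before editing the generated Makefile, it can help to confirm that the wrappers are actually visible in the build environment. This is only a minimal sketch: the helper name `missing_tools` is invented for illustration, and it assumes the compiler module has already been loaded in the shell that runs the build. It says nothing about which Makefile lines need changing.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that cannot be found as a command on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools mpif90 mpicc)
if [ -z "$missing" ]; then
    echo "mpif90 -> $(command -v mpif90)"
    echo "mpicc  -> $(command -v mpicc)"
else
    # Left unquoted on purpose so the names print on one line.
    echo "not on PATH:" $missing
fi
```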

paekh@...

Hi eaton,

 

Thanks for your suggestion. I will try to use CESM with cam4 physics.

 

Danny
