Failed CAM4 standalone on Yellowstone

paekh@uci_edu

New Member
Hi, I am trying to build CAM4 standalone (from CCSM4.0) on Yellowstone, but the build failed. It might be related to a missing header (sys/resource.h), which I could not find in the subfolders of PGI 12.5. The error message is:
pgcc -mp -lpsm_infinipath -Minfo=all -c -I. -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/chemistry/bulk_aero -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/chemistry/utils -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/physics/cam -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/dynamics/eul -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/control -I/glade/u/home/yzou/ccsm4_0/models/csm_share/shr -I/glade/u/home/yzou/ccsm4_0/models/csm_share/dshr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils -I/glade/u/home/yzou/ccsm4_0/models/utils/timing -I/glade/u/home/yzou/ccsm4_0/models/utils/pio -I/glade/u/home/yzou/ccsm4_0/models/utils/mct/mpeu -I/glade/u/home/yzou/ccsm4_0/models/utils/mct/mct -I/glade/u/home/yzou/ccsm4_0/models/utils/esmf_wrf_timemgr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/advection/slt -I/glade/u/home/yzou/ccsm4_0/models/drv/driver -I/glade/u/home/yzou/ccsm4_0/models/drv/shr -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils/cam_dom -I/glade/u/home/yzou/ccsm4_0/models/atm/cam/src/utils/cam_dom/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/main/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/main -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeochem -I/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/riverroute -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/drivers/cpl_mct -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/drivers/cpl_share -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/mpi -I/glade/u/home/yzou/ccsm4_0/models/ice/cice/src/source -I/glade/u/home/yzou/ccsm4_0/models/glc/sglc/cpl_mct -I/glade/apps/opt/netcdf/4.2/pgi/12.5/include -I/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/include -DNO_SHR_VMATH -DSEQ_MCT -DFORTRANUNDERSCORE -DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DCOUP_DOM -DPLON=128 -DPLAT=64 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=42 -DPTRN=42 -DPTRK=42 -DCCSMCOUPLED -Dcoupled -Dncdf -DNCAT=1 -DNXGLOB=128 -DNYGLOB=64 -DNTR_AERO=0 -DBLCKX=2 -DBLCKY=64 -DMXBLCKS=1 -D_USEBOX -D_NETCDF -DNO_MPI2 -DSPMD -DLINUX -DNO_R16 -fast -L/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/lib -I/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/include /glade/u/home/yzou/ccsm4_0/models/utils/timing/GPTLget_memusage.c
PGC-F-0206-Can't find include file sys/resource.h (/glade/u/home/yzou/ccsm4_0/models/utils/timing/GPTLget_memusage.c: 13)
PGC/x86-64 Linux 12.5-0: compilation aborted
gmake: *** [GPTLget_memusage.o] Error 2
I would highly appreciate any suggestions.
Best regards,
Danny
 

mai

Member
Are you submitting a batch job to build the executable? If so, try running the build script interactively from the command line on one of the yellowstone front-end nodes (ysloginN, N=1,2,3,4,5,6).
 

paekh@uci_edu

New Member
Hi Mai,
Thanks so much for your comments. I had submitted a batch job before; running the build interactively as you suggested solved the problem. However, I encountered another issue with netcdf:
........................
CICE_InitMod.o: In function `.C2_3416':
CICE_InitMod.F90:(.data+0x134): undefined reference to `typesizes_'
CICE_InitMod.F90:(.data+0x13c): undefined reference to `netcdf_'
CICE_RunMod.o: In function `.C4_292':
CICE_RunMod.F90:(.data+0x240): undefined reference to `typesizes_'
CICE_RunMod.F90:(.data+0x248): undefined reference to `netcdf_'
SNICARMod.o: In function `snicarmod_snowoptics_init_':
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_open_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_inq_varid_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_get_var_double_'
/glade/u/home/yzou/ccsm4_0/models/lnd/clm/src/biogeophys/SNICARMod.F90:1377: undefined reference to `nf_inq_varid_'
...........................
Attached is the makefile.
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
It looks to me like you need to change your build so that -DFORTRANUNDERSCORE is removed from USER_CPPDEFS in the Makefile. This is using the CAM standalone makefile, so I'll ask some CAM folks to look at this.
 

paekh@uci_edu

New Member
Hi erik,
Thanks so much for your suggestion. The problem was solved by adding -lnetcdff, as suggested in other posts. I also tried removing -DFORTRANUNDERSCORE, but that led to another error:
PGC-F-0249-#error --  "Unrecognized Fortran-mangle type" (/glade/u/home/yzou/ccsm4_0/models/csm_share/shr/shr_isnan.c: 21)
PGC/x86-64 Linux 12.5-0: compilation aborted
gmake: *** [shr_isnan.o] Error 2
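For readers hitting the same undefined references to `nf_open_` and friends: with netCDF 4.2 the Fortran interface lives in a separate library, libnetcdff, which has to be linked in addition to (and listed before) the C library libnetcdf. A minimal bash sketch of that rule follows; the function name `netcdf_link_flags` is made up for illustration and is not part of CAM or netCDF.

```shell
#!/bin/bash
# Hypothetical helper (not part of CAM or netCDF): print linker flags
# for the netCDF Fortran API given a library directory.
# If the directory contains a separate libnetcdff, list it before
# libnetcdf, because libnetcdff depends on symbols from the C library.
netcdf_link_flags() {
    local libdir=$1
    if ls "$libdir"/libnetcdff.* >/dev/null 2>&1; then
        echo "-L$libdir -lnetcdff -lnetcdf"
    else
        echo "-L$libdir -lnetcdf"
    fi
}

# Example with the library path quoted in this thread:
netcdf_link_flags /glade/apps/opt/netcdf/4.2/pgi/12.5/lib
```

On a machine where that directory does not exist, the example falls back to the plain `-lnetcdf` form.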
 

paekh@uci_edu

New Member
Now I have another issue, which might be related to MPI:
 CalcWorkPerBlock: Total blocks:    64 Ice blocks:    64 IceFree blocks:     0 Land blocks:     0
  Processors (X x Y) =    1 x    1
 Active processors:             1
(shr_sys_abort) ERROR:  ice: no. blocks exceed max: increase max to           64
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
p0_14382:  p4_error: interrupt SIGSEGV: 11
/ncar/opt/pgi/12.5.0/linux86-64/2012/mpi/mpich/bin/mpirun.ch_p4: line 243: 14382 Segmentation fault      (core dumped) /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1/bld/cam -p4pg /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1/PI14325 -p4wd /glade/p/work/yzou/cam4/output/amo_x2_tot/AMOx2_TOT1
Thanks in advance for your support.
 

eaton

CSEG and Liaisons
My guess is that the -ntasks argument to CAM's configure and the number of tasks used in the run are inconsistent.  These numbers should be the same. 
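The numbers in the log are consistent with this guess. The compile line in the first post defines -DNXGLOB=128, -DNYGLOB=64, -DBLCKX=2, -DBLCKY=64 and -DMXBLCKS=1, which gives 64 ice blocks in total and room for one block per task. A rough bash sketch of the arithmetic is below; the function name `blocks_per_task` is hypothetical, and the even split across tasks is a simplifying assumption rather than CICE's actual decomposition code.

```shell
#!/bin/bash
# Hypothetical helper: how many CICE blocks land on each MPI task if the
# blocks are spread evenly (a simplification of the real decomposition).
blocks_per_task() {
    local nxglob=$1 nyglob=$2 blckx=$3 blcky=$4 ntasks=$5
    local total=$(( (nxglob / blckx) * (nyglob / blcky) ))
    # Round up so that a remainder still counts as an extra block.
    echo $(( (total + ntasks - 1) / ntasks ))
}

# Values from the compile line: NXGLOB=128 NYGLOB=64 BLCKX=2 BLCKY=64
blocks_per_task 128 64 2 64 64   # built for 64 tasks: prints 1
blocks_per_task 128 64 2 64 1    # run on 1 active processor: prints 64
```

With MXBLCKS=1, the second case needs 64 blocks on a single task, which matches the "increase max to 64" message in the log.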
 

paekh@uci_edu

New Member
Hi eaton,
Thanks for your reply. I set ntasks=64 in configure:
setenv INC_NETCDF /glade/apps/opt/netcdf/4.2/pgi/12.5/include
setenv LIB_NETCDF /glade/apps/opt/netcdf/4.2/pgi/12.5/lib
setenv PGI /ncar/opt/pgi/12.5.0
setenv INC_MPI ${PGI}/linux86-64/2012/mpi/mpich/include
setenv LIB_MPI ${PGI}/linux86-64/2012/mpi/mpich/lib
$cfgdir/configure -v -dyn eul -res 64x128 -spmd -nosmp -ntasks 64 -nc_inc $INC_NETCDF  -nc_lib $LIB_NETCDF -mpi_inc $INC_MPI -mpi_lib $LIB_MPI  -fc "pgf90 -mp -Minfo=all -lnetcdff" -cflags "-L$LIB_MPI -I$INC_MPI" -cc "pgcc  -mp -Minfo=all"  -fflags "-L$LIB_MPI -I$INC_MPI"   || exit 1
On our local machine we can set the number of tasks with something like "mpirun -np 64 .....", but I have no idea how to set it on yellowstone. My script just includes:
#BSUB -n 64                  # number of tasks in job
#BSUB -R "span[ptile=16]"    # run 16 MPI tasks per node
...
mpirun.lsf $blddir/cam < namelist >&! $logfile
Attached is the log file.
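One way to double-check that the two settings quoted above agree is to pull the task count out of the configure command and out of the `#BSUB -n` line and compare them. The bash sketch below does that; both function names are hypothetical helpers written for this illustration, and the script is bash rather than the csh used in the thread.

```shell
#!/bin/bash
# Hypothetical helpers: extract the task count from a configure command
# line and from an LSF job script so the two can be compared.
ntasks_from_configure() {
    # Prints the number following "-ntasks" in the given string.
    echo "$1" | sed -n 's/.*-ntasks[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p'
}

ntasks_from_bsub() {
    # Reads a job script on stdin; prints the number after "#BSUB -n".
    sed -n 's/^#BSUB[[:space:]]\{1,\}-n[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' | head -n 1
}

# Example using the settings quoted in the post above.
cfg=$(ntasks_from_configure '$cfgdir/configure -v -dyn eul -res 64x128 -spmd -nosmp -ntasks 64')
job=$(printf '%s\n' '#BSUB -n 64' '#BSUB -R "span[ptile=16]"' | ntasks_from_bsub)
if [ "$cfg" = "$job" ]; then
    echo "consistent: $cfg tasks"
else
    echo "mismatch: configure=$cfg job=$job"
fi
```

Agreement between these two numbers only shows that the request is consistent; it does not show how many tasks the launcher actually started, which is what the log file reports.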
 

paekh@uci_edu

New Member
I added -nthreads 1 to the configure command:
$cfgdir/configure -v -dyn eul -res 64x128 -spmd -nosmp -ntasks 64 -nthreads 1 -nc_inc $INC_NETCDF  -nc_lib $LIB_NETCDF -mpi_inc $INC_MPI -mpi_lib $LIB_MPI  -fc "pgf90 -mp -Minfo=all -lnetcdff" -cflags "-L$LIB_MPI -I$INC_MPI" -cc "pgcc  -mp -Minfo=all"  -fflags "-L$LIB_MPI -I$INC_MPI" -test
The above problem is solved, but I now get another error:
  16:PGFIO-F-209/OPEN/unit=10/'OLD' specified for file which does not exist.
  16: File name = drv_in
  16: In source file /glade/p/work/yzou/ccsm4_0/models/drv/driver/seq_io_mod.F90, at line number 164
drv_in is in the run directory. Attached is a log file.
 

eaton

CSEG and Liaisons
You can see from the cam_log.txt file that multiple copies of the job are running using 1 task each.  Don't know how that would happen.  The BSUB commands and using mpirun.lsf for the job launcher look correct.
The main thing I see which is a problem is that you are trying to use the pgcc and pgf90 compilers directly, and supplying mpi and netcdf paths manually.  This procedure is not robust.  On yellowstone you should be using the mpif90 and mpicc compiler wrappers, and not specify anything about mpi or netcdf yourself.  The compiler wrappers on yellowstone know how to link with mpi and netcdf and that's the only robust way to do it.  In order to use pgi compilers you need to use the module mechanism on yellowstone.
Using such an old CAM version complicates the build because that old build didn't have the options to make use of the compiler wrappers.  So my first recommendation is to use the latest CESM release which supports the cam4 physics package as an option.  If there are other reasons you need to stick with CCSM4 then you'll need to modify the Makefile produced by CAM's configure to get it to use the compiler wrappers.
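As a rough illustration of that last step, the bash sketch below rewrites the compiler assignments in a Makefile and prints the result. It rests on an assumption that has not been checked against the actual file: that the Makefile produced by configure sets the compilers on lines of the form `FC := pgf90 ...` and `CC := pgcc ...`. The function name `use_mpi_wrappers` is hypothetical. Inspect the generated Makefile first and adjust the patterns to what it really contains.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a Makefile with the PGI compilers
# replaced by the MPI compiler wrappers.
# ASSUMPTION: compilers are set on lines like "FC := pgf90 ..." and
# "CC := pgcc ..."; this has not been verified against CAM's Makefile.
use_mpi_wrappers() {
    local makefile=$1
    sed -e 's/^\([[:space:]]*FC[[:space:]]*:\{0,1\}=[[:space:]]*\)pgf90/\1mpif90/' \
        -e 's/^\([[:space:]]*CC[[:space:]]*:\{0,1\}=[[:space:]]*\)pgcc/\1mpicc/' \
        "$makefile"
}

# Usage (writes the modified copy next to the original):
#   use_mpi_wrappers Makefile > Makefile.wrappers
```

Any manually supplied MPI and netCDF paths would also need to be removed by hand, as the reply above recommends leaving those to the wrappers.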
 