
CESM1.2 build problem on cheyenne, relocation truncated to fit

tcraig

Member
This is a follow up to https://xenforo.cgd.ucar.edu/cesm/threads/cesm-1-2-2-1-cesm_setup-module-issue-on-cheyenne.4680/.

I am trying to port a modified version of CESM1.2 to cheyenne. I am just starting with an X compset configuration to do the initial port. I am running into the same problems as others have on the load step,

mpif90 -o /glade/scratch/tcraig/Xnna01/bld/cesm.exe ccsm_comp_mod.o ccsm_driver.o mrg_mod.o seq_avdata_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_map_esmf.o seq_map_mod.o seq_mctext_mod.o seq_rest_mod.o -L/glade/scratch/tcraig/Xnna01/bld/lib -latm -llnd -lice -locn -lglc -lrof -L/glade/scratch/tcraig/Xnna01/bld/lib -lcsm_share -L/glade/scratch/tcraig/Xnna01/bld/mct/mct -lmct -L/glade/scratch/tcraig/Xnna01/bld/mct/mpeu -lmpeu -L/glade/scratch/tcraig/Xnna01/bld/pio -lpio -lgptl -L/glade/u/apps/ch/opt/netcdf/4.7.3/intel/18.0.5/lib -lnetcdf -lnetcdff -L/glade/u/apps/ch/opt/pnetcdf/1.12.1/mpt/2.19/intel/18.0.5/lib -lpnetcdf
/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: ccsm_comp_mod.o: in function `ccsm_comp_mod_mp_ccsm_run_':
ccsm_comp_mod.F90:(.text+0x28): relocation truncated to fit: R_X86_64_32 against symbol `seq_avdata_mod_mp_infodata_' defined in COMMON section in seq_avdata_mod.o
/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: ccsm_comp_mod.F90:(.text+0x446): relocation truncated to fit: R_X86_64_32S against symbol `ccsm_comp_mod_mp_begstep_' defined in COMMON section in ccsm_comp_mod.o

I have tried many things to fix this.

- When I add -mcmodel=medium, the code builds but the executable is not runnable.

- I am using pio1.5.7. It's not clear that is causing problems for me, but I have tried to use pio1.7.3 and run into a build issue that seems to be related to the code generation step done during predist.
mpif90 -c -DLINUX -DMCT_INTERFACE -DHAVE_MPI -DFORTRANUNDERSCORE -DNO_R16 -DHAVE_F2008_CONTIGUOUS -DLINUX -DCPRINTEL -DHAVE_SLASHPROC -DSPMD -DHAVE_MPI -DUSEMPIIO -DSYSLINUX -D_NETCDF -I/glade/u/apps/ch/opt/netcdf/4.6.3/intel/17.0.1//include -D_PNETCDF -DTIMING -O2 -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -free -I. -I/glade/u/apps/ch/opt/netcdf/4.6.3/intel/17.0.1//include -I/glade/u/apps/ch/opt/pnetcdf/1.11.1/mpt/2.19/intel/17.0.1//include -I/glade/scratch/tcraig/Xnna01/bld/mct/mct -I/glade/scratch/tcraig/Xnna01/bld/mct/mpeu -I/glade/scratch/tcraig/Xnna01/bld/pio -I/glade/work/tcraig/rasm_nna/models/utils/pio -I/glade/scratch/tcraig/Xnna01/bld/lib/include -I/glade/u/apps/ch/opt/netcdf/4.6.3/intel/17.0.1//include -I/glade/u/apps/ch/opt/pnetcdf/1.11.1/mpt/2.19/intel/17.0.1//include -I../timing pio_msg_getput_callbacks.F90
/glade/work/tcraig/rasm_nna/models/utils/pio/pio_msg_getput_callbacks.F90.in(57): error #6404: This name does not have a type, and must have an explicit type. [TYPETEXT]
if(itype == TYPETEXT) then
--------------^
/glade/work/tcraig/rasm_nna/models/utils/pio/pio_msg_getput_callbacks.F90.in(64): error #6404: This name does not have a type, and must have an explicit type. [TYPEREAL]
case (TYPEREAL)
--------------^
...

- I have tried a few different sets of modules based on earlier forum posts and those don't seem to have helped. I am currently using the following based on other recommendations,
module load ncarenv/1.3
module load intel/17.0.1
module load mkl
module load ncarcompilers/0.5.0
module load mpt/2.19
module load netcdf/4.6.3
module load pnetcdf/1.11.1

- I have this model running on a few other SGI ICE XA machines as well as Cray XC machines without a problem. It's not clear to me what is causing this problem. I don't understand the "relocation truncated to fit" errors. Does anyone know what's going on? Is it just incompatible compiler/lib setups? Again, I think the code should be able to build and run on cheyenne because it works at several other places on similar architectures. It seems to be just a matter of figuring out which modules/flags I need to compile on cheyenne.
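
For background on the error itself: "relocation truncated to fit: R_X86_64_32" typically means a symbol's address no longer fits in a 32-bit relocation, which on x86-64 usually happens when the statically allocated data in the executable grows past about 2 GB under the default small code model. One way to look for the culprit is to list the largest symbols in the built objects. The sketch below is only an illustration: `big_symbols` is a hypothetical helper name, and it assumes the four-column output of `nm -S --size-sort` (address, size in hex, type, name).

```shell
#!/bin/sh
# Filter `nm -S --size-sort` output, keeping symbols of at least MIN_BYTES.
# Prints: <size in bytes> <symbol type> <symbol name>
# The function body runs in a subshell so its variables do not leak.
big_symbols() (
    min_bytes=${1:-1048576}
    while read -r addr size type name; do
        # Skip blank lines, archive member headers, and symbols without a size.
        [ -n "$name" ] || continue
        bytes=$(printf '%d' "0x$size")
        if [ "$bytes" -ge "$min_bytes" ]; then
            printf '%s %s %s\n' "$bytes" "$type" "$name"
        fi
    done
)

# Example use (paths are placeholders for your own build directory):
#   nm -S --size-sort /path/to/bld/lib/*.a | big_symbols 100000000
```

Symbols of type C (common) or B (bss) with very large sizes are the ones to look at first, since static arrays are what push the data segment past the limit.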
 

tcraig

Member
Thanks Chris. pio1.8.14 doesn't have a Makefile or a configure file. I'm not able to just plug it into my version of cesm1.2. I think in the earlier thread, this was also noted which is why I tried 1.7.3 (also referenced in the earlier thread). Is there something I can change fairly easily in the CESM scripts to get them to build pio1.8.14?

My sense is that cesm1.2 did run on cheyenne prior to an update last year. It seems like there must be an env variable or a compiler flag or something that should allow us to "recover" that capability. Does anyone really understand why the same hardware and more or less the same compilers/libs/etc cannot build and run this code while it worked before the upgrade?

Thanks for your help, very much appreciated.
 

jedwards

CSEG and Liaisons
Staff member
The issue is that older pio versions are not compatible with newer netcdf versions due to a change in the definition of certain netcdf constants like NC_MAX_DIM_NAME. I have fixed this in new tags of each of the pio major versions. If you were using something in the pio1.7 series, then the latest tag in that series, pio1.7.4, should build for you and provide a fix. But if your model also uses these netcdf constants, then the pio build may not be the problem; your model will also need to be modified. The latest pio in the 1.5 series is pio1.5.11.
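
To see what a given netcdf install defines for its size limits, and so compare an older and a newer module, the header can be inspected directly. This is a rough sketch, not part of pio or CESM: `nc_max_constants` is a hypothetical helper name, and it only looks at `#define NC_MAX_*` lines in the C header, on the assumption that those are the constants in question.

```shell
#!/bin/sh
# Print "NAME VALUE" for every NC_MAX_* macro defined in a netCDF C header.
nc_max_constants() {
    sed -nE 's/^[[:space:]]*#[[:space:]]*define[[:space:]]+(NC_MAX_[A-Z_]+)[[:space:]]+([^[:space:]]+).*/\1 \2/p' "$1"
}

# Example use with the currently loaded netcdf module:
#   nc_max_constants "$(nc-config --includedir)/netcdf.h"
# To find places in a source tree that size arrays with these limits:
#   grep -rniE 'n[cf](90)?_max_' /path/to/models
```

Running the first command under two different netcdf modules and diffing the output shows which limits changed between them.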
 

tcraig

Member
My X compset test just built and ran with 1.5.11. Thanks Jim! I will now move on to more complex cases and let you know if I run into any other issues. Again, many thanks.
 

tcraig

Member
I have been able to build and run a number of cases on cheyenne with pio 1.5.11 with my version of cesm1.2, so that is great.

Just a quick follow up question. Does pio1.5.11 require netcdf 4.6 or greater? Could I use pio1.5.11 with an older version of netcdf?

Just another quick question. Was my problem an issue in memory allocation in pio or an issue in compatibility with netcdf? I was not clear on that point. It sounds like maybe a bit of both: did the new netcdf versions result in large memory allocation in pio?
 