
Segmentation Fault at run

Hi,

So, I was finally able to successfully build the X compset of the CESM. I changed the system configuration a little bit.
Intel Compilers (ifort, icc, icpc) version 12.1.0
OpenMPI version 1.4.3 (compiled with and linked to intel compilers)
netCDF version 4.1.3 w/o HDF5 (compiled with intel compilers)
Linux/Ubuntu
4 intel processors with 4 GB of ram (we want to link to 3 other identical systems)
based configuration off of generic_linux_intel

Compilation went pretty smoothly with 3 exceptions. First, the "make" command on my system is 'make' (GNU Make 3.81), not 'gmake', so I converted all the gmake commands to "make" using: grep -lr -e 'gmake' * | xargs sed -i 's/gmake/make/g'
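A bulk replacement like the one above can be previewed before it touches anything. The following is only a sketch of that idea, run in a throwaway directory with a made-up file (build.csh is hypothetical, not part of CESM): it lists the matching files first, then keeps a .bak copy of every file sed modifies.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)
printf 'gmake -j 4\n' > "$workdir/build.csh"   # hypothetical sample file

# 1. Dry run: list the files that mention gmake as a whole word.
grep -rlw 'gmake' "$workdir"

# 2. Replace in place, keeping a .bak copy of each modified file.
grep -rlw 'gmake' "$workdir" | xargs sed -i.bak 's/gmake/make/g'

# 3. Inspect the result; the untouched original is in build.csh.bak.
cat "$workdir/build.csh"
```

An alternative that leaves the scripts unchanged would be a gmake symlink pointing at make somewhere on the PATH, if that is acceptable on the system.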

Second, there were a lot of "undefined reference" errors (hundreds) while building ccsm.buildexe.csh. I added the "-shared nounderscore" flag, and when that didn't work I added the '-shared' flag to the Makefile and '-fPIC' to the Macros file.

After these two changes, the model builds successfully, but when I try to execute ccsm.exe (mpirun -n 4 ./ccsm.exe >& ccsm.log), I get the following error.

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 23066 on node catalina3 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
There were no other errors or feedback, although while going through the logs I did find a file called wrf_error_fatal.F90. Could this be an issue? I also found these warnings in the PIO build log...

/opt/openmpi-1.4.3/bin/mpif90 -c -fPIC -assume nounderscore -I/opt/openmpi-1.4.3/include -DSYSLINUX -DLINUX -DCPRUNKNOWNCPR -DSPMD -DHAVE_MPI -DUSEMPIIO -D_NETCDF -D_NOPNETCDF -D_NOUSEMCT -D_USEBOX -I/opt/openmpi-1.4.3/include -I/usr/local/include piolib_mod.F90
piolib_mod.F90(429): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [IODESC]
subroutine PIO_initdecomp_dof_dof(iosystem,basepiotype,dims,compdof,iodesc,iodof)
----------------------------------------------------------------------^
piolib_mod.F90(394): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [IODESC]
subroutine PIO_initdecomp_bc(iosystem,basepiotype,dims,compstart,compcount,iodesc,iostart,iocount)
-----------------------------------------------------------------------------^
piolib_mod.F90(1797): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [IOPROC]
subroutine pio_recommend_iotasks(comm, ioproc, numiotasks, miniotasks, maxiotasks )
-----------------------------------------^
piolib_mod.F90(1797): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [NUMIOTASKS]
subroutine pio_recommend_iotasks(comm, ioproc, numiotasks, miniotasks, maxiotasks )
-------------------------------------------------^

If you think any other information or feedback would be helpful please let me know and I'll post it ASAP.

I think it might have something to do with the -shared and -fPIC flags that I used, but if it does, then I'm not sure how to resolve the undefined reference errors.
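One way to test the suspicion about the -shared flag is to look at what kind of file the linker actually produced. The sketch below reads the ELF header directly; the function name elf_type is made up for illustration, and it assumes a little-endian ELF file such as one built on x86 Linux. Note that position-independent executables also report the "shared object" type, so the result is a hint rather than proof.

```shell
# elf_type FILE: print the ELF object type of FILE.
# Assumes a little-endian ELF file, where the low byte of the 16-bit
# e_type field sits at byte offset 16 of the header.
elf_type() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    code=$(od -An -tu1 -j16 -N1 "$1" | tr -d ' \n')
    case "$code" in
        1) echo "relocatable object" ;;
        2) echo "executable" ;;
        3) echo "shared object" ;;   # also reported for PIE executables
        *) echo "other ($code)" ;;
    esac
}

# Demo on a file that exists on most systems; for the case described
# above one would pass the path to ccsm.exe instead.
elf_type /bin/sh
```

If ccsm.exe built with -shared reports "shared object" while a build without the flag reports "executable", that would suggest the flag turned the program into a library that cannot be started directly.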

Any help or advice you can give will be greatly appreciated.

Thank you,
Aaron
 
I've figured out the debugger, and here is some additional information about the segmentation fault.

Temporary breakpoint 1 at 0x7ffff7d4a934: file /home/atp42/cesm1_0_3/models/drv/driver/ccsm_driver.F90, line 47.
Starting program: /home/atp42/scratch/03test/run/ccsm.exe

Program received signal SIGSEGV, Segmentation fault.
0x000000000001 in ?? ()

From what I understand this means that the model thinks that a function being called at or around line 47 in ccsm_driver.F90 is at an invalid memory location.
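To make a debugger session like this repeatable, the gdb commands can be kept in a file and run in batch mode. This is a minimal sketch, not a recommended procedure: the ./ccsm.exe path is an assumption (adjust it to the run directory), and the choice of commands (run, bt, info sharedlibrary) is only one reasonable starting point.

```shell
# Save the gdb commands in a file so the same session can be rerun.
cmds=$(mktemp)
cat > "$cmds" <<'EOF'
run
bt
info sharedlibrary
EOF

exe=./ccsm.exe   # assumed path to the executable
if command -v gdb >/dev/null 2>&1 && [ -x "$exe" ]; then
    # -batch exits when the commands finish; -x reads them from the file.
    gdb -batch -x "$cmds" "$exe"
else
    echo "gdb or $exe not available; commands saved in $cmds"
fi
```

The backtrace printed by bt after the crash shows which frames, if any, gdb can still identify, which may narrow down where the jump to the invalid address came from.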

What I'm not sure of is why this happens or how to stop it. I'm worried that editing the offending function might break the model, or cause more errors down the line.

Can anyone suggest a course of action?

Thank you,
Aaron
 
Okay, so I think part of the problem might be the -shared/-fPIC flags, so I removed them, and now I get the following error. I've tried the -assume nounderscore/no2underscore flags, and that changes the number of errors somewhat, but does not eliminate them.

Thank you for your help.

-------------------------------------------------------------------------
Building a single executable version of CCSM
-------------------------------------------------------------------------
cat: Srcfiles: No such file or directory
/home/atp42/48test/Tools/mkSrcfiles > /home/atp42/scratch/48test/ccsm/obj/Srcfiles
cp -f /home/atp42/scratch/48test/ccsm/obj/Filepath /home/atp42/scratch/48test/ccsm/obj/Deppath
/home/atp42/48test/Tools/mkDepends Deppath Srcfiles > /home/atp42/scratch/48test/ccsm/obj/Depends
/opt/openmpi-1.4.3/bin/mpif90 -c -I./ -I/usr/local/include -I/usr/local/include -I/opt/openmpi-1.4.3/include -I. -I/home/atp42/48test/SourceMods/src.drv -I/home/atp42/cesm1_0_3/models/drv/driver -I/home/atp42/scratch/48test/lib/include -DMCT_INTERFACE -DHAVE_MPI -DGLC_NEC_10 -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -assume byterecl -ftz -traceback -I/opt/openmpi-1.4.3/include -I/opt/openmpi-1.4.3/lib -O0 -FR /home/atp42/cesm1_0_3/models/drv/driver/seq_diag_mct.F90
.
.
.
/opt/openmpi-1.4.3/bin/mpif90 -c -I./ -I/usr/local/include -I/usr/local/include -I/opt/openmpi-1.4.3/include -I. -I/home/atp42/48test/SourceMods/src.drv -I/home/atp42/cesm1_0_3/models/drv/driver -I/home/atp42/scratch/48test/lib/include -DMCT_INTERFACE -DHAVE_MPI -DGLC_NEC_10 -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -assume byterecl -ftz -traceback -I/opt/openmpi-1.4.3/include -I/opt/openmpi-1.4.3/lib -O0 -FR /home/atp42/cesm1_0_3/models/drv/driver/mrg_x2g_mct.F90
/opt/openmpi-1.4.3/bin/mpif90 -c -I./ -I/usr/local/include -I/usr/local/include -I/opt/openmpi-1.4.3/include -I. -I/home/atp42/48test/SourceMods/src.drv -I/home/atp42/cesm1_0_3/models/drv/driver -I/home/atp42/scratch/48test/lib/include -DMCT_INTERFACE -DHAVE_MPI -DGLC_NEC_10 -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -assume byterecl -ftz -traceback -I/opt/openmpi-1.4.3/include -I/opt/openmpi-1.4.3/lib -O0 -FR /home/atp42/cesm1_0_3/models/drv/driver/mrg_x2i_mct.F90
/opt/openmpi-1.4.3/bin/mpif90 -c -I./ -I/usr/local/include -I/usr/local/include -I/opt/openmpi-1.4.3/include -I. -I/home/atp42/48test/SourceMods/src.drv -I/home/atp42/cesm1_0_3/models/drv/driver -I/home/atp42/scratch/48test/lib/include -DMCT_INTERFACE -DHAVE_MPI -DGLC_NEC_10 -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -assume byterecl -ftz -traceback -I/opt/openmpi-1.4.3/include -I/opt/openmpi-1.4.3/lib -O0 -FR /home/atp42/cesm1_0_3/models/drv/driver/ccsm_comp_mod.F90
/opt/openmpi-1.4.3/bin/mpif90 -c -I./ -I/usr/local/include -I/usr/local/include -I/opt/openmpi-1.4.3/include -I. -I/home/atp42/48test/SourceMods/src.drv -I/home/atp42/cesm1_0_3/models/drv/driver -I/home/atp42/scratch/48test/lib/include -DMCT_INTERFACE -DHAVE_MPI -DGLC_NEC_10 -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -assume byterecl -ftz -traceback -I/opt/openmpi-1.4.3/include -I/opt/openmpi-1.4.3/lib -O0 -FR /home/atp42/cesm1_0_3/models/drv/driver/ccsm_driver.F90
/opt/openmpi-1.4.3/bin/mpif90 -o /home/atp42/scratch/48test/run/ccsm.exe ccsm_comp_mod.o ccsm_driver.o map_atmatm_mct.o map_atmice_mct.o map_atmlnd_mct.o map_atmocn_mct.o map_glcglc_mct.o map_iceice_mct.o map_iceocn_mct.o map_lndlnd_mct.o map_ocnocn_mct.o map_rofocn_mct.o map_rofrof_mct.o map_snoglc_mct.o map_snosno_mct.o mrg_x2a_mct.o mrg_x2g_mct.o mrg_x2i_mct.o mrg_x2l_mct.o mrg_x2o_mct.o mrg_x2s_mct.o seq_avdata_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_rearr_mod.o seq_rest_mod.o -L/home/atp42/scratch/48test/lib -latm -llnd -lice -locn -lglc -L/home/atp42/scratch/48test/lib -lcsm_share -lmct -lmpeu -lpio -L/usr/local/lib -lnetcdf -L/opt/openmpi-1.4.3/lib -lmpi -L/usr/lib -L/usr/lib64 -L/usr/local/lib
/home/atp42/scratch/48test/lib/libcsm_share.a(shr_mct_mod.o): In function `shr_mct_smatreadnc':
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:105: undefined reference to `nf_open_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:107: undefined reference to `nf_strerror_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:112: undefined reference to `nf_inq_dimid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:113: undefined reference to `nf_inq_dimlen_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:114: undefined reference to `nf_inq_dimid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:115: undefined reference to `nf_inq_dimlen_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:116: undefined reference to `nf_inq_dimid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:117: undefined reference to `nf_inq_dimlen_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:140: undefined reference to `nf_inq_varid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:141: undefined reference to `nf_get_var_double_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:142: undefined reference to `nf_strerror_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:154: undefined reference to `nf_inq_varid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:155: undefined reference to `nf_get_var_int_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:156: undefined reference to `nf_strerror_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:165: undefined reference to `nf_inq_varid_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:166: undefined reference to `nf_get_var_int_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:167: undefined reference to `nf_strerror_'
/home/atp42/cesm1_0_3/models/csm_share/shr/shr_mct_mod.F90:174: undefined reference to `nf_close_'
.
.
.
/home/atp42/scratch/48test/lib/libpio.a(pionfwrite_mod.o): In function `write_nfdarray_double':
/home/atp42/scratch/48test/pio/pionfwrite_mod.F90.in:90: undefined reference to `netcdf_mp_nf90_put_var_1d_eightbytereal_'
/home/atp42/scratch/48test/pio/pionfwrite_mod.F90.in:112: undefined reference to `netcdf_mp_nf90_inquire_variable_'
/home/atp42/scratch/48test/pio/pionfwrite_mod.F90.in:152: undefined reference to `netcdf_mp_nf90_put_var_1d_eightbytereal_'
/home/atp42/scratch/48test/pio/pionfwrite_mod.F90.in:187: undefined reference to `netcdf_mp_nf90_put_var_1d_eightbytereal_'
make: *** [/home/atp42/scratch/48test/run/ccsm.exe] Error 1
 
I have no idea how to resolve this undefined reference error. I've tried updating my LD_LIBRARY_PATH, and I've tried it with 3 different compiler/MPI combinations (although one of these combinations is ifort/icc version 12).
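A possible next diagnostic step is to compare the symbol names the linker asks for (for example nf_open_) with the names the netCDF library on the link line actually exports, since a difference in trailing underscores or a missing Fortran interface would both produce this kind of error. The sketch below is hedged accordingly: find_symbol is a hypothetical helper, and the nm listing is invented sample data, not output from the system described here.

```shell
# find_symbol SYMBOL: read `nm` output on stdin and report whether SYMBOL
# is defined there, only referenced (undefined), or absent.
find_symbol() {
    awk -v sym="$1" '
        $NF == sym && $(NF-1) == "U" { undefined = 1 }
        $NF == sym && $(NF-1) ~ /^[TtWwDdBb]$/ { defined = 1 }
        END {
            if (defined) print "defined"
            else if (undefined) print "undefined"
            else print "absent"
        }'
}

# Invented sample in nm format: the library defines nf_open without a
# trailing underscore, while the caller wants nf_open_.
sample='0000000000000000 T nf_open
                 U nf_close_'

printf '%s\n' "$sample" | find_symbol nf_open
printf '%s\n' "$sample" | find_symbol nf_open_
```

On the real system the input would come from something like nm /usr/local/lib/libnetcdf.a (the exact file name is an assumption). If the nf_* names are absent altogether, it may be worth checking whether the Fortran wrappers were built at all, or whether they were installed as a separate library that the link line does not mention.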

If you have any ideas on how to resolve this error, please give any and all suggestions you can think of.

I've run out of ideas and I'm stuck on this error.

Thank you!
 