
CLM4.5CN compile/run error

We are trying to use CLM4.5CN in our study within CESM1.2.2. Here is some information about our case:

    create_newcase -case CLM4.5cn_test -res f19_f19 -mach userdefined -compset ICRUCLM45
    CLM_CONFIG_OPTS = -phys clm4_5 -bgc cn
    CLM_BLDNML_OPTS = -megan
    MPI_PATH = /opt/share/mpich2-1.2rc3
    NETCDF_PATH = /opt/share/netcdf-4.3.2.intel

CLM4CN cases run successfully (compset = ICRUCN). But when we try to build a case with CLM4.5, we get an error saying:

    undefined reference to 'dgbsv'

We found these websites useful for solving this problem:

https://bb.cgd.ucar.edu/banddiagonalmodf90210-undefined-reference-dgbsv
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

We tried to link the LAPACK/BLAS libraries by adding these lines to the Macros file:

    MKLROOT=/opt/share/intel/composer_xe_2013_sp1.3.174/mkl
    SLIBS+=-L${MKLROOT}/compiler/lib/intel64/libiomp5 -liomp5 -L${MKLROOT}/lib/intel64/libmkl_blas95_ilp64 -L${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64 -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lpthread -lm -L$(LIB_NETCDF) -lnetcdf -lnetcdff
    FFLAGS:= -fp-model source -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include

The case now compiles, but the run fails with no clear error message. The cesm.log file says:

    ......
    Opened existing file /scratch/s1155053991/cesm_inputs/rof/rtm/initdata/rtmi.ICRUCLM45BGC.2000-01-01.R05_simyr2000_c130518.nc      131072
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine            Line        Source
    cesm.exe           000000000270B6D9  Unknown               Unknown  Unknown
    cesm.exe           0000000002709FAE  Unknown               Unknown  Unknown
    cesm.exe           00000000026BE192  Unknown               Unknown  Unknown
    cesm.exe           0000000002655373  Unknown               Unknown  Unknown
    cesm.exe           000000000265EDBB  Unknown               Unknown  Unknown
    libpthread.so.0    00002B89F4B07850  Unknown               Unknown  Unknown
    libmkl_avx.so      00002B89FB774D68  Unknown               Unknown  Unknown
    libmkl_core.so     00002B89F2D8E1DA  Unknown               Unknown  Unknown
    libmkl_intel_thre  00002B89F412826B  Unknown               Unknown  Unknown
    libmkl_core.so     00002B89F28DB1A1  Unknown               Unknown  Unknown
    cesm.exe           0000000001BC7152  banddiagonalmod_m         213  BandDiagonalMod.F90
    cesm.exe           00000000018D3449  soiltemperaturemo         742  SoilTemperatureMod.F90
    cesm.exe           0000000000EFAE6D  biogeophysics2mod         313  Biogeophysics2Mod.F90
    cesm.exe           000000000079690E  clm_driver_mp_clm         582  clm_driver.F90
    cesm.exe           0000000000708A12  lnd_comp_mct_mp_l         589  lnd_comp_mct.F90
    cesm.exe           000000000048FBFC  ccsm_comp_mod_mp_        3281  ccsm_comp_mod.F90
    cesm.exe           00000000004C1A2E  MAIN__                     91  ccsm_driver.F90
    cesm.exe           0000000000413586  Unknown               Unknown  Unknown
    libc.so.6          00002B89F5B4CC36  Unknown               Unknown  Unknown
    cesm.exe           0000000000413479  Unknown               Unknown  Unknown

A second process then prints an essentially identical traceback (same routines and line numbers, different shared-library addresses), followed by:

    ===================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   EXIT CODE: 174
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================

It seems we are not linking the required libraries correctly. Can anyone help us out here? Thanks a lot in advance.
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
OK, the first problem, as you already discovered, is that cesm1.2.2 *requires* linking in LAPACK for clm4_5. So good job tracking that problem down and linking BLAS/LAPACK into the rest of the model. Some compilers automatically include BLAS/LAPACK and only need you to add the library on the link step. This is what you've found with Intel, where BLAS/LAPACK are included in its standard MKL library. The numbers in the traceback are useful: the lowest level at which you see line-number information is

    cesm.exe           0000000001BC7152  banddiagonalmod_m         213  BandDiagonalMod.F90
The "213" is the line number in models/lnd/clm/src/clm4_5/biogeophys/BandDiagonalMod.F90, which is the call to dgbsv:

    call dgbsv( n, kl, ku, 1, ab, m, ipiv, result, n, info )
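In that call, `ab` holds the matrix in LAPACK band storage and `m` sits in the leading-dimension (LDAB) position. The LAPACK reference documentation for dgbsv requires LDAB >= 2*kl + ku + 1; the extra kl rows are workspace for fill-in from partial pivoting. The pure-Python sketch below only illustrates that layout for readers unfamiliar with it. It assumes 0-based indexing and a dense list-of-lists input, it is not CLM's own code, and the function name `to_lapack_band` is hypothetical.

```python
def to_lapack_band(a, kl, ku):
    """Pack a dense n x n matrix `a` (list of lists) into the band layout
    documented for LAPACK dgbsv: ldab = 2*kl + ku + 1 rows by n columns.

    LAPACK (1-based) stores A(i,j) in AB(kl+ku+1+i-j, j); with 0-based
    indices that becomes ab[kl+ku+i-j][j]. The first kl rows are left at
    0.0 because dgbsv uses them as workspace for fill-in during pivoting.
    """
    n = len(a)
    ldab = 2 * kl + ku + 1
    ab = [[0.0] * n for _ in range(ldab)]
    for j in range(n):
        # Only rows i with j - ku <= i <= j + kl lie inside the band.
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            ab[kl + ku + i - j][j] = a[i][j]
    return ab


if __name__ == "__main__":
    # A 4 x 4 tridiagonal matrix: kl = ku = 1, so ldab = 2*1 + 1 + 1 = 4.
    a = [[2.0, -1.0, 0.0, 0.0],
         [-1.0, 2.0, -1.0, 0.0],
         [0.0, -1.0, 2.0, -1.0],
         [0.0, 0.0, -1.0, 2.0]]
    for row in to_lapack_band(a, kl=1, ku=1):
        print(row)
```

For the tridiagonal example this prints an all-zero workspace row, then the superdiagonal, the diagonal and the subdiagonal, each aligned under the column of A it came from.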
So there's a problem in using dgbsv. The frames beyond that point (the libmkl_* entries) don't show line numbers, since that code is in the compiler's library (MKL) rather than in CESM.
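The reading of the traceback described above (skip the frames without line numbers and take the first one that names a source file) can be automated when digging through long cesm.log files. The sketch below assumes the five-column Image/PC/Routine/Line/Source layout of the Intel Fortran traceback in the question, with the innermost frame printed first. The function name `deepest_frame_with_source` is hypothetical, and the regular expression may need adjusting for other compilers or versions.

```python
import re

# One frame of an Intel Fortran (forrtl) traceback:
#   Image  PC  Routine  Line  Source
FRAME = re.compile(
    r"^\s*(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{16})\s+(?P<routine>\S+)"
    r"\s+(?P<line>\S+)\s+(?P<source>\S+)\s*$"
)


def deepest_frame_with_source(log_text):
    """Return (image, routine, line, source) for the first frame that has
    real line information, or None if there is none. Because forrtl lists
    the innermost frame first, this is the deepest frame compiled with
    line information (library frames show 'Unknown' and are skipped)."""
    for text in log_text.splitlines():
        m = FRAME.match(text)
        if m and m.group("line").isdigit() and m.group("source") != "Unknown":
            return (m.group("image"), m.group("routine"),
                    int(m.group("line")), m.group("source"))
    return None


SAMPLE = """\
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
cesm.exe           000000000270B6D9  Unknown               Unknown  Unknown
libmkl_core.so     00002B89F28DB1A1  Unknown               Unknown  Unknown
cesm.exe           0000000001BC7152  banddiagonalmod_m         213  BandDiagonalMod.F90
cesm.exe           00000000018D3449  soiltemperaturemo         742  SoilTemperatureMod.F90
"""

if __name__ == "__main__":
    print(deepest_frame_with_source(SAMPLE))
    # -> ('cesm.exe', 'banddiagonalmod_m', 213, 'BandDiagonalMod.F90')
```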
So, as you suspect, something is probably wrong with how you are linking it in. You've given explicit paths to very specific MKL libraries, and I wonder if they are consistent with how you are compiling the rest of the model. Looking at some of the example Intel compiler settings for CESM1.2.2 in scripts/Machines, I see that some machines add "-mkl" to ADD_SLIBS. Others may have to point to a specific MKL library. On yellowstone, we load the MKL library and the Intel compiler (using the modules command) and the machine figures the rest out for us.

Anyway, the link step has issues, and you'll have to work on it until you get something that works. I would start by seeing if you can simplify it, and also look up any local documentation you have on MKL and the Intel compiler for your machine (man pages, HTML, printed manuals, whatever you can find).

If you can find an expert (or system support person) on the machine you are using and the Intel compiler, that may help as well, or find someone who has successfully linked LAPACK code on that machine. Good luck.
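One concrete consistency question in the Macros from the question is `-lmkl_intel_ilp64`. MKL's ILP64 interface expects 64-bit INTEGER arguments, whereas CESM is normally compiled with default 4-byte integers, so a mismatch like this can compile and link cleanly and still crash inside dgbsv. That is an assumption about the cause here, not something confirmed in this thread. Because Fortran passes arguments by reference, an ILP64 routine reads 8 bytes starting at the address of a 4-byte variable such as `n`. The hypothetical helper `read_as_ilp64` below mimics that read with Python's standard `struct` module, assuming a little-endian x86-64 layout and that the next 4 bytes in memory belong to some neighbouring value.

```python
import struct


def read_as_ilp64(int32_words):
    """Hypothetical illustration: pack consecutive 4-byte integers the way
    default Fortran INTEGERs might sit in memory (little-endian, x86-64),
    then reinterpret the first 8 bytes as one signed 64-bit integer, which
    is what an ILP64 routine does when it dereferences an LP64 argument."""
    if len(int32_words) < 2:
        raise ValueError("need the value plus at least one neighbouring word")
    raw = struct.pack("<%di" % len(int32_words), *int32_words)
    (as_int64,) = struct.unpack_from("<q", raw, 0)
    return as_int64


if __name__ == "__main__":
    # n = 15 followed by a zero word: the misread happens to be harmless.
    print(read_as_ilp64([15, 0]))
    # n = 15 followed by a word holding 1: the routine sees n = 4294967311,
    # far beyond the real array bounds, which invites a SIGSEGV.
    print(read_as_ilp64([15, 1]))
```

The first case shows why such a mismatch can look intermittent: the result depends on whatever happens to sit next to the argument. Linking the LP64 libraries instead (for example `-lmkl_intel_lp64` rather than `-lmkl_intel_ilp64`, without the ilp64 include path) keeps the integer widths consistent with default-integer code, although a later post in this thread reports that this switch alone was not enough in that case.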
 

Hi,
I hope you don't mind me bumping this thread. I have run into this exact same error running cesm2_2_0. The model builds but doesn't run, and the error points to a compiler problem. I've followed @erik's advice above and used the Intel MKL Link Line Advisor, but haven't found a solution.
What else could cause a segmentation fault? I thought it might need a switch from the ILP64 to the LP64 libraries, but this hasn't worked either.
 