
CESM2.1.3 build failed for B1850, about LAPACK and BLAS

Besuyi

Besuyi
Member
Dear all,
I ran into problems building a B1850 case on our HPC system. An X case built and ran successfully, although with some warnings during the build. I have spent almost two weeks trying to get B1850 to work, without success.
I first ran the following:
./create_newcase --case cesmB --res f19_g17 --compset B1850 --mach ahren
./case.setup
./case.build
The build failed. Looking at cesm.bldlog.xxx, almost all of the errors look like this:
[Screenshot: 1625069651407.png]
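For a long build log, a small filter like the one below can pull out just the symbols the linker could not resolve, which makes it easier to see which library is missing. This is only a sketch: the function name list_undefined_refs is made up here, and it assumes the linker writes messages of the form "undefined reference to `symbol'", as GNU ld does.

```shell
# List each distinct symbol named in an "undefined reference to" line.
# Hypothetical helper; reads a build log on stdin.
# Usage: list_undefined_refs < cesm.bldlog.xxx
list_undefined_refs() {
    # Keep only the quoted symbol name. Linker versions differ in the
    # quote characters they use, so accept a backtick or an apostrophe.
    sed -n "s/.*undefined reference to [\`']\([^\`']*\)[\`'].*/\1/p" |
        sort -u
}
```

Symbols such as dgemm_ or dpotrf_ in the output would point at BLAS/LAPACK, while netcdf-related names would point at the netcdf libraries.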
After searching the forum for related posts, I suspected that my lapack and blas settings were wrong, so I added this to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/g1/app/mathlib/netcdf/4.4.0/intel/lib/:/g1/app/mathlib/lapack/3.4.2/intel/lib:/g1/app/mathlib/blas/3.8.0/intel
and appended this to SLIBS:
<append> -L/g1/app/mathlib/lapack/3.4.2/intel/lib -llapack -L/g1/app/mathlib/blas/3.8.0/intel blas_LINUX.a </append>
This seemed to help, because the 'undefined reference to xxx' errors no longer appeared in the cesm.bldlog file, but the build still failed with the error "ifort: error #10236: File not found 'blas_LINUX.a'". The HPC directory /g1/app/mathlib/blas/3.8.0/intel contains only blas_LINUX.a and no libblas.a, so I installed lapack-3.10.0 and blas-3.10.0 under my work path /g8/JOB_TMP/zhangh/besyi/software/.
The current export is:
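As background for the blas_LINUX.a error: a -lNAME option makes the linker look for libNAME.so or libNAME.a in the -L directories, so -lblas cannot find a file called blas_LINUX.a, and a bare blas_LINUX.a on the link line is looked up relative to the directory the linker runs in. A rough sketch of that lookup for a single directory follows. resolve_lib is a hypothetical helper, and real linkers search more directories than this.

```shell
# Print the file that "-L<dir> -l<name>" would pick up in <dir>, if any.
# Hypothetical helper. A shared library is tried before a static archive,
# which mirrors the usual linker default.
resolve_lib() {
    dir=$1
    name=$2
    for ext in so a; do
        if [ -f "$dir/lib$name.$ext" ]; then
            printf '%s\n' "$dir/lib$name.$ext"
            return 0
        fi
    done
    echo "no lib$name.so or lib$name.a in $dir" >&2
    return 1
}
```

With the paths from this thread, resolve_lib /g1/app/mathlib/blas/3.8.0/intel blas would presumably find nothing, since that directory only holds blas_LINUX.a. Giving the archive by its full path on the link line is the usual alternative to renaming or rebuilding it.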
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/g1/app/mathlib/netcdf/4.4.0/intel/lib/:/g8/JOB_TMP/zhangh/besyi/software/lapack-3.10.0/:/g8/JOB_TMP/zhangh/besyi/software/BLAS-3.10.0/
and SLIBS is:
# the SLIBS part of config_compilers.xml
<SLIBS>
<append> -L/g8/JOB_TMP/zhangh/besyi/software/lapack-3.10.0/ -llapack -L/g8/JOB_TMP/zhangh/besyi/software/BLAS-3.10.0/ -lblas </append>
<append MPILIB="mpich"> -mkl=cluster </append>
<append MPILIB="mpich2"> -mkl=cluster </append>
<append MPILIB="mvapich"> -mkl=cluster </append>
<append MPILIB="mvapich2"> -mkl=cluster </append>
<append MPILIB="mpt"> -mkl=cluster </append>
<append MPILIB="openmpi"> -mkl=cluster </append>
<append MPILIB="impi"> -mkl=cluster </append>
<append MPILIB="mpi-serial"> -mkl </append>
</SLIBS>
Then I built B1850 again, hoping this would fix it, but it did not:
ERROR: BUILD FAIL: buildexe failed, cat /g8/JOB_TMP/zhangh/besyi/software/cesm2_1_3/scratch/cesmB/bld/cesm.bldlog.210630-143855
All bldlog files, including the warnings shown on screen, are attached.
Can anyone help me out here? Thanks a lot in advance.
 

Attachments

  • screenshot for build failed.png
    227 KB · Views: 23
  • bldlog.zip
    78.6 KB · Views: 7

sacks

Bill Sacks
CSEG and Liaisons
Staff member
I am moving this to the forum for porting issues to get attention from the appropriate people there.
 

jedwards

CSEG and Liaisons
Staff member
It looks like you've solved the problem with linking blas and lapack - the new error is about linking netcdf. Look in config_compilers.xml for examples of how to do that. I see you have /g1/app/mathlib/netcdf/4.4.0/intel/ in the pio build log, but I don't see it in the cesm.bldlog.
 

Besuyi

Besuyi
Member
Hi jedwards, thank you for your reply. My machine XML files and .bashrc are attached. I can't figure out what changed after linking lapack and blas, because I didn't change any of the netcdf-related settings. The cesm and pio build logs from before the lapack and blas changes are also attached here.
 

Attachments

  • cesm.bldlog.txt
    61 KB · Views: 2
  • pio.bldlog.txt
    70.8 KB · Views: 2
  • xml&bashrc.zip
    3.3 KB · Views: 3

Besuyi

Besuyi
Member
Hi jedwards, last time I didn't add the netcdf library to SLIBS. I only exported it in .bashrc, and case X worked. I also saw the thread "Does CESM2 supports netcdf shared libraries?".
Thanks for the reminder. This time I added the netcdf library to SLIBS, which is now:
<SLIBS>
<append> -L/g8/JOB_TMP/zhangh/besyi/software/lapack-3.10.0/ -llapack -L/g8/JOB_TMP/zhangh/besyi/software/BLAS-3.10.0/ -lblas </append>
<append> -L/g1/app/mathlib/netcdf/4.4.0/intel/lib -lnetcdff -lnetcdf </append>
<append MPILIB="mpich"> -mkl=cluster </append>
<append MPILIB="mpich2"> -mkl=cluster </append>
<append MPILIB="mvapich"> -mkl=cluster </append>
<append MPILIB="mvapich2"> -mkl=cluster </append>
<append MPILIB="mpt"> -mkl=cluster </append>
<append MPILIB="openmpi"> -mkl=cluster </append>
<append MPILIB="impi"> -mkl=cluster </append>
<append MPILIB="mpi-serial"> -mkl </append>
</SLIBS>
Perhaps in this case both export and SLIBS are required.
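On the question of export versus SLIBS: LD_LIBRARY_PATH is read by the dynamic loader when the executable starts, while -L and -l in SLIBS are read by the linker at build time, so a shared netcdf library generally has to be visible in both places. A small sketch for sanity-checking the run-time side follows. missing_path_entries is a hypothetical helper name, not part of CESM or CIME.

```shell
# Print every entry of a colon-separated search path that is not an
# existing directory. Hypothetical helper.
# Usage: missing_path_entries "$LD_LIBRARY_PATH"
missing_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        # Skip empty entries produced by leading, trailing, or doubled colons.
        [ -n "$entry" ] || continue
        [ -d "$entry" ] || printf '%s\n' "$entry"
    done
}
```

It prints nothing when every entry exists. A mistyped library directory shows up as a line of output.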
Then I ran ./case.build --skip-provenance-check. It still failed, but with far fewer errors; the new bldlog files are attached. I suspect the remaining errors come from a Fortran compiler mismatch: my machine builds CESM with ifort, but I built the newly installed lapack and blas with gfortran.
[Screenshot: 1625107611819.png]
So I have decided to rebuild lapack and blas with ifort, and then we will check the result.
Good luck!
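One way to check the compiler-mismatch guess before rebuilding is to look at which Fortran runtime the library's objects reference. The sketch below assumes the common naming conventions (gfortran objects call _gfortran_* routines, ifort objects call for_* routines) and uses a hypothetical function name. Treat its answer as a hint, not proof.

```shell
# Read `nm` output on stdin and name the Fortran runtime its symbols suggest.
# Hypothetical helper. Assumption: gfortran-built objects reference
# _gfortran_* symbols and ifort-built objects reference for_* symbols.
guess_fortran_runtime() {
    awk '
        $NF ~ /^_gfortran_/ { g = 1 }   # last field of an nm line is the symbol
        $NF ~ /^for_/       { i = 1 }
        END {
            if (g && i)  print "mixed"
            else if (g)  print "gfortran"
            else if (i)  print "ifort"
            else         print "unknown"
        }'
}
```

A typical call would be nm path/to/libblas.a | guess_fortran_runtime, with the path replaced by the actual archive.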
 

Attachments

  • bldlog_new.zip
    70.5 KB · Views: 1

Besuyi

Besuyi
Member
I rebuilt and reinstalled lapack and blas with ifort. Here is the latest build error. I have also attached cesm.bldlog and pio.bldlog. Could you help me work out why this happens?
[Screenshot: 1625133482782.png]
 

Attachments

  • cesm.bldlog.210701-075708.txt
    33.6 KB · Views: 3
  • pio.bldlog.210701-075708.txt
    70.5 KB · Views: 2

jedwards

CSEG and Liaisons
Staff member
These undefined symbols are provided by the intel mkl library. Try adding -mkl to your compile flags.
 

Besuyi

Besuyi
Member
You are right, sir! Today I made the following four changes:
1. Added LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/compiler/intel/composer_xe_2017.2.174/mkl/lib/intel64/ to my exports
2. Added <MPI_PATH>/opt/mpi/intelmpi/2017.2.174/intel64/</MPI_PATH> to config_compilers.xml
3. Appended -L/opt/compiler/intel/composer_xe_2017.2.174/mkl/lib/intel64/ -lmkl_rt to <SLIBS> in config_compilers.xml
4. Appended -DNO_MPIMOD to <CPPDEFS>
Then ' -L/opt/compiler/intel/composer_xe_2017.2.174/mkl/lib/intel64/ -lmkl_rt -L/opt/mpi/intelmpi/2017.2.174/intel64//lib -lmpi ' appeared in cesm.bldlog. The build succeeded and the case is now running.
However, I am not sure whether the fourth change had any effect on this issue, because I don't know what it does. Should I remove it? I'll remove it and see.
Also, under /opt/compiler/intel/composer_xe_2017.2.174/mkl/lib/intel64/, '-lmkl_rt' corresponds to the file 'libmkl_rt.so', not 'libmkl_rt.a'. Is that right?
The point is that I don't know which of changes 2-4 is the critical one.
I sincerely appreciate your help. Have a nice day!
 

jedwards

CSEG and Liaisons
Staff member
I don't think that you needed any of those steps 2-4, you just need to add -mkl to your compiler flags in config_compilers.xml.
 