CESM2.3.alpha08b build failure (timeout waiting for CDEPS)

MarkR_UoLeeds

Mark Richardson
New Member
Hello,
I am building cesm2.3.alpha08b on our ARC4 HPC system (a Dell cluster with Intel CPUs). This error is new to me, and I do not understand why anything would "time out" during a build:
cat /nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg16/bld/CDEPS.bldlog.220325-101722

"Running cmake for CDEPS
ERROR: Timeout waiting for /nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg16/bld/intel/intelmpi/nodebug/nothreads/nuopc/CDEPS/dwav/libdwav.a"

Versions: CMake 3.15.1, Intel MPI 2019.4.243 and ifort 19.0.4.

Many thanks for any suggestions,
Mark
 

fischer

CSEG and Liaisons
Staff member
Hi Mark,

Is this a repeatable error? You should be using the f19_f19_mg17 resolution instead of f19_f19_mg16; we no longer test f19_f19_mg16.

You can also try running ./case.build --debug to get more logging information.

Chris
 

jedwards

CSEG and Liaisons
Staff member
Hi Mark,

Was your problem solved? I've found that this timeout occasionally happens and does not necessarily indicate a problem. Try simply running case.build again and see whether it works.
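Since the timeout can be intermittent, the rerun can be automated. This is a minimal sketch, assuming a POSIX shell run from the case directory; `retry_cmd` is a hypothetical helper name, not part of CIME.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of CIME): rerun a command
# up to a given number of times, stopping at the first success.
# Intended for rerunning ./case.build after an intermittent
# "Timeout waiting for ..." error.
retry_cmd() {
  retry_max=$1
  shift
  retry_attempt=1
  while [ "$retry_attempt" -le "$retry_max" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $retry_attempt of $retry_max failed: $*" >&2
    retry_attempt=$((retry_attempt + 1))
  done
  return 1
}
```

Usage: `retry_cmd 3 ./case.build`. If every attempt fails, the problem is probably real rather than a spurious timeout, and the build logs (or `./case.build --debug`) are the next place to look.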
 

MarkR_UoLeeds

Mark Richardson
New Member
Hello Jim and Chris,
Thanks for your input. The problem has moved on, and now I wonder whether I should open a fresh discussion.
I noticed a message about CMake and did this:
export CIME_NO_CMAKE_MACRO=True
but I wonder whether I should correct that before worrying about downstream problems...

However, I changed the grid resolution, so I now have FX2000_f19_f19_mg17. ./case.build now gets past CDEPS but fails with lots of "undefined reference" errors for several symbols, e.g.:

err=/home/home01/earmgr/cesm_prep/2.3.a08/FX2000_f19_f19_mg17/Tools/Makefile:8: "Variable MODEL is deprecated, please use COMP_NAME instead"
cat: Srcfiles: No such file or directory
/home/home01/earmgr/cesm_prep/2.3.a08/FX2000_f19_f19_mg17/Tools/Makefile:8: "Variable MODEL is deprecated, please use COMP_NAME instead"
/home/home01/earmgr/CESM/2.3.alpha08b/components/cmeps/cime_config/../mediator/med_io_mod.F90(126): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [RC]
subroutine med_io_init(gcomp, rc)
--------------------------------^
/nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg17/bld/lib//libatm.a(lapack_interfaces.o): In function `lapack_interfaces_mp_dgbsv_wrap_':
.
.
.
/nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg17/bld/intel/intelmpi/nodebug/nothreads/nuopc/nuopc/esmf/lib//libclm.a(UrbBuildTempOleson2015Mod.o): In function `urbbuildtempoleson2015mod_mp_buildingtemperature_':
/home/home01/earmgr/CESM/2.3.alpha08b/components/clm/src/biogeophys/UrbBuildTempOleson2015Mod.F90:657: undefined reference to `dgesv_'



The bld directory has these logs:
-rw-r----- 1 earmgr EAR 5212 Mar 30 14:30 gptl.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 116131 Mar 30 14:31 mct.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 101150 Mar 30 14:31 pio.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 169036 Mar 30 14:32 csm_share.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 2033 Mar 30 14:32 CDEPS.bldlog.220330-143006
drwxr-x--x 2 earmgr EAR 4096 Mar 30 14:36 ocn
-rw-r----- 1 earmgr EAR 1071872 Mar 30 14:43 lnd.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 351 Mar 30 14:43 ocn.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 31134 Mar 30 14:44 rof.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 215699 Mar 30 14:45 ice.bldlog.220330-143006
drwxr-x--x 3 earmgr EAR 4096 Mar 30 14:49 lib
-rw-r----- 1 earmgr EAR 2365772 Mar 30 14:49 atm.bldlog.220330-143006
-rw-r----- 1 earmgr EAR 127846 Mar 30 14:50 cesm.bldlog.220330-143006
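With this many logs, a quick scan shows which ones actually contain the linker errors. A minimal sketch, assuming the logs are still uncompressed plain text (CIME may gzip them after a successful build); `find_link_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): report each *.bldlog.* file in a
# CIME bld directory that contains linker "undefined reference" errors,
# with the number of such lines. Assumes plain-text (not gzipped) logs.
find_link_errors() {
  bld_dir=${1:-.}
  for log in "$bld_dir"/*.bldlog.*; do
    [ -f "$log" ] || continue   # skip the literal pattern if nothing matched
    # grep -c prints 0 and exits 1 when there are no matches; ignore that status
    count=$(grep -c "undefined reference" "$log" || true)
    if [ "$count" -gt 0 ]; then
      printf '%s: %s\n' "$(basename "$log")" "$count"
    fi
  done
}
```

Usage: `find_link_errors /nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg17/bld`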

So now I probably need to go back to your guidance on what information to supply with a support question. This is our ARC4 HPC system at the University of Leeds, using:

Currently Loaded Modulefiles:
1) licenses 3) intel/19.0.4 5) user 7) hdf5/1.8.21 9) fftw/3.3.8 11) cmake/3.15.1
2) sge 4) intelmpi/2019.4.243 6) mkl/2019.0 8) netcdf/4.6.3 10) esmf/8.2.0 12) python/3.7.4


and .cime XML files for machine, compiler and batch. These might be wrong now: although the local system paths are correct for 2.3.alpha08b, perhaps the files need to be in the ccs_config directory. They were developed alongside our 2.1.3 installation.

I did update the cpl7 and cmeps config_component.xml files to tell them about SGE.

Mark
 

jedwards

CSEG and Liaisons
Staff member
In some configurations CESM depends on the LAPACK and BLAS libraries, and that looks like the problem here.
The "deprecated" warnings can be safely ignored.
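One way to check that diagnosis is to list the distinct symbols the linker could not resolve; dgesv_ in the CLM error above, for example, is a LAPACK routine (the trailing underscore is the usual Fortran name mangling). A minimal sketch, assuming GNU ld messages in the format shown in the thread; `missing_symbols` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print each distinct symbol named in
# linker messages of the form
#   file.F90:657: undefined reference to `dgesv_'
# Assumes that GNU ld message format; other linkers word it differently.
missing_symbols() {
  sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" "$@" | sort -u
}
```

Usage: `missing_symbols /nobackup/earmgr/cesm2.3.alpha08b/FX2000_f19_f19_mg17/bld/cesm.bldlog.220330-143006`. If the list contains only LAPACK/BLAS names, that points at the link line rather than at the CESM source.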
 

cemac-ccs

Chris Symonds
New Member
Hi Jim & Chris

I have taken over this work from Mark and am now using the cesm2_3_beta08 tag. I have made sure that MKL is loaded and linked in config_compilers.xml. First I tried linking with `-mkl` in the <SLIBS> and <LDFLAGS> blocks; when that didn't work, I tried `-Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread`, which is the usual way MKL is linked on ARC4, again in both SLIBS and LDFLAGS. Neither worked.

Is there a different recommended way to link MKL with CESM?

I see this when building the FX2000, F2000climo and B1850 compsets; as far as I can tell, it was not an issue when building 2.1.3 or 2.2.0.

Thanks

Chris
 

jedwards

CSEG and Liaisons
Staff member
Hi Chris, in cesm2_3_beta08 we have moved away from the config_compilers.xml file and instead use CMake flags in ccs_config/machines/cmake_macros. I'll look at ways to make that clearer in the documentation.
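Under that scheme, a machine-specific MKL link line might look like the sketch below. It is hedged: the file name intel_arc4.cmake assumes a <compiler>_<machine>.cmake naming pattern, and the string(APPEND SLIBS ...) form assumes the style of the existing files in ccs_config/machines/cmake_macros, so compare with one of those before using it. The link line itself is the ARC4 one quoted earlier in the thread. A shell here-document keeps the example in the same shell as the rest of the thread; `write_arc4_macros` is a hypothetical helper.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper and file name): write a CMake macro
# file that appends the ARC4 MKL link line to SLIBS. The string(APPEND ...)
# style is assumed to match the existing files in the cmake_macros directory.
write_arc4_macros() {
  macros_dir=$1
  mkdir -p "$macros_dir"
  cat > "$macros_dir/intel_arc4.cmake" <<'EOF'
# Link LAPACK/BLAS routines (e.g. dgesv_) from Intel MKL
string(APPEND SLIBS " -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread")
EOF
}
```

Usage, from the top of the CESM checkout: `write_arc4_macros ccs_config/machines/cmake_macros`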
 

cemac-ccs

Chris Symonds
New Member
I see. Thanks Jim, that seems to have solved the problem. I am now doing a coupled run to confirm, then I will run the porting tests and look at opening a PR.

For a PR to bring the ARC4 port into ccs_config, should I use the tagged release from beta08 (that's 0.0.16) as the base, or the head of the trunk?
 

jedwards

CSEG and Liaisons
Staff member
I think the PR should be fine with the tagged version as its base. Moving to the head of the trunk could mean updating other components as well.
 