CAM 4.0 gmake failure

jshaman

New Member
I am trying to run CAM 4.0 (as distributed with CCSM) on a linux cluster. The requirements for building and running are all met: Perl 5.4 or later, gmake, Fortran 90 and C compilers, a NetCDF library (3.6 or later). The code appears to configure fine (happy with the gmake, NetCDF library and MPI library), but it fails on building the model.

Basically the command "gmake >& MAKE.out" produces a suite of errors like:

fortcom: Error: /home/jshaman/ccsm4_0_a02/models/drv/driver/ccsm_driver.F90, line 46: Name in only-list does not exist. [OCN_FINAL_MCT]
use ocn_comp_mct, only: ocn_init_mct, ocn_run_mct, ocn_final_mct
------------------------------------------------------^

And then aborts with:

compilation aborted for /home/jshaman/ccsm4_0_a02/models/drv/driver/ccsm_driver.F90 (code 1)
gmake: *** [ccsm_driver.o] Error 1

I see that 'fortcom' errors typically indicate that the source files are being compiled in the wrong order and that utilities like 'makemake.perl' or 'mkmr.pl' can resolve this, but I'm really not sure how to implement this fix.

Do you have any suggestions, or do you know of anyone who has set up the new CAM in a similar environment?

Thanks,
Jeff
 

eaton

CSEG and Liaisons
Another thread in this forum reported the same problem, also with the Intel compiler. Could you please post the exact configure command that you're using?

The Makefile executes the utility mkDepends which is responsible for setting up all the dependencies required to compile things in the right order. So you shouldn't need to modify that. If you look in the directory where the build occurs you should find a file called Depends. Look in that file for the dependencies of ccsm_driver.o. You should find that ocn_comp_mct.o is a dependency, and that should force the compilation of ocn_comp_mct.F90 before ccsm_driver.F90.
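The check described above can be done from the shell. This is a minimal sketch, not part of CAM: `show_deps` is a hypothetical helper, and it assumes that Depends contains make-style lines of the form `target.o : prerequisites`.

```shell
# Hypothetical helper (not part of CAM): print the rule for one object file.
# Assumes Depends holds make-style lines such as "ccsm_driver.o : a.F90 b.o".
show_deps() {
    depends_file=$1
    object=$2
    # Escape the dots so "ccsm_driver.o" is matched literally by grep.
    pattern=$(printf '%s' "$object" | sed 's/\./\\./g')
    grep -E "^${pattern}[[:space:]]*:" "$depends_file"
}

# Example, run in the build directory. It prints ocn_comp_mct.o only if that
# object is listed as a prerequisite of ccsm_driver.o:
# show_deps Depends ccsm_driver.o | tr -s ' ' '\n' | grep '^ocn_comp_mct\.o$'
```

If the second command prints nothing, the dependency is missing from Depends, which would be consistent with the files being compiled in the wrong order.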
 

jshaman

New Member
Thanks for responding, and all the posts to RAM. I followed the latter thread and my problem was similar--I was using an alpha release. I have downloaded CESM 1.0, but am now running into another issue, again at the build stage. I get lots of warnings such as:

fortcom: Warning: /home/jshaman/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90, line 802: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [TYPELIST]
subroutine ESMF_AttributeGetObjectList(anytypelist, name, typelist, valuelist, rc)
----------------------------------------------------------------^
fortcom: Warning: /home/jshaman/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseMod.F90, line 750: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [TYPELIST]
subroutine ESMF_AttributeGetList(anytype, namelist, typelist, valuelist, rc)
----------------------------------------------------------^

Followed by the error

mpif90 -c /home/jshaman/cesm1_0/models/utils/esmf_wrf_timemgr/ESMF_BaseTimeMod.F90
ESMF_BaseTimeMod.F90(24): #error: can't find include file: ESMF_TimeMgr.inc
gmake: *** [ESMF_BaseTimeMod.o] Error 1


The file ESMF_TimeMgr.inc is there (in /home/jshaman/cesm1_0/models/utils/esmf_wrf_timemgr), so I am not sure what the problem is.

I ran configure verbosely and I've attached the configure output and Filepath files in case they are of help. If you have any insights into this issue it would be very helpful.

Thanks,
Jeff
 

eaton

CSEG and Liaisons
I'm guessing the problem is that you have specified "-fc mpif90" to configure. The reason that doesn't work (this is something I plan to fix) is that mpif90 is just a script that invokes a particular compiler (in your case I assume it's invoking the ifort compiler). But the Makefile doesn't know what type of Fortran compiler is being wrapped by mpif90 and consequently it is not able to supply the correct compiler arguments. I suggest executing configure with the arguments "-fc ifort -linker mpif90".
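To confirm which compiler an MPI wrapper actually invokes, the wrapper can usually be asked to print its command line. The following is a sketch under assumptions: `wrapped_compiler` is a hypothetical helper, MPICH-style wrappers are assumed to accept `-show`, and Open MPI wrappers `--showme`. Other MPI installations may use a different option.

```shell
# Hypothetical helper: report the compiler that an MPI wrapper script invokes.
wrapped_compiler() {
    wrapper=$1
    # MPICH-style wrappers print their command line with -show,
    # Open MPI wrappers with --showme; try one, then the other.
    cmdline=$("$wrapper" -show 2>/dev/null) ||
        cmdline=$("$wrapper" --showme 2>/dev/null) ||
        return 1
    [ -n "$cmdline" ] || return 1
    # The first word of that command line is the underlying compiler.
    printf '%s\n' "${cmdline%% *}"
}

# Example: wrapped_compiler mpif90
# If this prints ifort, the arguments suggested above apply
# (other configure options omitted):
#   configure -fc ifort -linker mpif90
```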
 

jshaman

New Member
Thanks for the suggestion. That did resolve the issue, but another error is occurring so that the code still doesn't build. It seems to be a memory issue. The code compiles for a considerable time then aborts. The error is as follows:

ifort: error #10106: Fatal error in /opt/intel/fce/10.1.015/bin/fortcom, terminated by kill signal
compilation aborted for /home/jshaman/cesm1_0/models/lnd/clm/src/main/initGridCellsMod.F90 (code 1)
gmake: *** [initGridCellsMod.o] Error 1
gmake: *** Waiting for unfinished jobs....


I'm thinking that I need to update to ifort 11.x. Do you think this is the issue? Or do you suspect something else at the root of this error?

Thanks,
Jeff
 

eaton

CSEG and Liaisons
I've been able to build CAM using intel v11. I'm not sure about v10. The error you report looks like a system problem though, so a memory issue is possible.

One thing we have noticed about the intel compiler is that it leaves rather large temp files behind when a compilation aborts. I think these are in /tmp by default, so you might check there to make sure that the compiler has room to write.
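A quick way to check the available space and look for leftovers is sketched below. `free_kb` is a hypothetical helper, and the sketch assumes the compiler writes its temporary files to `$TMPDIR` when that variable is set and to /tmp otherwise. It only lists files and deletes nothing.

```shell
# Hypothetical helper: free space, in kilobytes, on the filesystem holding a directory.
free_kb() {
    # -P prints one POSIX-format line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: temp files go to $TMPDIR if it is set, else to /tmp.
tmp_dir=${TMPDIR:-/tmp}
echo "free space in $tmp_dir: $(free_kb "$tmp_dir") KB"

# List (without deleting) your own entries in that directory, largest last,
# to spot leftovers from aborted compilations.
find "$tmp_dir" -mindepth 1 -maxdepth 1 -user "$(id -u)" -exec du -sk {} + 2>/dev/null |
    sort -n | tail -n 5
```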
 

jshaman

New Member
I've set up on a different linux cluster with Intel v11 compiler (mpif90 linker), and CESM1.0. CAM 3.1 works on the cluster, but I can't get CAM 5.0 to build.

gmake >& MAKE.out

takes over an hour before producing an error.

It also slows down in terms of the number of .mod and .o files it produces before giving up. Where it fails seems to vary, but it ultimately gives up with something like:


home04/jshaman/cesm1_0/models/atm/cam/src/control/scamMod.F90
Putting child 0x1990a280 (scamMod.o) PID 26718 on the chain.
Live child 0x1990a280 (scamMod.o) PID 26718
/n/home04/jshaman/cesm1_0/models/atm/cam/src/control/scamMod.F90(14): error #7002: Error in opening the compiled module file. Check INCLUDE paths. [CONSTITUENTS]
use constituents, only: pcnst
------^
/n/home04/jshaman/cesm1_0/models/atm/cam/src/control/scamMod.F90(80): error #6406: Conflicting attributes or multiple declaration of name. [PCNST]
real(r8), public :: alphacam(pcnst)
--------------------------------^


etc., before ultimately giving up with:



compilation aborted for /n/home04/jshaman/cesm1_0/models/atm/cam/src/control/scamMod.F90 (code 1)
Reaping losing child 0x1990a280 PID 26718
gmake: *** [scamMod.o] Error 1
Removing child 0x1990a280 PID 26718 from chain.



Do you think this is a memory issue?

Thanks,
Jeff
 

eaton

CSEG and Liaisons
I have gotten similar reports about trouble building cam5 w/ intel v11, and the problem appeared when trying to compile shr_scam_mod.F90, which is a dependency of scamMod.F90. I can successfully build on an 8-core node of Xeon X5570 @ 2.9GHz. This node has 3 GB of memory per core for a total of 24 GB. The build takes about 35 minutes when gmake is allowed to run 16 parallel jobs (i.e., gmake -j 16). The same build done with pgi on the same compute resource takes only about 3 minutes. So something is definitely causing the intel compiler problems, and I think it is very possibly a resource issue that you're running into. But I don't know the solution. I will continue to look into this problem.

Note that while the cam4 build is a bit faster, it still takes about 21 minutes using the same compute resource as for the cam5 build.
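If memory is the suspected resource, one thing to try is lowering the number of parallel gmake jobs, since each job runs its own compiler process. The sketch below is only illustrative: `safe_jobs` is a hypothetical helper, and the memory needed per compile job is a guess you supply, not a measured figure for CAM.

```shell
# Hypothetical helper: cap the gmake job count by memory as well as by cores.
# Arguments: total memory in GB, assumed memory per compile job in GB, core count.
safe_jobs() {
    mem_gb=$1
    per_job_gb=$2
    cores=$3
    n=$(( mem_gb / per_job_gb ))
    # Never exceed the core count, and always allow at least one job.
    if [ "$n" -gt "$cores" ]; then n=$cores; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Example for a node with 8 GB and 8 cores, assuming 3 GB per job:
# gmake -j "$(safe_jobs 8 3 8)" > MAKE.out 2>&1
```

A serial build (gmake -j 1) is slower still, but it is a simple way to tell whether the failures depend on how many compilations run at once.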
 

jshaman

New Member
Thanks for the info and your help with this. I will check back with the bulletin board to see if this intel compiler issue is resolved in the future. In the meantime, I'll see how much memory I can commandeer currently, and if I can get it to build.

Cheers,
Jeff
 