
pnetcdf causes compiling problems

inos@bas_ac_uk

Ingrid Cnossen
Member
Hi all,

I have been porting CESM 1.0.5 to our local Linux cluster, scihub, at the British Antarctic Survey, using the Intel compilers and openmpi. The initial steps are OK now: the full model compiles, runs, and has passed all the functional tests recommended in the user guide. However, I noticed that a lot of time seems to be spent on I/O, so I thought it might be worth using pnetcdf rather than standard netcdf to try to speed things up a bit. I followed all the steps in the user guide on how to enable pnetcdf, but I keep hitting the following error in the CAM compilation (copied from the build log):

mpif90 -c -I. -I/cm/shared/apps/netcdf/intel/64/4.1.1/include -I/cm/shared/apps/netcdf/intel/64/4.1.1/include -I/cm/shared/apps/openmpi/intel/64/1.6.4/include -I/data/scihub-users/inos/src/parallel-netcdf-1.3.1/include -I. -I/data/scihub-users/inos/src/CESM/cesm1_0_5/port_tests/testFW3/SourceMods/src.cam -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/pp_waccm_mozart -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/mozart -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/physics/waccm -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/bulk_aero -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/utils -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/physics/cam -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/dynamics/fv -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/cpl_mct -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/cpl_share -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/control -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim -I/data/scihub-users/inos/modeldata/CESM/cesm1_0_5/testFW3/lib/include -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -g -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -O2 -FR /data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/spmd_utils.F90
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/spmd_utils.F90(51): error #6418: This name has already been assigned a data type. [MPI_STATUS_IGNORE]
integer :: mpi_status_ignore ! Needs to be defined in mpi-serial
--------------^
compilation aborted for /data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/spmd_utils.F90 (code 1)
I have seen a few earlier posts by other users that refer to a similar problem. The discussion there seems to point to a conflict between two files called mpif.h that both define mpi_status_ignore: one in $(CODEROOT)/utils/mct/mpi-serial and another in the include directory of openmpi. My openmpi include directory does contain an mpif.h that defines mpi_status_ignore. However, my $(CODEROOT)/utils/mct/mpi-serial directory has no file called mpif.h, and none of the files in that directory contain "mpi_status_ignore". Also, I have USE_MPISERIAL and MPISERIAL_SUPPORT both set to FALSE, so I don't see why the build would use the mct/mpi-serial directory at all, or why I'm hitting a problem in spmd_utils.F90 that appears to be related to mpi-serial. I therefore think my problem is a little different from what previous users reported, although I guess it has something to do with mpif.h. I just can't figure out exactly what I'm doing wrong. Any ideas? I will greatly appreciate your help!

Thanks,
Ingrid
 

eaton

CSEG and Liaisons
The problem is that the compile command does not contain "-DSPMD"; that setting removes the declaration that is causing the problem. CAM's configure sets this CPP macro when USE_MPISERIAL is FALSE. Since you say you have done that, I'm guessing that the sequence of events was somehow wrong and the cam.cpl7.template script was not run while this setting was in effect. I can only suggest starting a clean build from scratch.
 

inos@bas_ac_uk

Ingrid Cnossen
Member
Hi eaton,

Thanks for your reply. I ran the *clean_build script and even tried reconfiguring the case before building again, but the problem persists. "-DSPMD" is not being added to the compile command for some reason, even though I double-checked that USE_MPISERIAL is definitely set to FALSE in env_conf.xml. I then tried adding "-DSPMD" to CPPDEFS in my Macros file to force this option, but that's clearly not the right way forward either, because it results in a series of other compile errors. There must be something else I'm doing wrong... I still find it very strange that I only get these problems when I try to use pNetCDF, while all is fine with normal NetCDF. Shouldn't this problem be unrelated to the type of NetCDF? Could anything be wrong with my pNetCDF installation that would cause this kind of problem? Or any other ideas?

Thanks for your help!
Ingrid
 

jedwards

CSEG and Liaisons
Staff member
Please post the errors you are getting after adding "-DSPMD" to CPPDEFS in the Macros file.  I can't see how the pnetcdf has anything to do with this error.
 

inos@bas_ac_uk

Ingrid Cnossen
Member
This is the last section of the atm.bldlog.* file, with the error messages I got after adding "-DSPMD" to CPPDEFS:

mpif90 -c -I. -I/cm/shared/apps/netcdf/intel/64/4.1.1/include -I/cm/shared/apps/netcdf/intel/64/4.1.1/include -I/cm/shared/apps/openmpi/intel/64/1.6.4/include -I/data/scihub-users/inos/src/parallel-netcdf-1.3.1/include -I. -I/data/scihub-users/inos/src/CESM/cesm1_0_5/port_tests/testFW3/SourceMods/src.cam -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/pp_waccm_mozart -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/mozart -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/physics/waccm -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/bulk_aero -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/chemistry/utils -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/physics/cam -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/dynamics/fv -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/cpl_mct -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/cpl_share -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/control -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils -I/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim -I/data/scihub-users/inos/modeldata/CESM/cesm1_0_5/testFW3/lib/include -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH -DSPMD -g -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -O2 -FR /data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(234): error #6592: This symbol must be a defined parameter, an enumerator, or an argument of an inquiry function that evaluates to a compile-time constant. [PCNST]
integer, parameter:: max_trac = PCNST ! No. of tracers
--------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(240): error #6592: This symbol must be a defined parameter, an enumerator, or an argument of an inquiry function that evaluates to a compile-time constant. [PLON]
integer, parameter:: idimsize = PLON*nghost*(PLEV+1)*max_nq
--------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(240): error #6592: This symbol must be a defined parameter, an enumerator, or an argument of an inquiry function that evaluates to a compile-time constant. [PLEV]
integer, parameter:: idimsize = PLON*nghost*(PLEV+1)*max_nq
---------------------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(244): error #6592: This symbol must be a defined parameter, an enumerator, or an argument of an inquiry function that evaluates to a compile-time constant. [PLAT]
integer, parameter:: platg = PLAT + 2*nghost
-----------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(234): error #6404: This name does not have a type, and must have an explicit type. [PCNST]
integer, parameter:: max_trac = PCNST ! No. of tracers
--------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(240): error #6404: This name does not have a type, and must have an explicit type. [PLON]
integer, parameter:: idimsize = PLON*nghost*(PLEV+1)*max_nq
--------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(240): error #6404: This name does not have a type, and must have an explicit type. [PLEV]
integer, parameter:: idimsize = PLON*nghost*(PLEV+1)*max_nq
---------------------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(244): error #6404: This name does not have a type, and must have an explicit type. [PLAT]
integer, parameter:: platg = PLAT + 2*nghost
-----------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(606): error #6363: The intrinsic data types of the arguments must be the same. [MAX]
if (modcam_gatscat .eq. 0) sizer8 = max( sizer8, PLON*PLAT*max_irr )
-------------------------------------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(3048): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [Q1OUT]
subroutine mp_sendirr_r4 ( comm, send_bl, recv_bl, q1in, q1out, q2in, q2out, &
---------------------------------------------------------------^
/data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90(3048): warning #6843: A dummy argument with an explicit INTENT(OUT) declaration is not given an explicit value. [Q2OUT]
subroutine mp_sendirr_r4 ( comm, send_bl, recv_bl, q1in, q1out, q2in, q2out, &
----------------------------------------------------------------------------^
compilation aborted for /data/scihub-users/inos/src/CESM/cesm1_0_5/models/atm/cam/src/utils/pilgrim/mod_comm.F90 (code 1)
gmake: *** [mod_comm.o] Error 1
 

eaton

CSEG and Liaisons
The compilation command is missing all the CPP defs normally set by CAM's configure, except for -DSPMD, which you've added manually. CAM's configure writes these CPP definitions to the file CCSM_cppdefs. After the build, the file is in the Buildconf/camconf/ subdirectory of the case directory. When I build cesm1_0_5 for an F compset at f19_f19 resolution, this file contains:

-DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=144 -DPLAT=96 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=1 -DPTRN=1 -DPTRK=1 -DSTAGGERED -DSPMD

This includes all the macros that the compiler error messages you got are complaining about.  I don't know the CESM scripts well enough to know what is wrong.  Maybe Jim has an idea. 
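One way to make the comparison described above routine is to check which -D macros listed in CCSM_cppdefs never reach the failing compile line. The following is a minimal Python sketch, not part of CESM: the function names are hypothetical, it only recognizes the joined -DNAME form, and it compares macro names only (a value mismatch such as PLEV=26 vs PLEV=66 is not reported). The example data are the CCSM_cppdefs contents quoted in this thread and the CPP-related flags from the failing mod_comm.F90 command, with include paths omitted.

```python
import shlex


def cpp_macro_names(flags):
    """Return the set of macro names defined by -DNAME or -DNAME=value options."""
    names = set()
    for token in shlex.split(flags):
        # Only the joined form (-DNAME) is recognized; "-D NAME" is not handled.
        if token.startswith("-D") and len(token) > 2:
            names.add(token[2:].split("=", 1)[0])
    return names


def missing_cam_macros(compile_cmd, cppdefs):
    """Sorted names defined in CCSM_cppdefs that never appear on the compile line."""
    return sorted(cpp_macro_names(cppdefs) - cpp_macro_names(compile_cmd))


# CCSM_cppdefs from an f19_f19 F-compset build, as quoted in this thread
cppdefs = ("-DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=144 "
           "-DPLAT=96 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=1 -DPTRN=1 "
           "-DPTRK=1 -DSTAGGERED -DSPMD")
# CPP-related part of the failing mod_comm.F90 command (include paths omitted)
compile_cmd = ("mpif90 -c -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_R16 "
               "-DNO_SHR_VMATH -DSPMD -g -fp-model precise -O2 -FR mod_comm.F90")
print(missing_cam_macros(compile_cmd, cppdefs))
```

Run against the thread's data, every CAM macro except the manually added SPMD shows up as missing, including PLON, PLAT, PLEV and PCNST, which are the names the mod_comm.F90 errors complain about.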
 

inos@bas_ac_uk

Ingrid Cnossen
Member
I have checked the file you mentioned, and for me it contains the same settings, except that the values of PLEV and PCNST are different and there are two extra options:

-DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=144 -DPLAT=96 -DPLEV=66 -DPCNST=65 -DPCOLS=16 -DPTRM=1 -DPTRN=1 -DPTRK=1 -DSTAGGERED -DSPMD -DWACCM_MOZART -DWACCM_PHYS

I'm not sure what this means, but it seems that none of these settings are actually being applied. Does that give any additional clues?
 

jedwards

CSEG and Liaisons
Staff member
What does the CPPDEFS variable look like in your Macros file? It should be

CPPDEFS += {whatever}

so that the previously defined values are carried forward. Perhaps try commenting out any reference to CPPDEFS in this file to see whether the other settings are then used.
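In GNU make, "=" and ":=" replace whatever value CPPDEFS already had, "?=" is ignored when CPPDEFS is already set, and only "+=" appends. A quick way to spot the first three in a Macros file is a plain text scan, sketched below in Python. This is a hypothetical helper, not a CESM tool: it does not evaluate ifeq blocks, included files, or variables set on the gmake command line, so a clean result does not prove the final CPPDEFS is complete.

```python
import re

# A CPPDEFS assignment at the start of a (possibly indented) line; captures the operator.
_CPPDEFS_ASSIGN = re.compile(r"^\s*(?:override\s+)?CPPDEFS\s*(\+=|:=|\?=|=)")


def non_appending_cppdefs(macros_text):
    """Return (line_number, text) for every CPPDEFS assignment that does not use +=."""
    hits = []
    for lineno, line in enumerate(macros_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop make comments before matching
        m = _CPPDEFS_ASSIGN.match(code)
        if m and m.group(1) != "+=":
            hits.append((lineno, line.strip()))
    return hits


# The CPPDEFS lines from the Macros file quoted in this thread: all use +=.
good = """CPPDEFS += -DLINUX -DSEQ_$(FRAMEWORK) -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH

ifeq ($(compile_threaded), true)
     CPPDEFS += -DTHREADED_OMP
endif
"""
# A made-up variant whose first line would discard earlier definitions.
bad = good.replace("CPPDEFS +=", "CPPDEFS =", 1)

print(non_appending_cppdefs(good))
print(non_appending_cppdefs(bad))
```

The Macros lines from this thread pass the check, which is consistent with the reply that follows: the missing CAM macros were not being dropped by an overwriting assignment in that file.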
 

inos@bas_ac_uk

Ingrid Cnossen
Member
Yes, that's what it looks like in my Macros file:

CPPDEFS += -DLINUX -DSEQ_$(FRAMEWORK) -DFORTRANUNDERSCORE -DNO_R16 -DNO_SHR_VMATH

ifeq ($(compile_threaded), true)
     CPPDEFS += -DTHREADED_OMP
endif

I tried commenting out the CPPDEFS definitions in my Macros file as you suggested, but this leads to errors when building csm_share, so the build doesn't even reach the CAM stage. They look like the same sort of errors I got before when compile options that are needed were left out. The CAM compile options are still not in the compile command, but maybe they wouldn't be there anyway when CAM itself isn't being built.
 

jedwards

CSEG and Liaisons
Staff member
I'm not able to explain what is going on. I wonder if you should start from scratch and go through all the steps again to see if we can figure out where things went wrong. The CPPDEFS from CAM weren't there in your first post, so whatever went wrong happened before that step.
 

eaton

CSEG and Liaisons
The file models/atm/cam/bld/cam.cpl7.template is the interface between CAM's configure and the CCSM build scripts. That file contains the following:

set camdefs = "`cat $CASEBUILD/camconf/CCSM_cppdefs`"
gmake complib -j $GMAKE_J MODEL=cam COMPLIB=$LIBROOT/libatm.a MACFILE=$CASEROOT/Macros.$MACH USER_CPPDEFS="$camdefs" -f $CASETOOLS/Makefile || exit 2

This is how the CCSM Makefile knows about the CPP macros set by CAM.  This is where to check to see what's going wrong.
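To make the hand-off in those two template lines concrete, here is a Python sketch that reads CCSM_cppdefs and assembles the same gmake argument list without running it. It is an illustration, not a replacement for the template: the function name, the machine name and the example paths are hypothetical, and the emptiness check is an addition that the csh template does not perform. Printing the assembled command is one way to confirm that USER_CPPDEFS really carries CAM's macros before the build starts.

```python
import shlex
import tempfile
from pathlib import Path


def cam_complib_argv(casebuild, libroot, caseroot, mach, casetools, gmake_j=4):
    """Assemble the gmake call from cam.cpl7.template as an argument list.

    Hypothetical helper: it mirrors the template's two lines but adds a check
    that CCSM_cppdefs is non-empty, which the csh template does not do.
    """
    cppdefs_file = Path(casebuild) / "camconf" / "CCSM_cppdefs"
    # Read the macros written by CAM's configure; split/join collapses newlines and
    # repeated blanks into single spaces. A missing file raises FileNotFoundError.
    camdefs = " ".join(cppdefs_file.read_text().split())
    if not camdefs:
        raise RuntimeError(f"{cppdefs_file} is empty; rerun CAM's configure")
    return [
        "gmake", "complib", "-j", str(gmake_j), "MODEL=cam",
        f"COMPLIB={libroot}/libatm.a",
        f"MACFILE={caseroot}/Macros.{mach}",
        f"USER_CPPDEFS={camdefs}",  # passed to make as a single argument
        "-f", f"{casetools}/Makefile",
    ]


# Usage with a throwaway case directory and made-up paths.
with tempfile.TemporaryDirectory() as casebuild:
    (Path(casebuild) / "camconf").mkdir()
    (Path(casebuild) / "camconf" / "CCSM_cppdefs").write_text("-DPLON=144 -DPLAT=96 -DSPMD\n")
    argv = cam_complib_argv(casebuild, "/tmp/lib", "/tmp/case", "mymach", "/tmp/case/Tools")
    print(shlex.join(argv))  # inspect the command instead of running it
```

Keeping USER_CPPDEFS as one list element plays the same role as the double quotes around "$camdefs" in the template: the whole macro string reaches make as a single variable assignment.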
 

inos@bas_ac_uk

Ingrid Cnossen
Member
I checked cam.cpl7.template and it contains the lines you said it should have. I have no idea why the CAM compile options didn't get carried through to the final compile command, but it looks like I've somehow fixed the problem now. That's good news, though I still don't understand what went wrong in the first place, which is slightly unsatisfying. Here is what I did: I took out the PNETCDF-related settings one by one to see whether at any point the build would get past the errors, and it didn't. Even with everything set back to normal NetCDF I got the same errors, but reconfiguring fixed that. Then I added the PNETCDF settings back in, and with all of them in place it now works. Again, I don't know what I did differently, but I guess I'll just leave it alone now. I still have to test whether the model also runs OK, but at least it's compiling.

Thanks for your help,
Ingrid
 