
Viability of running CESM on 40 cores

jedwards

CSEG and Liaisons
Staff member
pnetcdf performance is much better than that of netcdf/hdf5, so use of pnetcdf is recommended.
netcdf serial is only needed to build cprnc and shouldn't be used by the model case.
F2000climo does include ice and river models as well as atmosphere and land.
 

jedwards

CSEG and Liaisons
Staff member
The documentation states the minimum acceptable version; always use the most recent version available.
 
While testing with a newly built pnetcdf and the associated netcdf, I am getting a compilation error. I am hoping it is just that I need to tell CESM to use pnetcdf rather than netcdf4.

My environment includes:
$NETCDF_PREFIX, $NETCDF_C_PREFIX, $NETCDF_F_PREFIX and $PNETCDF_PREFIX all set to the same directory:
/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1

$ ls /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib
libnetcdf.a libnetcdff.a libnetcdff.la libnetcdff.settings libnetcdf.la libnetcdf.settings libpnetcdf.a libpnetcdf.la pkgconfig

$ ls /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/include
netcdf_aux.h netcdf_fortv2_c_interfaces.mod netcdf_meta.h netcdf_nf_data.mod pnetcdf.h
netcdf_dispatch.h netcdf.h netcdf.mod netcdf_nf_interfaces.mod pnetcdf.inc
netcdf_f03.mod netcdf.inc netcdf_nc_data.mod netcdf_par.h pnetcdf.mod
netcdf_filter.h netcdf_mem.h netcdf_nc_interfaces.mod pnetcdf


When I tested with J_TestCreateNewcase:

OUTPUT: Building case in directory /exports/eddie/scratch/mjm/P8_TCN/TestCreateNewcase/testcreatenewcase_with_user_compset
sharedlib_only is False
...

Calling /exports/csce/eddie/geos/groups/cesd/CESM/my_cesm_sandbox/cime/src/build_scripts/buildlib.pio
ERRPUT: ERROR: /exports/csce/eddie/geos/groups/cesd/CESM/my_cesm_sandbox/cime/src/build_scripts/buildlib.pio FAILED, cat /exports/eddie/scratch/mjm/P8_TCN/TestCreateNewcase/testcreatenewcase_with_user_compset/bld/pio.bldlog.200727-170319

And the bldlog includes:
-- The C compiler identification is Intel 17.0.0.20170411
-- The Fortran compiler identification is Intel
...
-- Found NetCDF_C: /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib/libnetcdf.a
-- Checking NetCDF version
-- Checking NetCDF version - 4.7.4./*!<
-- Checking whether NetCDF has parallel support
-- Checking whether NetCDF has parallel support - yes
-- Looking for nc_set_log_level
-- Looking for nc_set_log_level - not found
-- Checking whether NetCDF has PnetCDF support
-- Checking whether NetCDF has PnetCDF support - yes
-- Found PnetCDF_C: /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib/libpnetcdf.a
-- Checking PnetCDF version
-- Checking PnetCDF version - 1.12.1
-- Checking whether NetCDF has DAP support
-- Checking whether NetCDF has DAP support - no
-- Found HDF5_HL: /usr/lib64/libhdf5_hl.so
-- Found HDF5_C: /usr/lib64/libhdf5.so
-- Found NetCDF_Fortran: /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib/libnetcdff.a
-- Found PnetCDF_Fortran: /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib/libpnetcdf.a
-- PIO using gpfs filesystem hints
-- MPIIO detected and enabled.
-- MPI Fortran module detected and enabled.
...
/exports/csce/eddie/geos/groups/cesd/CESM/my_cesm_sandbox/cime/src/externals/pio1/pio/nf_mod.F90(1761): error #6404: This name does not have a type, and must have an explicit type. [NF90_DEF_VAR_DEFLATE]
ierr = nf90_def_var_deflate(File%fh,vardesc%varid,0,1,1)
--------------------------^
compilation aborted for /exports/csce/eddie/geos/groups/cesd/CESM/my_cesm_sandbox/cime/src/externals/pio1/pio/nf_mod.F90 (code 1)

--
The call to nf90.... is in a branch after ifdefs for NETCDF4, which I would not expect to be compiled.
Do I need FLAGS including -D _PNETCDF ?

I attach the xml files from ~cime, the xml from the test case, and the full bldlog file, and, hoping you don't need them, the config files from the pnetcdf and netcdf libraries.

Thank you.
 

Attachments

  • mjm.tar.gz
    81.5 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
-- Found NetCDF_C: /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib/libnetcdf.a
-- Checking NetCDF version
-- Checking NetCDF version - 4.7.4./*!<
-- Checking whether NetCDF has parallel support
-- Checking whether NetCDF has parallel support - yes

It thinks that you have netcdf4 parallel.
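
The detection results quoted above come straight from the PIO build log. As a minimal sketch, assuming the log keeps the `-- Checking whether X has Y support - yes/no` format shown in this thread, the results could be collected with a few lines of Python. The function name `parse_feature_checks` and the sample text are illustrative only and are not part of CIME or PIO.

```python
import re

# Matches completed CMake check lines such as
# "-- Checking whether NetCDF has parallel support - yes".
CHECK_RE = re.compile(r"^-- Checking whether (\w+) has (.+?) support - (yes|no)\s*$")


def parse_feature_checks(log_text):
    """Return {(library, feature): bool} for each completed check in the log."""
    results = {}
    for line in log_text.splitlines():
        match = CHECK_RE.match(line.strip())
        if match:
            library, feature, answer = match.groups()
            results[(library, feature)] = (answer == "yes")
    return results


# Sample lines copied from the bldlog excerpt in this thread.
sample = """\
-- Checking whether NetCDF has parallel support
-- Checking whether NetCDF has parallel support - yes
-- Checking whether NetCDF has PnetCDF support
-- Checking whether NetCDF has PnetCDF support - yes
-- Checking whether NetCDF has DAP support
-- Checking whether NetCDF has DAP support - no
"""

checks = parse_feature_checks(sample)
for (library, feature), found in sorted(checks.items()):
    print(f"{library}: {feature} support = {found}")
```

Lines without a trailing yes/no answer (the "Checking ..." announcements) are skipped, so only the final result of each check is kept.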
 

jedwards

CSEG and Liaisons
Staff member
Maybe telling you to use the latest was wrong - can you try netcdf-4.7.3?
What is the output of nc-config --all?
 
./nc-config --all

This netCDF 4.7.4 has been built with the following features:

--cc -> mpiicc
--cflags -> -I/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/include
--libs -> -L/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib -L/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib -L/exports/applications/apps/SL7/intel/parallel_studio_xe_2017_update4/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin/ -lnetcdf -lpnetcdf -lm -lz -lcurl -lpnetcdf -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lirc
--static -> -lpnetcdf -lm -lz -lcurl -lpnetcdf -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lirc

--has-c++ -> no
--cxx ->

--has-c++4 -> no
--cxx4 ->

--has-fortran -> yes
--fc -> mpiifort
--fflags -> -I/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/include -I/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/include
--flibs -> -L/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib -lnetcdff
--has-f90 ->
--has-f03 -> yes

--has-dap -> yes
--has-dap2 -> yes
--has-dap4 -> no
--has-nc2 -> yes
--has-nc4 -> no
--has-hdf5 -> no
--has-hdf4 -> no
--has-logging -> no
--has-pnetcdf -> yes
--has-szlib -> no
--has-cdf5 -> yes
--has-parallel4 -> no
--has-parallel -> yes

--prefix -> /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1
--includedir -> /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/include
--libdir -> /exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib
--version -> netCDF 4.7.4

I had run
CFLAGS="-fPIC" CC=mpiicc \
LDFLAGS="-L/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1/lib -L/exports/applications/apps/SL7/intel/parallel_studio_xe_2017_update4/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin/" \
LIBS="-lpnetcdf -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lirc" \
./configure --disable-shared --disable-netcdf-4 --enable-pnetcdf \
  --prefix=/exports/applications/apps/community/geos/intel2017u4/pnetcdf/1.12.1


Thanks for looking at this.
 

jedwards

CSEG and Liaisons
Staff member
I think that I finally see the issue - you are building pnetcdf within the netcdf library - that's a fairly new feature that we haven't yet tried. You should build and install netcdf and pnetcdf separately and link both.
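
One way to spot this kind of build before compiling CESM is to look at the feature flags that `nc-config --all` reports. The following is only a sketch: `parse_nc_config` and `pnetcdf_embedded_without_nc4` are hypothetical helper names, and the rule used (PnetCDF reported inside netCDF while netCDF-4 and parallel4 are off) is an assumption drawn from the output posted in this thread, not an official test.

```python
def parse_nc_config(text):
    """Parse `nc-config --all` lines of the form '--key -> value' into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("--") and "->" in line:
            key, _, value = line.partition("->")
            settings[key.strip().lstrip("-")] = value.strip()
    return settings


def pnetcdf_embedded_without_nc4(settings):
    """True when netCDF reports built-in PnetCDF but no netCDF-4 layer.

    Assumption: this combination matches the build discussed in this thread.
    """
    return (settings.get("has-pnetcdf") == "yes"
            and settings.get("has-nc4") == "no"
            and settings.get("has-parallel4") == "no")


# Sample lines copied from the nc-config output posted in this thread.
sample = """\
--has-nc4 -> no
--has-hdf5 -> no
--has-pnetcdf -> yes
--has-parallel4 -> no
--has-parallel -> yes
--version -> netCDF 4.7.4
"""

settings = parse_nc_config(sample)
if pnetcdf_embedded_without_nc4(settings):
    print("netCDF was configured with PnetCDF built in; consider separate installs")
```

On a real system the text could come from running `nc-config --all` and capturing its output; here it is a pasted string so the sketch runs anywhere.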
 
Thanks.
May I check, please:
  • the pnetcdf build is probably ok, but
  • when I redo the netcdf-C library configure, do I drop --enable-pnetcdf but keep --disable-shared --disable-netcdf-4?
  • Does that make the netcdf-C library just a sequential build, but without netcdf4? If so, should I still compile it with mpiicc (not icc)? Does it matter?
  • Then the Fortran build will need to be repeated, but with the same flags.
Finally, please: do you usually recommend using static or shared libraries?
 

sacks

Bill Sacks
CSEG and Liaisons
Staff member
It looks like @jedwards has replied to many of your questions (thanks, Jim!), but I wanted to just add a bit, in reply to an earlier post:

how did you tell what actually was used?

I checked the PIO_TYPENAME values in env_run.xml in the case directory files you attached earlier. If I remember correctly, this is determined automatically based on whether PIO is built with pnetcdf support. (PIO is built as part of your CESM build, and I believe it tries to detect whether pnetcdf is available on your system, based on settings in your machine xml files and/or your environment.)
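
The check described above can be sketched in Python. This is a minimal example, assuming env_run.xml contains an `<entry id="PIO_TYPENAME">` element with one `<value compclass="...">` child per component; the helper name `pio_typenames`, the sample fragment, and its `<file>` wrapper are illustrative assumptions rather than an exact copy of a real case file.

```python
import xml.etree.ElementTree as ET


def pio_typenames(root):
    """Return {component class: I/O type} from the PIO_TYPENAME entry, if any."""
    for entry in root.iter("entry"):
        if entry.get("id") == "PIO_TYPENAME":
            return {value.get("compclass"): (value.text or "").strip()
                    for value in entry.iter("value")}
    return {}


# Hypothetical fragment following the layout assumed above.
sample = """\
<file>
  <entry id="PIO_TYPENAME">
    <type>char</type>
    <valid_values>netcdf,pnetcdf</valid_values>
    <values>
      <value compclass="ATM">pnetcdf</value>
      <value compclass="LND">pnetcdf</value>
    </values>
  </entry>
</file>
"""

types = pio_typenames(ET.fromstring(sample))
for component, io_type in sorted(types.items()):
    print(f"{component}: {io_type}")
```

For an actual case directory, the root could be obtained with `ET.parse("env_run.xml").getroot()` instead of the pasted string.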

Does F2000climo include a simplified ice and the river models? - I was surprised to see more than cam files output.

Just to elaborate a bit on @jedwards' reply: the ice (CISM) and river (MOSART) models are mainly just used for diagnostic purposes in an F compset. (That's not 100% the case, but is close to the truth.)
 
Jim and Bill - Thanks both!!

I do now have an unvalidated but so-far-working model, apparently with pnetcdf (env_run.xml now has just two valid values, netcdf and pnetcdf, and the latter is used). You may recall we embarked on my second journey round netcdf versions when a 20 core job hung with a netcdf library that used parallel hdf5. It closes nicely now with the new pnetcdf 1.12.1, built separately from netcdf (C version 4.7.3; Fortran version 4.5.2). I had built the netcdf libraries with mpiicc and mpiifort, but their tests worked ok.

Next steps:

I'll try the cheyenne pre-alpha tests and then the ensemble tests. The second case (UF-CAM-ECT, which detects issues in CAM and CLM using 9 time step runs) seems most relevant and feasible for us.

@hannay Cecile, I am hoping to be close to the point where I can make the comparison you proposed early in this thread.
Once I start my run, how long should I aim to run it for a meaningful comparison?

May I ask two simple questions (with apologies if these are documented)?
  • Can I change the number of pes and memory without rebuilding?
    • I'd like to run a few cases of short F2000 runs to see what we get on our 20, 32, and 40 core nodes.
  • How do I add top of atmosphere radiation to the outputs?
 

QINKONG

QINQIN KONG
Member
I think that I finally see the issue - you are building pnetcdf within the netcdf library - that's a fairly new feature that we haven't yet tried. You should build and install netcdf and pnetcdf separately and link both.
Hi Jim. I am confused about whether I'm using netcdf or pnetcdf. Below are the relevant parts of my config_machines and config_compilers files:

<modules compiler="intel">
  <command name="load">intel/19.0.3.199</command>
  <command name="load">openmpi/3.1.4</command>
  <command name="load">netcdf/4.7.0</command>
  <command name="load">netcdf-fortran/4.5.2</command>
  <command name="load">parallel-netcdf/1.10.0</command>
  <command name="load">hdf5/1.10.5</command>
  <command name="load">netlib-lapack/3.6.0</command>
  <command name="load">openblas/0.3.7</command>
  <command name="load">cmake/3.15.4</command>
</modules>


<compiler MACH="brown" COMPILER="intel">
  <NETCDF_C_PATH>$ENV{NETCDF}</NETCDF_C_PATH>
  <NETCDF_FORTRAN_PATH>$ENV{NETCDF_FORTRAN_HOME}</NETCDF_FORTRAN_PATH>
  <PNETCDF_PATH>$ENV{PARALLEL_NETCDF_HOME}</PNETCDF_PATH>
  <HDF5_PATH>$ENV{HDF5_HOME}</HDF5_PATH>
  <PIO_FILESYSTEM_HINTS>lustre</PIO_FILESYSTEM_HINTS>
  <SLIBS>
    <append> -lnetcdff -lnetcdf</append>
  </SLIBS>
</compiler>

I also checked the PIO_TYPENAME in the env_run file of one case, as below:
<entry id="PIO_TYPENAME">

<type>char</type>
<valid_values>netcdf,pnetcdf</valid_values>
<desc>pio io type</desc>
<values>
<value compclass="ATM">pnetcdf</value>
<value compclass="CPL">pnetcdf</value>
<value compclass="OCN">pnetcdf</value>
<value compclass="WAV">pnetcdf</value>
<value compclass="GLC">pnetcdf</value>
<value compclass="ICE">pnetcdf</value>
<value compclass="ROF">pnetcdf</value>
<value compclass="LND">pnetcdf</value>
<value compclass="ESP">pnetcdf</value>
</values>
</entry>

  • Does the PIO_TYPENAME indicate that I'm using pnetcdf rather than netcdf?
  • Do I need to add pnetcdf to the SLIBS in the compiler config file?
  • Should I delete netcdf and netcdf-fortran from the config_machines and config_compilers files? Or does pnetcdf depend on netcdf and/or netcdf-fortran?
Thanks!
 