
Abort with message NetCDF: NC_UNLIMITED size already in use in file

Yuan Sun
Member
Hi all, I am porting CESM2.3alpha017 to ARCHER2.

After I ran ./case.submit, it returned an error:

Opening file test.cism.gris.initial_hist.2085-01-01-00000.nc for output;
Write output at start of run and every 1.0000000000000000 years
Creating variables internal_time, time, and tstep_count
Creating variable level
Creating variable lithoz
Creating variable nlev_smb
Creating variable staglevel
Creating variable stagwbndlevel
Creating variable x0
Creating variable x1
Creating variable y0
Creating variable y1
Creating variable zocn
Creating variable artm
Creating variable smb
Creating variable thk
Creating variable topg
Creating variable usurf
Writing to file test.cism.gris.initial_hist.2085-01-01-00000.nc at time 0.0000000000000000
WHL: oc_tavg_helper is not associated; associate now
Opening file test.cism.gris.tavg_helper.0000-00-00-00000.nc for output;
Write output every 9999999.0000000000 years
Creating variables internal_time, time, and tstep_count
Creating variable level
Creating variable lithoz
Creating variable nlev_smb
Creating variable staglevel
Creating variable stagwbndlevel
Creating variable x0
Creating variable x1
Creating variable y0
Creating variable y1
Creating variable zocn
Creating variable artm
Creating variable smb
Creating variable thk
Creating variable topg
Creating variable usurf
Abort with message NetCDF: NC_UNLIMITED size already in use in file /work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3beta/libraries/parallelio/src/clib/pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file /work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3beta/libraries/parallelio/src/clib/pio_nc.c at line 2107


It seems that several MPI processes/tasks open the file for writing at initialization. I checked the settings:

yuansun@ln04:/work/n02/n02/yuansun/cesm/runs/test> ./xmlquery NTHRDS
NTHRDS: ['CPL:1', 'ATM:1', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
yuansun@ln04:/work/n02/n02/yuansun/cesm/runs/test> ./xmlquery NTASKS
NTASKS: ['CPL:128', 'ATM:128', 'LND:128', 'ICE:128', 'OCN:128', 'ROF:128', 'GLC:128', 'WAV:128', 'ESP:128']

I am not sure how to modify this. Since I am running in parallel, should I set NTHRDS>1?
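As I understand it, NTHRDS sets the number of OpenMP threads per MPI task rather than anything I/O-related, so NTHRDS>1 is probably unrelated to this error. For the record, these settings can be changed with ./xmlchange in the case directory (values here are only examples, not a recommendation):

```shell
# Set a value for every component at once...
./xmlchange NTHRDS=1
./xmlchange NTASKS=128
# ...or for a single component, e.g. the land model:
./xmlchange NTHRDS_LND=1
```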

The whole log is attached.

Best,
Yuan
 

Attachments

  • cesm.log.5314109.240124-030352.txt
    151 KB

jedwards

CSEG and Liaisons
Staff member
Could you please send me the details of the ParallelIO build? In the case directory, run
./xmlquery -p PIO
and from the bld directory, send the pio.bldlog file.
 

Yuan Sun
Member
Dear Jedwards,

Thanks for taking a look.

After I ran ./xmlquery -p PIO, the terminal returned:

yuansun@ln04:/work/n02/n02/yuansun/cesm/runs/ttest> ./xmlquery -p PIO

Results in group build_macros
PIO_CONFIG_OPTS:
PIO_VERSION: 2

Results in group case_last
PIO_SPEC_FILE: /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3beta/ccs_config/machines/config_pio.xml

Results in group run_pio
PIO_ASYNCIO_NTASKS: 0
PIO_ASYNCIO_ROOTPE: 1
PIO_ASYNCIO_STRIDE: 0
PIO_ASYNC_INTERFACE: ['CPL:FALSE', 'ATM:FALSE', 'LND:FALSE', 'ICE:FALSE', 'OCN:FALSE', 'ROF:FALSE', 'GLC:FALSE', 'WAV:FALSE', 'ESP:FALSE']
PIO_BLOCKSIZE: -1
PIO_BUFFER_SIZE_LIMIT: -1
PIO_DEBUG_LEVEL: 0
PIO_NETCDF_FORMAT: ['CPL:64bit_offset', 'ATM:64bit_offset', 'LND:64bit_offset', 'ICE:64bit_offset', 'OCN:64bit_offset', 'ROF:64bit_offset', 'GLC:64bit_offset', 'WAV:64bit_offset', 'ESP:64bit_offset']
PIO_NUMTASKS: ['CPL:-99', 'ATM:-99', 'LND:-99', 'ICE:-99', 'OCN:-99', 'ROF:-99', 'GLC:-99', 'WAV:-99', 'ESP:-99']
PIO_REARRANGER: ['CPL:2', 'ATM:2', 'LND:2', 'ICE:2', 'OCN:2', 'ROF:2', 'GLC:2', 'WAV:2', 'ESP:2']
PIO_REARR_COMM_ENABLE_HS_COMP2IO: TRUE
PIO_REARR_COMM_ENABLE_HS_IO2COMP: FALSE
PIO_REARR_COMM_ENABLE_ISEND_COMP2IO: FALSE
PIO_REARR_COMM_ENABLE_ISEND_IO2COMP: TRUE
PIO_REARR_COMM_FCD: 2denable
PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO: -2
PIO_REARR_COMM_MAX_PEND_REQ_IO2COMP: 64
PIO_REARR_COMM_TYPE: p2p
PIO_ROOT: ['CPL:1', 'ATM:1', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
PIO_STRIDE: ['CPL:128', 'ATM:128', 'LND:128', 'ICE:128', 'OCN:128', 'ROF:128', 'GLC:128', 'WAV:128', 'ESP:128']
PIO_TYPENAME: ['CPL:netcdf', 'ATM:netcdf', 'LND:netcdf', 'ICE:netcdf', 'OCN:netcdf', 'ROF:netcdf', 'GLC:netcdf', 'WAV:netcdf', 'ESP:netcdf']

In the case/run directory, I did not find a cpl.log file. The simulation aborted quickly during the initialization stage.

Thanks again.

Best,
Yuan
 

Attachments

  • pio.bldlog.240124-151157.txt
    90.9 KB

Yuan Sun
Member
Just for background: before this error, I met several other errors, such as:
- ESMF version should be 8.4.1 or newer
- This functionality requires ESMF to be built with the PIO library enabled.
- Invalid NTASKS value specified for component: cpl ntasks: 128 1

To address them, I installed PIO 2.6.1 and ESMF 8.5 in a personal directory: /work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/

I am not sure whether these steps influence the current PIO error.

The config_machines.xml entry is:

<machine MACH="archer2">
  <DESC>two CrayAMD EPYC Zen2, 128 pes/node, batch system is SLURM</DESC>
  <NODENAME_REGEX>(ln\d{2}$|nid\d{6}$)</NODENAME_REGEX>
  <OS>CNL</OS>
  <COMPILERS>gnu,cray</COMPILERS>
  <MPILIBS>mpich,mpi-serial</MPILIBS>
  <CIME_OUTPUT_ROOT>$ENV{CESM_ROOT}/runs</CIME_OUTPUT_ROOT>
  <DIN_LOC_ROOT>$ENV{CESM_ROOT}/cesm_inputdata</DIN_LOC_ROOT>
  <DIN_LOC_ROOT_CLMFORC>${DIN_LOC_ROOT}/atm/datm7</DIN_LOC_ROOT_CLMFORC>
  <DOUT_S_ROOT>$ENV{CESM_ROOT}/archive/$CASE</DOUT_S_ROOT>
  <BASELINE_ROOT>$ENV{CESM_ROOT}/ccsm_baselines</BASELINE_ROOT>
  <CCSM_CPRNC>$ENV{CIMEROOT}/tools/cprnc/cprnc</CCSM_CPRNC>
  <GMAKE_J>8</GMAKE_J>
  <BATCH_SYSTEM>slurm</BATCH_SYSTEM>
  <SUPPORTED_BY>leeds.ac.uk</SUPPORTED_BY>
  <MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>128</MAX_MPITASKS_PER_NODE>
  <PROJECT_REQUIRED>TRUE</PROJECT_REQUIRED>
  <mpirun mpilib="default">
    <executable>srun</executable>
    <arguments>
      <arg name="cpubind"> --distribution=block:block --hint=nomultithread</arg>
      <!--<arg name="cpubind"> -ZZ-cpu-bind=cores</arg> -->
    </arguments>
  </mpirun>
  <module_system type="module" allow_error="true">
    <init_path lang="perl">/usr/share/lmod/lmod/init/perl</init_path>
    <init_path lang="python">/usr/share/lmod/lmod/init/env_modules_python.py</init_path>
    <init_path lang="csh">/usr/share/lmod/lmod/init/csh</init_path>
    <init_path lang="sh">/usr/share/lmod/lmod/init/sh</init_path>
    <cmd_path lang="perl">/usr/share/lmod/lmod/libexec/lmod perl</cmd_path>
    <cmd_path lang="python">/usr/share/lmod/lmod/libexec/lmod python</cmd_path>
    <cmd_path lang="sh">module</cmd_path>
    <cmd_path lang="csh">module</cmd_path>
    <modules compiler="gnu">
      <command name="load"> PrgEnv-gnu</command>
      <command name="load"> cray-hdf5-parallel</command>
      <command name="load"> cray-netcdf-hdf5parallel</command>
      <command name="load"> cray-parallel-netcdf</command>
      <command name="load"> cray-libsci</command>
    </modules>
  </module_system>
  <environment_variables>
    <env name="PERL5LIB">/work/n02/shared/perl/5.26.2</env>
    <env name="OMP_NUM_THREADS">{{ thread_count }}</env>
    <env name="OMP_PLACES">cores</env>
    <env name="OMP_STACKSIZE">2G</env>
    <env name="ESMFMKFILE">/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/esmf/8.5/lib/esmf.mk</env>
    <env name="ESMF_RUNTIME_PROFILE">ON</env>
    <env name="ESMF_RUNTIME_PROFILE_OUTPUT">SUMMARY</env>
  </environment_variables>
  <resource_limits>
    <resource name="RLIMIT_STACK">-1</resource>
  </resource_limits>
</machine>
 

jedwards

CSEG and Liaisons
Staff member
Please update to PIO 2.6.2 and use the same PIO build for both ESMF and CESM. You can do this by building ESMF with the environment variable ESMF_PIO=external, and CESM with the environment variables:
PIO_VERSION_MAJOR=2
PIO_LIBDIR
PIO_INCDIR
PIO_TYPENAME_VALID_VALUES

where PIO_TYPENAME_VALID_VALUES=netcdf,pnetcdf,netcdf4p (all that apply).

There is no cpl.log in CESM 2.3.x; it is called med.log now.
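A sketch of what such a build environment might look like; the install prefix is a placeholder, and ESMF_PIO_INCLUDE/ESMF_PIO_LIBPATH are my assumption for the variable names that pair with ESMF_PIO=external (check the ESMF build documentation for your version):

```shell
# Build ESMF against an external, pre-installed PIO (paths are placeholders)
export ESMF_PIO=external
export ESMF_PIO_INCLUDE=/path/to/pio/include   # assumed variable name
export ESMF_PIO_LIBPATH=/path/to/pio/lib       # assumed variable name
```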
 

Yuan Sun
Member
Hi,

Sorry to bother you again. I modified config_machines.xml with:

<environment_variables>
  <env name="PERL5LIB">/work/n02/shared/perl/5.26.2</env>
  <env name="OMP_NUM_THREADS">{{ thread_count }}</env>
  <env name="OMP_PLACES">cores</env>
  <env name="OMP_STACKSIZE">2G</env>
  <env name="ESMFMKFILE">/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/esmf/8.5/lib/esmf.mk</env>
  <env name="ESMF_RUNTIME_PROFILE">ON</env>
  <env name="ESMF_RUNTIME_PROFILE_OUTPUT">SUMMARY</env>
  <env name="PIO_VERSION_MAJOR">2</env>
  <env name="PIO_LIBDIR">/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib</env>
  <env name="PIO_INCDIR">/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/include</env>
  <env name="PIO_TYPENAME_VALID_VALUES">netcdf, pnetcdf, netcdf4p</env>
</environment_variables>

It returns the error: error while loading shared libraries: libpiof.so.4: cannot open shared object file: No such file or directory

To check which libraries the executable resolves, I ran ldd /work/n02/n02/yuansun/cesm/runs/ttest/bld/cesm.exe, which shows:
linux-vdso.so.1 (0x00007ffef99c8000)
libpiof.so.4 => not found
libpioc.so.5 => not found
libesmf.so => /work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/esmf/8.5/lib/libesmf.so (0x00007f550a386000)
librt.so.1 => /lib64/librt.so.1 (0x00007f550a17d000)
libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00007f5509d6b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f5509b67000)
libnetcdff_parallel_gnu_91.so.7 => /opt/cray/pe/lib64/libnetcdff_parallel_gnu_91.so.7 (0x00007f55098a8000)
libnetcdf_parallel_gnu_91.so.19 => /opt/cray/pe/lib64/libnetcdf_parallel_gnu_91.so.19 (0x00007f55094ca000)
libsci_gnu_82.so.5 => /opt/cray/pe/lib64/libsci_gnu_82.so.5 (0x00007f5505bf8000)
libmpifort_gnu_91.so.12 => /opt/cray/pe/lib64/libmpifort_gnu_91.so.12 (0x00007f5505974000)
libmpi_gnu_91.so.12 => /opt/cray/pe/lib64/libmpi_gnu_91.so.12 (0x00007f5502def000)
libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00007f550be12000)
libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00007f5502943000)
libm.so.6 => /lib64/libm.so.6 (0x00007f55025f8000)
libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00007f55023df000)
libquadmath.so.0 => /opt/cray/pe/gcc-libs/libquadmath.so.0 (0x00007f5502198000)
libc.so.6 => /lib64/libc.so.6 (0x00007f5501da3000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5501b80000)
libgomp.so.1 => /opt/cray/pe/gcc-libs/libgomp.so.1 (0x00007f550193d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f550bc0c000)
libhdf5_hl_parallel_gnu_91.so.200 => /opt/cray/pe/lib64/libhdf5_hl_parallel_gnu_91.so.200 (0x00007f5501719000)
libhdf5_parallel_gnu_91.so.200 => /opt/cray/pe/lib64/libhdf5_parallel_gnu_91.so.200 (0x00007f550109c000)
libfabric.so.1 => /opt/cray/libfabric/1.12.1.2.2.0.0/lib64/libfabric.so.1 (0x00007f5500dec000)
libatomic.so.1 => /opt/cray/pe/gcc-libs/libatomic.so.1 (0x00007f5500be4000)
libpmi.so.0 => /opt/cray/pe/lib64/libpmi.so.0 (0x00007f55009e2000)
libpmi2.so.0 => /opt/cray/pe/lib64/libpmi2.so.0 (0x00007f55007a9000)
libz.so.1 => /lib64/libz.so.1 (0x00007f5500592000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f5500372000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f5500152000)
libpals.so.0 => /opt/cray/pe/lib64/libpals.so.0 (0x00007f54fff4d000)
libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007f54ffd2b000)
libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007f54ffab5000)

Best,
Yuan
 

jedwards

CSEG and Liaisons
Staff member
You may need to add
-Wl,-rpath,/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib
to your link line so that the runtime loader (and hence ldd) finds the libraries.
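As a temporary check while fixing the link line, one can also point the runtime loader at the PIO library directory and confirm resolution with ldd (using the path from this thread):

```shell
export LD_LIBRARY_PATH=/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib:$LD_LIBRARY_PATH
ldd bld/cesm.exe | grep -i pio   # libpiof.so.4 and libpioc.so.5 should now resolve
```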
 

Yuan Sun
Member
Hi Jedwards,

Thanks again for viewing my issues.

Based on your comments, I modified the Makefile in cime as follows:

Before:
ifdef PIO_LIBDIR
ifeq ($(PIO_VERSION),$(PIO_VERSION_MAJOR))
INCLDIR += -I$(PIO_INCDIR)
SLIBS += -L$(PIO_LIBDIR)
else
# If PIO_VERSION_MAJOR doesnt match, build from source
unexport PIO_LIBDIR
endif
endif
PIO_LIBDIR ?= $(INSTALL_SHAREDPATH)/lib

After:
ifdef PIO_LIBDIR
ifeq ($(PIO_VERSION),$(PIO_VERSION_MAJOR))
INCLDIR += -I$(PIO_INCDIR)
SLIBS += -Wl,-rpath=$(PIO_LIBDIR)
else
# If PIO_VERSION_MAJOR doesnt match, build from source
unexport PIO_LIBDIR
endif
endif
PIO_LIBDIR ?= $(INSTALL_SHAREDPATH)/lib
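Note that the modified rule replaces the original -L$(PIO_LIBDIR) search flag rather than supplementing it. A variant that keeps both the link-time search path and the runtime rpath would be (a sketch, not the stock cime Makefile):

```make
# keep link-time search (-L) and add a runtime search path (rpath)
SLIBS += -L$(PIO_LIBDIR) -Wl,-rpath,$(PIO_LIBDIR)
```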

CESM 2.3 finally works now.

Best,
Yuan
 

Yuan Sun
Member
Hi all,

After I linked the externally installed PIO library, the cesm.log error appeared again:

Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Abort with message NetCDF: NC_UNLIMITED size already in use in file pio_nc.c at line 2107
Obtained 10 stack frames.
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpioc.so.5(print_trace+0x32) [0x150659c330f1]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpioc.so.5(piodie+0x77) [0x150659c331fe]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpioc.so.5(check_netcdf2+0x1f0) [0x150659c3354d]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpioc.so.5(check_netcdf+0x34) [0x150659c3335b]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpioc.so.5(PIOc_def_dim+0x363) [0x150659c260c3]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpiof.so.4(__pio_nf_MOD_def_dim_id+0xbc) [0x150659c6419e]
/mnt/lustre/a2fs-work2/work/n02/n02/yuansun/privatemodules_packages/archer2/apps/gcc/pio2/2.6.2/lib/libpiof.so.4(__pio_nf_MOD_def_dim_int_desc+0x54) [0x150659c642c1]
/work/n02/n02/yuansun/cesm/runs/lcz/bld/cesm.exe() [0x521538]
/work/n02/n02/yuansun/cesm/runs/lcz/bld/cesm.exe() [0x5bd002]
/work/n02/n02/yuansun/cesm/runs/lcz/bld/cesm.exe() [0x612d54]
MPICH ERROR [Rank 0] [job id 5388003.0] [Wed Jan 31 12:19:52 2024] [nid001240] - Abort(-1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

Then I ran ./xmlquery -p PIO; the result is below:
yuansun@ln01:/work/n02/n02/yuansun/cesm/runs/lcz> ./xmlquery -p PIO

Results in group build_macros
PIO_CONFIG_OPTS:
PIO_VERSION: 2

Results in group case_last
PIO_SPEC_FILE: /mnt/lustre/a2fs-work2/work/n02/n02/yuansun/cesm/my_cesm_sandbox_2.3lcz/ccs_config/machines/config_pio.xml

Results in group run_pio
PIO_ASYNCIO_NTASKS: 0
PIO_ASYNCIO_ROOTPE: 1
PIO_ASYNCIO_STRIDE: 0
PIO_ASYNC_INTERFACE: ['CPL:FALSE', 'ATM:FALSE', 'LND:FALSE', 'ICE:FALSE', 'OCN:FALSE', 'ROF:FALSE', 'GLC:FALSE', 'WAV:FALSE', 'ESP:FALSE']
PIO_BLOCKSIZE: -1
PIO_BUFFER_SIZE_LIMIT: -1
PIO_DEBUG_LEVEL: 0
PIO_NETCDF_FORMAT: ['CPL:64bit_offset', 'ATM:64bit_offset', 'LND:64bit_offset', 'ICE:64bit_offset', 'OCN:64bit_offset', 'ROF:64bit_offset', 'GLC:64bit_offset', 'WAV:64bit_offset', 'ESP:64bit_offset']
PIO_NUMTASKS: ['CPL:-99', 'ATM:-99', 'LND:-99', 'ICE:-99', 'OCN:-99', 'ROF:-99', 'GLC:-99', 'WAV:-99', 'ESP:-99']
PIO_REARRANGER: ['CPL:2', 'ATM:1', 'LND:2', 'ICE:2', 'OCN:2', 'ROF:2', 'GLC:2', 'WAV:2', 'ESP:2']
PIO_REARR_COMM_ENABLE_HS_COMP2IO: TRUE
PIO_REARR_COMM_ENABLE_HS_IO2COMP: FALSE
PIO_REARR_COMM_ENABLE_ISEND_COMP2IO: FALSE
PIO_REARR_COMM_ENABLE_ISEND_IO2COMP: TRUE
PIO_REARR_COMM_FCD: 2denable
PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO: -2
PIO_REARR_COMM_MAX_PEND_REQ_IO2COMP: 64
PIO_REARR_COMM_TYPE: p2p
PIO_ROOT: ['CPL:1', 'ATM:1', 'LND:1', 'ICE:1', 'OCN:1', 'ROF:1', 'GLC:1', 'WAV:1', 'ESP:1']
PIO_STRIDE: ['CPL:128', 'ATM:128', 'LND:128', 'ICE:128', 'OCN:128', 'ROF:128', 'GLC:128', 'WAV:128', 'ESP:128']
PIO_TYPENAME: ['CPL:pnetcdf', 'ATM:pnetcdf', 'LND:pnetcdf', 'ICE:pnetcdf', 'OCN:pnetcdf', 'ROF:pnetcdf', 'GLC:pnetcdf', 'WAV:pnetcdf', 'ESP:pnetcdf']

I checked bld/pio.bldlog; it records:
pio_version_major = 2 pio_version = 2
Using installed PIO library
Updating valid_values for PIO_TYPENAME: netcdf, pnetcdf, netcdf4p,nothing

I also tested the code on another HPC using OpenMPI, without the parallel modules, and met the same error.

I am not sure what is wrong. Thank you very much for any comments.
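One possible next diagnostic, not a confirmed fix: PIO_DEBUG_LEVEL is currently 0, and raising it may help localize which dimension definition triggers the NC_UNLIMITED abort:

```shell
# In the case directory (example value; higher values are more verbose)
./xmlchange PIO_DEBUG_LEVEL=2
./case.submit
```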

Best,
Yuan