CESM 2.2.2: issue while reading data with PnetCDF on a VAST file system

rambhari01

Ram
Member
Hi All,

We are trying to port CESM (cesm2.2.2) to a new HPC cluster at NYU. We already run it on another cluster, where it works well. We are now migrating it to the new machine with almost all of the previous settings for the compiler, NetCDF, PnetCDF, etc.

On this new machine, we are using "PnetCDF 1.15.0 alpha".
The file system is VAST, mounted over NFSv4:
"vast-ib.torch.hpc.nyu.edu:/vast/hpc/torch/scratch/rs9552 on /scratch/rs9552 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,rdirplus=force,forcerdirplus,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=10.0.4.6,local_lock=none,localports_failover,remoteports=10.0.5.149-10.0.6.20,addr=10.0.5.208)"

When we run a tested setup using PnetCDF on a single node (1 node, 128 processes), it runs well and writes the model output correctly.
But when we run on multiple nodes (2 nodes, 128 processes, 64 per node), the same setup compiles fine but gives a strange error while reading the model input data, as shown below.
------------------------------------------------------------

proc= 127 clump no = 1 clump id= 128
beg cohort = 20850 end cohort = 21013
total cohorts per clump = 164

ENDRUN:
ERROR:
spvals or NaNs found in coordinate array where level_class /= ispval; this is currently unhandled ERROR in initInterpMultilevelInterp.F90 at line 435

ENDRUN:
ERROR:
spvals or NaNs found in coordinate array where level_class /= ispval; this is currently unhandled ERROR in initInterpMultilevelInterp.F90 at line 435


------------------------------------------------------------

I am attaching the softerware_environment.txt and run log here.

Could you please give us a hint about the possible cause of this issue and how to solve it?

Thank you very much

-Ram
 


rambhari01

Ram
Member
Thank you @slevis for moving this to the appropriate forum.

Meanwhile, we also tried running the same settings on 8 nodes with 96 processes per node, with DEBUG=TRUE.

This time it gives an error while writing the output file, and the read error from the post above no longer appears. I am attaching the new log file here (see line 958330 onwards).


Thank you.
-Ram
 

rambhari01

Ram
Member
Hi All,

Since we’re encountering issues using PnetCDF with CESM, is it possible to use netCDF4 (parallel) with CESM for PIO instead? How does its performance compare to PnetCDF?

Thanks

-Ram
 

dobbins

Brian Dobbins
CSEG and Liaisons
Staff member
Hi Ram,

I think there are a number of possible things going on here. I've not personally used VAST or the alpha release of CESM 2.2. Is VAST also used on the old system? And on the old system, are you using the same (alpha) version of PnetCDF and the same versions of NetCDF and HDF5 (e.g., the 1.14 series)? I'm inclined to think this is an error somewhere in that subsystem of libraries.

One recommendation is to look into testing your IO subsystem with these libraries via IOR:
GitHub - hpc/ior: IOR and mdtest

Running it in the same software environment with the -W/-R flags should let you check whether there's an issue on writes or reads.
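
As a rough sketch of what such a check could look like, the helper below only builds and prints an IOR command line; it does not run anything. The `mpirun` launcher, the process count, the 16m/1m block and transfer sizes, and the test directory are all assumptions to adapt to your site, and `ior_check_cmd` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: print an IOR command that writes, reads, and verifies one
# shared file through MPI-IO. Launcher, sizes, and paths are assumptions.
ior_check_cmd() {
    test_dir=$1   # directory on the filesystem under test
    nprocs=$2     # total number of MPI processes
    # -a MPIIO : use the MPI-IO backend, the layer PnetCDF sits on
    # -w -r    : perform a write phase and a read phase
    # -W -R    : verify the data after the write and after the read
    # -b / -t  : per-process block size and transfer size
    # -o       : path of the shared test file
    printf 'mpirun -np %s ior -a MPIIO -w -r -W -R -b 16m -t 1m -o %s/ior_testfile\n' \
        "$nprocs" "$test_dir"
}
```

For example, `ior_check_cmd /scratch/$USER 128` prints a command you can paste into a job script that spans at least two nodes, since the failures reported here only show up on multi-node runs.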

To directly answer your question, you can use NetCDF4-parallel (./xmlchange PIO_TYPENAME=netcdf4p), but I strongly recommend against it, as the performance tends to be an order of magnitude worse, at least in some recent runs I've done with the CESM3 alpha tags.

Cheers,
- Brian
 

jedwards

CSEG and Liaisons
Staff member
I think that the problem is right at the top of the error messages in your log:
GEN_create line 154: rank 5 failure to open file ./FHIST_CAM6CLM5BGC_NOcrop_CLMCO2Diag_00_T2.clm2.h1.1850-01-01-00000.nc (No such file or directory)

My guess is that the file or filesystem is not available on all nodes. Did you run the PnetCDF system tests on multiple nodes? I suspect that you will get a failure at that level as well.
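
A quick way to test that guess before running the full PnetCDF test suite might be a per-node probe like the sketch below. It is only an illustration: `check_dir_on_node` is a hypothetical helper, and how you launch it once per node (for example through your MPI launcher or batch system) depends on your site.

```shell
#!/bin/sh
# Sketch only: report whether a directory exists and is writable from the
# node this runs on. Launch one copy per node to compare the results.
check_dir_on_node() {
    dir=$1
    host=$(uname -n)
    if [ ! -d "$dir" ]; then
        echo "$host: MISSING $dir"
        return 1
    fi
    # Create and remove a small probe file, named per host and process,
    # so that copies running on different nodes do not collide.
    probe="$dir/.probe.$host.$$"
    if ! touch "$probe" 2>/dev/null; then
        echo "$host: NOT WRITABLE $dir"
        return 1
    fi
    rm -f "$probe"
    echo "$host: OK $dir"
}
```

If any node prints MISSING or NOT WRITABLE for the run directory, the "No such file or directory" failure would be a mount or visibility problem rather than a PnetCDF problem.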
 

rambhari01

Ram
Member
Hi Brian, Edward and Slevis,

Thank you very much for your suggestions.

Update:
We tried several other options with the setup. Sometimes the model runs well, but it spends hours in I/O on one particular file, ".clm2.h1.*.nc", even when we write it at monthly or annual frequency. It seems to be a compatibility issue between the VAST file system and PnetCDF. In the end everything else works well, but writing and closing this file takes unexpectedly long (up to 5-6 hours), and sometimes the model even gets stuck at this step, although the queue status shows the job using CPU time as normal.

However, the NYU HPC help team has figured out an alternative: we run the model with serial NetCDF and write all the output to node-local memory (say, on the first of the requested nodes). We ran the model on up to 4-5 nodes for computation while dumping the output for these files to node-local memory at the desired frequency. Surprisingly, with this setup the model runs without any issue: using 4 nodes (512 processes), we can simulate 1 year in ~2.5 hours, and the model writes all the files within a few minutes.

I think this issue lies between the file system (VAST) and PnetCDF.

Thank you very much for your help.

-Ram
 
One of our systems (NCHC Taiwania 3) happened to be upgraded to VAST just a few weeks ago, and we're having problems too.
We've tested PnetCDF 1.8.1 and 1.12.1. With Intel MPI 2021.11 on multiple nodes, pnetcdf/test/fandc/pnf_test failed with:
ADIO_OPEN(522): open failed on a remote node
With Intel MPI on a single node, the test runs, but the write speed is much slower than on the older GPFS storage.

We have better luck with OpenMPI 5.0.3, though: the default configuration works in both single-node and multi-node tests, although it is still slower than GPFS.
However, preloading the VAST DirectIO library does bring the write speed back to GPFS level.
The VAST library can be found here:
GitHub - vast-data/vast-preload-lib: LD_PRELOAD library to inject O_DIRECT into file I/O

We're still working on Intel MPI, though, since most of our team's environment modules and codes, including CESM 1.2 through the recent 3.0 beta, use it by default. So although OpenMPI seems promising, we would still like to stick with Intel MPI if possible.
The system admin and VAST support are quite active; I'll update here if there is good news.
 
Official response and test results from VAST support:
1. Setting the environment variable ROMIO_FSTYPE_FORCE="ufs:" with Intel MPI 2021.10, WITHOUT the VAST DirectIO library, is the suggested setup for the CESM PnetCDF I/O pattern. There is no bug, and it can achieve good performance on VAST.
2. The ROMIO module was changed in Intel MPI 2021.11. VAST tested and confirmed that the 2021.11 version is broken, giving parallel I/O errors on VAST, and may silently introduce data corruption even if no error is printed. So just avoid this release.
3. The same test was conducted with Intel MPI 2021.17 (the current latest). It is also broken in most cases, but using only the "ROMIO..." environment variable (without other I_MPI... settings) can make it work.
4. Older Intel MPI versions such as 2020u4 do not support the environment variable above, but you can use the VAST DirectIO library with them.
5. Depending on the batch system type and setup, the environment variable is sometimes not passed to all MPI nodes. Consult your system administrator for details.
6. OpenMPI simply offers better support for VAST, so feel free to choose it or give it a try.
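
Because the environment variable may not reach every MPI node, it can help to confirm what each node actually sees before a long run. The sketch below sets the variable as suggested by VAST support and defines a small reporter; `report_romio_env` is a hypothetical name, and the way you launch it across nodes (launcher, flags for exporting the environment) is an assumption that depends on your MPI and batch system.

```shell
#!/bin/sh
# Setting suggested by VAST support for Intel MPI (see the list above).
export ROMIO_FSTYPE_FORCE="ufs:"

# Sketch only: print the host name and the value this process sees,
# returning non-zero if the variable did not reach this process.
report_romio_env() {
    if [ -z "${ROMIO_FSTYPE_FORCE:-}" ]; then
        echo "$(uname -n): ROMIO_FSTYPE_FORCE is NOT set"
        return 1
    fi
    echo "$(uname -n): ROMIO_FSTYPE_FORCE=$ROMIO_FSTYPE_FORCE"
}
```

To check propagation, keep the `export` line in the job script and run only the reporter once per node under your launcher; a node that prints "NOT set" means the variable has to be forwarded explicitly, which is where the system administrator can help.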
 

rambhari01

Ram
Member
Dear Mike,

We greatly appreciate your efforts and suggestions on this issue. We will certainly test this at our HPC and I will update you on how it goes.

Thanks a lot.

-Ram
 