Hi All,
We are trying to port CESM (cesm2.2.2) to a new HPC cluster at NYU. We already run it on another cluster, where it works well, and on the new machine we have kept almost all of the previous settings for the compiler, NetCDF, PnetCDF, etc.
On this new machine, we are using "PnetCDF 1.15.0 alpha".
The file system is VAST, mounted over NFSv4:
"vast-ib.torch.hpc.nyu.edu:/vast/hpc/torch/scratch/rs9552 on /scratch/rs9552 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,rdirplus=force,forcerdirplus,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=10.0.4.6,local_lock=none,localports_failover,remoteports=10.0.5.149-10.0.6.20,addr=10.0.5.208)"
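To see at a glance which client-side caching and locking options are in effect for the run directory, the mount line above can be split into key/value pairs. This is a minimal sketch using only the Python standard library; the function name `parse_mount_options` is hypothetical, and checking `noac` and `local_lock` specifically is an assumption (ROMIO's NFS notes ask for `noac` and working `fcntl` locking), not something confirmed for this system or MPI library.

```python
def parse_mount_options(mount_line: str) -> dict:
    """Split the parenthesised option list of a `mount` output line into a dict.

    Hypothetical helper: flags without '=' (e.g. 'hard', 'rw') map to True,
    'key=value' options map to their string value.
    """
    start = mount_line.index("(")
    end = mount_line.rindex(")")
    opts = {}
    for item in mount_line[start + 1:end].split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


# Mount line copied from the post above.
line = ("vast-ib.torch.hpc.nyu.edu:/vast/hpc/torch/scratch/rs9552 on /scratch/rs9552 "
        "type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,"
        "rdirplus=force,forcerdirplus,proto=rdma,nconnect=16,port=20049,timeo=600,"
        "retrans=2,sec=sys,clientaddr=10.0.4.6,local_lock=none,localports_failover,"
        "remoteports=10.0.5.149-10.0.6.20,addr=10.0.5.208)")

opts = parse_mount_options(line)
# Assumed options of interest for multi-node MPI-IO consistency on NFS.
print("vers:", opts.get("vers"))
print("noac set:", "noac" in opts)
print("local_lock:", opts.get("local_lock"))
```

For this mount line, `noac` is absent, so attribute caching is enabled on each client node.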
When we run a tested setup with PnetCDF on a single node (1 node, 128 processors), it runs well and writes the model output correctly.
But when we run the same setup on multiple nodes (2 nodes, 128 processors, 64 per node), it builds fine but fails with a strange error while reading the model input data, as shown below.
------------------------------------------------------------
proc= 127 clump no = 1 clump id= 128
beg cohort = 20850 end cohort = 21013
total cohorts per clump = 164
ENDRUN:
ERROR:
spvals or NaNs found in coordinate array where level_class /= ispval; this is currently unhandled ERROR in initInterpMultilevelInterp.F90 at line 435
ENDRUN:
ERROR:
spvals or NaNs found in coordinate array where level_class /= ispval; this is currently unhandled ERROR in initInterpMultilevelInterp.F90 at line 435
------------------------------------------------------------
I am attaching software_environment.txt and the run log.
Could you please suggest a possible cause of this issue and how to fix it?
Thank you very much
-Ram