Hi,
I am trying to run a MOM6 regional simulation on Cheyenne.
I successfully ran 1 whole year on another HPC using the GNU compiler, and I am now trying to run it with different Intel compilers on Cheyenne.
The first 6 months ran without any problem. Then I had to resubmit the simulation because of the wallclock limit.
After the restart, the simulation kept failing at the same point (2 months after the restart, 1996/08/16, 00-00-00) with the following error:
"MPT ERROR: MPI_COMM_WORLD rank 14 has terminated without calling MPI_Finalize() aborting job "
There was no other information.
After trying multiple MOM_input options and different Intel versions, the error log eventually produced this:
"MPT ERROR: Rank 30(g:30) is aborting with error code 1.
MPT Version: HPE MPT 2.19 02/23/19 05:30:09
MPT: --------stack traceback-------
FATAL from PE 1: NETCDF ERROR: NetCDF: HDF error File=INPUT/seawifs-clim-1997-2010.smoothed.nc Field=chlor_a"
That file was the same on both machines. Nevertheless, since the chlor_a field contained NaN at land points, I decided, just in case, to create a new
file with the land points flooded.
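The post does not say how the flooding was done. As a rough illustration only, here is a minimal sketch of one common approach: repeatedly replacing each NaN cell with the mean of its valid 4-neighbours until no NaN cells remain. It assumes the field is a plain 2-D list of floats with land points stored as NaN; `flood_nan` is a hypothetical helper, and reading/writing the actual SeaWiFS NetCDF file (one time record at a time) is left out.

```python
import math


def flood_nan(field, max_iters=1000):
    """Fill NaN cells by repeatedly averaging their valid 4-neighbours.

    Hypothetical helper: `field` is a 2-D list of floats with land points
    stored as NaN. Returns a new 2-D list; the input is left unchanged.
    """
    ny, nx = len(field), len(field[0])
    out = [row[:] for row in field]
    for _ in range(max_iters):
        missing = [(j, i) for j in range(ny) for i in range(nx)
                   if math.isnan(out[j][i])]
        if not missing:
            break
        # Collect all updates first, then apply them, so the result of a
        # sweep does not depend on the order in which cells are visited.
        updates = {}
        for j, i in missing:
            vals = []
            for dj, di in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                jj, ii = j + dj, i + di
                if 0 <= jj < ny and 0 <= ii < nx and not math.isnan(out[jj][ii]):
                    vals.append(out[jj][ii])
            if vals:
                updates[(j, i)] = sum(vals) / len(vals)
        if not updates:
            # No NaN cell touches valid data (e.g. an all-NaN field): stop.
            break
        for (j, i), value in updates.items():
            out[j][i] = value
    return out


nan = float("nan")
chl = [[1.0, nan, nan], [3.0, 5.0, nan]]
print(flood_nan(chl))  # [[1.0, 3.0, 4.0], [3.0, 5.0, 5.0]]
```

In practice the same idea would normally be applied with array tools (for example numpy) to every time record of chlor_a before writing the new file.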
With the new file I managed to run the second 6 months. When I tried to continue the simulation, however, the model stopped again with no error information at all!
It was in month 4 of the second year (1997/04/16, 00-00-00).
I then resubmitted the simulation with VERBOSITY = 6. This time it stopped at a different time (1997/ 3/24 12:40: 0), again with no fatal error in the error log.
The last part of the output file was the following:
NOTE from PE 0: callTree: o done with find_uv_at_h (diabatic)
NOTE from PE 0: callTree: ---> set_diffusivity(), MOM_set_diffusivity.F90
NOTE from PE 0: callTree: o done with calculate_kappa_shear (set_diffusivity)
Does anybody have a suggestion?
P.S. These are my latest modules on Cheyenne:
module load ncarenv
module load intel/19.1.1
module load netcdf/4.7.4
module load mpt/2.22
Thanks in advance,
Mehmet