Hi,
I am trying to run the MOM6 ocean-only benchmark, but the simulation appears to hang.
Compiler - intel2019u5
State of the directory where the simulation got stuck -
Code:
ocean_only_benchmark1]$ ls test/
available_diags.000000 input.nml MOM6 MOM_parameter_doc.all ocean.stats time_stamp.out
change.sh input.nml.bak MOM_input MOM_parameter_doc.debugging ocean.stats.nc Vertical_coordinate.nc
CPU_stats input.nml.bk MOM_memory.h MOM_parameter_doc.layout RESTART
diag_table libnuma.so MOM_override MOM_parameter_doc.short run.sh
env.src logfile.000000.out MOM_override.bak mon run.sh.bak
GOLD_IC.nc lstopo.info MOM_override.bk ocean_geometry.nc slurm-86603.out
contents of MOM_override file -
Code:
ocean_only_benchmark1]$ cat test/MOM_override
! Blank file in which we can put "overrides" for parameters
#override NIGLOBAL = 720
#override NJGLOBAL = 360
contents of input.nml -
Code:
ocean_only_benchmark1]$ cat test/input.nml
&MOM_input_nml
output_directory = './',
input_filename = 'n'
restart_input_dir = 'INPUT/',
restart_output_dir = 'RESTART/',
parameter_filename = 'MOM_input',
'MOM_override' /
&diag_manager_nml
/
&fms_nml
clock_grain='ROUTINE'
clock_flags='SYNC'
domains_stack_size = 955296
stack_size =0 /
&ocean_solo_nml
months = 0
days = 20 /
slurm.out.txt
With 56 ranks (I have 56 cores per node), the simulation gets stuck at this state (stdout) -
Code:
MOM Date 1/01/01 00:00:00 0: En 5.483091E-01, MaxCFL 0.00000, Mass 7.909719100499E+19, Salt 35.00000000000, Temp 5.06383782258
Total Energy: 4402CF01460DFD5A 4.3369709224187347E+19
Total Mass: 7.9097191004994732E+19, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total Salt: 2.7684016851748152E+18, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total Heat: 1.5721012388222692E+24, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total age: 0.0000000000000000E+00 yr kg
I am attaching the stdout herewith. All other files are the same as in the source code. The same setup runs to completion with a lower number of ranks (28). Do I need to modify some settings in the input file to make the simulation work with a higher number of ranks (56)?
I tested a few launch configurations -
a) 14 ranks x 1 thread per process
b) 14 ranks x 4 threads per process
c) ranks > 50 per node
With all of the above, the simulation gets stuck at the first timestep. With 28 ranks per node and with 50 ranks per node, the simulation works.
Please let me know if any more information is required from my end on this issue.
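In case it helps compare the rank counts above against the 720 x 360 grid set in MOM_override, here is a small sketch that lists the 2D factorizations of each rank count and flags the ones that divide the grid evenly. The helper names are hypothetical, and whether the domain decomposition has anything to do with the hang is only an assumption on my part; the script does plain arithmetic and does not call MOM6 or FMS.

```python
# Hypothetical helper, not part of MOM6: plain arithmetic on the grid size
# taken from MOM_override (NIGLOBAL = 720, NJGLOBAL = 360).
NIGLOBAL = 720
NJGLOBAL = 360


def layouts(nranks):
    """Return every (npx, npy) pair with npx * npy == nranks."""
    return [(p, nranks // p) for p in range(1, nranks + 1) if nranks % p == 0]


def even_layouts(nranks, ni=NIGLOBAL, nj=NJGLOBAL):
    """Return the layouts whose tiles divide the grid evenly in both directions."""
    return [(px, py) for px, py in layouts(nranks)
            if ni % px == 0 and nj % py == 0]


if __name__ == "__main__":
    # Rank counts mentioned in this post.
    for n in (14, 28, 50, 56):
        print(f"{n} ranks: all layouts {layouts(n)}, even layouts {even_layouts(n)}")
```

An uneven layout is not necessarily a problem; this is only meant to make the tested configurations easier to compare.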