MOM6 (with ocean only benchmark) gets stuck


New Member
I am trying to run MOM6 with the ocean-only benchmark, but the simulation seems to get stuck.
Compiler: intel2019u5
State of the run directory where the simulation got stuck:

ocean_only_benchmark1]$ ls test/
available_diags.000000 input.nml MOM6 MOM_parameter_doc.all ocean.stats time_stamp.out input.nml.bak MOM_input MOM_parameter_doc.debugging
CPU_stats input.nml.bk MOM_memory.h MOM_parameter_doc.layout RESTART
diag_table MOM_override MOM_parameter_doc.short
env.src logfile.000000.out MOM_override.bak mon MOM_override.bk slurm-86603.out

Contents of the MOM_override file:
ocean_only_benchmark1]$ cat test/MOM_override
! Blank file in which we can put "overrides" for parameters
#override NIGLOBAL = 720
#override NJGLOBAL = 360

Contents of input.nml:

ocean_only_benchmark1]$ cat test/input.nml
output_directory = './',
input_filename = 'n'
restart_input_dir = 'INPUT/',
restart_output_dir = 'RESTART/',
parameter_filename = 'MOM_input',
'MOM_override' /


domains_stack_size = 955296
stack_size =0 /

months = 0
days = 20 /


With 56 ranks (I have 56 cores per node), the simulation seems to get stuck at this point (stdout):

MOM Date 1/01/01 00:00:00 0: En 5.483091E-01, MaxCFL 0.00000, Mass 7.909719100499E+19, Salt 35.00000000000, Temp 5.06383782258
Total Energy: 4402CF01460DFD5A 4.3369709224187347E+19
Total Mass: 7.9097191004994732E+19, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total Salt: 2.7684016851748152E+18, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total Heat: 1.5721012388222692E+24, Change: 0.0000000000000000E+00 Error: 0.00000E+00 ( 0.0E+00)
Total age: 0.0000000000000000E+00 yr kg

I am attaching the stdout. All other files are the same as in the source code. The same setup runs to completion with a lower number of ranks (28). Do I need to modify some settings in the input file to make the simulation work with a higher number of ranks (56)?

I tested a few launch configurations:
a) 14 ranks x 1 thread per process
b) 14 ranks x 4 threads per process
c) ranks > 50 per node
With all of the above, the simulation gets stuck at the first timestep. With 28 ranks per node or 50 ranks per node, the simulation works.

Please let me know if any more information is needed from my end on this issue.


  • slurm.out.txt (13.2 KB)


Marshall Ward
New Member
Hi Puneet, it seems that this was an issue with the code. We have discussed this on GitHub, but I'll repeat the explanation here for others.

There are certain timers which are synced via `MPI_Barrier`. A few of these are embedded in j-loops.

If the ranks have different-sized domains in j, the ranks with smaller domains finish their j-loops and stop calling the barrier, while the ranks with larger domains keep waiting at the barrier on their extra iterations, so the run hangs.
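
To make the hang concrete, here is a toy sketch in Python. It is not MOM6 or FMS code: threads stand in for MPI ranks, `threading.Barrier` stands in for `MPI_Barrier`, and the name `run_ranks` and the timeout are made up for illustration. A real `MPI_Barrier` has no timeout, so the stalled ranks would simply wait forever.

```python
import threading


def run_ranks(j_sizes, timeout=0.5):
    """Simulate ranks that hit a barrier once per j-iteration.

    j_sizes[i] is the (hypothetical) j-extent of rank i.
    Returns a list of (iterations_completed, stalled) per rank.
    """
    barrier = threading.Barrier(len(j_sizes))
    results = [None] * len(j_sizes)

    def rank(rank_id, nj):
        done, stalled = 0, False
        for _ in range(nj):
            try:
                # Stand-in for the synced timer's MPI_Barrier call.
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                # Some rank never arrived: in MPI this would be a hang.
                stalled = True
                break
            done += 1
        results[rank_id] = (done, stalled)

    threads = [
        threading.Thread(target=rank, args=(i, nj))
        for i, nj in enumerate(j_sizes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ranks([3, 3]))  # equal j-extents: every barrier completes
    print(run_ranks([3, 4]))  # unequal: the larger rank stalls on its extra row
```

With equal extents every rank completes all of its iterations. With unequal extents the rank with the extra row is left alone at the barrier, which is the situation described above.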

This can be resolved by removing the `clock_flags` argument from input.nml:

    !clock_flags = 'SYNC'
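
For anyone who wants to check whether a given layout would give unequal j-extents, a rough sketch follows. The helper names, the grid size of 360 and the layouts of 4 and 7 ranks in j are illustrative assumptions, and the way the remainder is handed out here is not necessarily how FMS distributes it. What does hold in general is that the extents cannot all be equal when the number of ranks in j does not divide the global j size.

```python
def j_extents(nj_global, npes_j):
    """Split nj_global rows over npes_j ranks as evenly as possible.

    Assumption: leftover rows go to the lowest-numbered ranks. FMS may
    place them differently, but the set of sizes is uneven either way.
    """
    base, extra = divmod(nj_global, npes_j)
    return [base + 1 if p < extra else base for p in range(npes_j)]


def has_uneven_j(nj_global, npes_j):
    """True if at least two ranks would get different j-extents."""
    return len(set(j_extents(nj_global, npes_j))) > 1


if __name__ == "__main__":
    # Hypothetical values, chosen only to show both cases.
    for npes_j in (4, 7):
        print(npes_j, j_extents(360, npes_j), has_uneven_j(360, npes_j))
```

A layout flagged as uneven is one where the synced timers inside j-loops could leave some ranks waiting.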