Scheduled Downtime
On Tuesday 24 October 2023 @ 5pm MT the forums will be in read only mode in preparation for the downtime. On Wednesday 25 October 2023 @ 5am MT, this website will be down for maintenance and expected to return online later in the morning.
Normal Operations
The forums are back online with normal operations. If you notice any issues or errors related to the forums, please reach out to help@ucar.edu

How to diagnose causes of slow MOM6 run

cspencerjones

Spencer Jones
New Member
How do I get MOM6 to output more information about how long it is spending in each routine? We are running a very idealized configuration, but it is taking much longer than I expected, based on my experience with MITgcm. Can you give me (or link me to) some basic instructions for how to diagnose why the model is running slowly?
Thanks,
Spencer
 

marshallward

Marshall Ward
New Member
Most of the major subroutines have clock functions to track their time, which is printed at the end of the job. If you need to create new clocks, the general idea is to wrap the function with `cpu_clock_begin` and `cpu_clock_end`. You will also need to register the clock with `cpu_clock_id`. (There are several examples in MOM.F90.)
 

cspencerjones

Spencer Jones
New Member
Thanks Marshall. I think I have missed something basic here though...even when we turn the verbosity up to 7, we only get these estimates for "Total runtime", "Initialization", "Main loop" and "Termination" in the log file and output file. How do I turn on the option for more detailed output from existing clocks? (e.g. the time inside the dynamics step - a CPU clock already exists for this but I don't see any output about it).

Of course I could dig through the FMS code but it seems likely you will know the answer - thanks in advance
 

marshallward

Marshall Ward
New Member
I think you also need to turn on something at the FMS level. Try adding this to input.nml:
Code:
&fms_nml
    clock_grain='ROUTINE'
    clock_flags='NONE'
/
 

marshallward

Marshall Ward
New Member
I should have also explained: the default clocks in MOM6 usually have `grain` set to `CLOCK_COMPONENT` or something like that, which needs `clock_grain` set to something higher than default. (I have not looked at any of this in a long time, but I think they are just names for levels of verbosity.)

I think if you had just done `id = cpu_clock_id("clock label")` without a `grain` argument, then it would have shown up in the timings without any changes to `input.nml`.
 

ezaron

Ed Zaron
New Member
Hi Spencer:

When I run in ocean_only mode, I get a table like this at the end of the text output file:

Tabulating mpp_clock statistics across 1840 PEs...

tmin tmax tavg tstd tfrac grain pemin pemax
Total runtime 18240.085735 18240.242910 18240.160374 0.036598 1.000 0 0 1839
Initialization 9.467694 10.318431 10.248281 0.190500 0.001 0 0 1839
Main loop 18229.249874 18229.406791 18229.324390 0.036576 0.999 0 0 1839
Termination 0.290624 0.291347 0.291331 0.000017 0.000 0 0 1839
Ocean Initialization 9.451141 10.335998 10.252385 0.193435 0.001 11 0 1839
Ocean 18222.510332 18226.866710 18224.891874 0.879059 0.999 1 0 1839
Ocean dynamics 11519.332926 14866.182238 12790.039357 811.342920 0.701 11 0 1839
Ocean thermodynamics and tracers 1829.563460 6314.543754 4389.753119 987.488917 0.241 11 0 1839
Ocean Other 273.251121 3478.066334 1017.036702 701.371024 0.056 11 0 1839
(Ocean tracer advection) 521.829391 2016.799309 744.876752 248.712894 0.041 21 0 1839
(Ocean diabatic driver) 875.039626 5639.394206 3644.863160 1076.762620 0.200 21 0 1839
(Ocean continuity equation *) 0.000000 0.000000 0.000000 0.000000 0.000 31 0 1839
(Ocean set BBL viscosity) 34.075681 566.105717 155.101763 96.666000 0.009 31 0 1839
(Ocean message passing *) 111.283780 4574.458016 1974.742535 1054.149127 0.108 31 0 1839
(Ocean MOM_initialize_state) 6.171299 6.455355 6.428999 0.037264 0.000 31 0 1839
(Ocean init message passing *) 0.008429 0.163831 0.038410 0.036156 0.000 41 0 1839
(Ocean thickness diffusion *) 687.540333 937.084059 799.151873 46.768668 0.044 31 0 1839
(Ocean mixed layer restrat) 56.148337 1047.943808 597.451645 408.745057 0.033 31 0 1839
(Ocean collective diagnostics) 71.466022 284.709861 91.470177 31.516023 0.005 31 0 1839
(Ocean Z-space diagnostics) 0.000000 0.000000 0.000000 0.000000 0.000 31 0 1839
(Ocean ALE) 35.761366 1540.118358 794.546611 594.434898 0.044 31 0 1839
...

There are perhaps 80 lines total. Are you saying you don't get an output table like this?

Perhaps your run is taking so long that it times out and the full log file is not produced.

-Ed
 

cspencerjones

Spencer Jones
New Member
Hi Ed! Sorry I should have come back to say that Marshall's second solution worked. Once we added those lines to input.nml, we were able to see this output and debug.
 
Top