MAX_TASKS_PER_NODE on Mira

jacob@...

Does anyone know why MAX_TASKS_PER_NODE on Mira is set to 48 instead of the hardware maximum of 64 or some other power of two? Thanks.

Reference: http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/modelnl/machines.html

jedwards

We started with a setting of 48 because 64 was causing memory issues for some components. We are now using 64 (with 8 tasks per node and 8 threads per task) for our production F-compset runs there, but since all of the pe layouts in config_pes.xml were based on that value of 48 we haven't changed the default.

CESM Software Engineer

worleyph@...

Follow-on question (this location suggested by Jim) ...

In config_machines.xml the mira entry has

         <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>

and PES_PER_NODE is not specified. In contrast, for edison both are defined, as

        <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>
        <PES_PER_NODE>24</PES_PER_NODE>

so the first is the max number of hyperthreads and the second is the number of physical cores.

At the moment, the default on mira sets PES_PER_NODE to $MAX_TASKS_PER_NODE. What is the implication of doing this? Should the mira entry be changed to

        <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>
        <PES_PER_NODE>16</PES_PER_NODE>

? (I understand that setting max_tasks_per_node to 48 instead of 64 is related to the memory requirements in some build configurations, but should the physical number of cores be set accurately? Does this matter?)

Jim already responded to a direct inquiry (after which he reminded me that I should be using this forum for such questions), indicating that "Since mira isn't using the MAX_TASKS_PER_NODE variable the PES_PER_NODE variable provides the limiting functionality" for determining the maximum number of tasks per node. Apparently mkbatch.mira is the only script that uses these settings? Is this correct?


Thanks.

worleyph@...

FYI, with MAX_TASKS_PER_NODE set to 48 on Mira, neither pure MPI nor running with 2 OpenMP threads works. The system complains that 48 and 24 tasks per node are not supported. This happens at job launch, so it is not a model error, but it is also not reported when the job is submitted. In this case, setting the number of threads to 3 works (so that there are now 16 MPI tasks per node).

For example (pure MPI):

2014-08-27 23:05:30.931 (INFO ) [0xfff7aa4c330] ibm.runjob.AbstractOptions: using properties file /bgsys/local/etc/bg.properties
48 is not a valid ranks per node value
in option 'ranks-per-node': invalid option value '48'

(two threads per task)

2014-08-28 15:49:35.620 (INFO ) [0xfff7e4ac330] ibm.runjob.AbstractOptions: using properties file /bgsys/local/etc/bg.properties
24 is not a valid ranks per node value
in option 'ranks-per-node': invalid option value '24'
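
The three outcomes above are consistent with simple arithmetic. Here is a minimal Python sketch, assuming the launch script derives the ranks-per-node value by integer-dividing MAX_TASKS_PER_NODE by the number of threads per task. That rule is inferred from the values reported in this thread (48, 24, 16), and the helper name is hypothetical, not part of the CESM scripts.

```python
def ranks_per_node(max_tasks_per_node: int, threads_per_task: int) -> int:
    """Assumed rule: MPI ranks per node = MAX_TASKS_PER_NODE // threads per task."""
    if threads_per_task < 1:
        raise ValueError("threads_per_task must be at least 1")
    return max_tasks_per_node // threads_per_task


# Reproduce the three cases described in the post for MAX_TASKS_PER_NODE = 48.
for threads in (1, 2, 3):
    print(threads, "thread(s) per task ->", ranks_per_node(48, threads), "ranks per node")
```

Under this assumption, 1 and 2 threads give the rejected values 48 and 24, while 3 threads gives 16, which the launcher accepted.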

jedwards

As I mentioned earlier we are successfully running with MAX_TASKS_PER_NODE=64 for some compsets and resolutions. The best thing to do in practice is to tune this value and the pe layout for the best performance of the particular compset and resolution you are interested in running.

CESM Software Engineer

worleyph@...

Understood, and I (and others) do this as a matter of course. However, this particular error message was not intuitive for the person I was working with, and I just wanted to point this out. (He thought that the message was coming from the model, and that this number of MPI tasks was not supported by a particular component.)

Just wanted to mention this as a potential issue for users on Mira, especially since this came from the default env_mach_pes.xml file for the particular case. I.e., it did not work out of the box.

Perhaps resetting the maximum tasks to either 16, 32, or 64 would be best. 32 might be a reasonable compromise, with people changing this to 64 when they know that memory will not be an issue.
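
To compare the candidate values suggested above, here is a rough Python sketch. It rests on two assumptions that are not confirmed by the CESM scripts: that the launcher accepts only the ranks-per-node values listed in `VALID_RANKS_PER_NODE`, and that ranks per node is MAX_TASKS_PER_NODE divided by the thread count, considering only thread counts that divide it evenly. The function name and the default thread counts are hypothetical.

```python
# Assumption: the launcher accepts only these ranks-per-node values.
VALID_RANKS_PER_NODE = {1, 2, 4, 8, 16, 32, 64}


def launchable_thread_counts(max_tasks_per_node, thread_counts=(1, 2, 3, 4, 8)):
    """Return the thread counts whose derived ranks per node would be accepted."""
    accepted = []
    for threads in thread_counts:
        # Only consider thread counts that divide the maximum evenly.
        if max_tasks_per_node % threads != 0:
            continue
        if max_tasks_per_node // threads in VALID_RANKS_PER_NODE:
            accepted.append(threads)
    return accepted


# Compare the current default (48) with the proposed alternatives.
for candidate in (16, 32, 48, 64):
    print("MAX_TASKS_PER_NODE =", candidate, "->", launchable_thread_counts(candidate))
```

Under these assumptions, 16, 32, and 64 all allow pure MPI (1 thread per task), whereas 48 launches only with 3 threads per task among the counts tried.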

mickelso@...

Which compset and resolution created the bad pe layout/machine mapping?

worleyph@...

As far as I can tell ANY pure MPI pe layout will not work when MAX_TASKS_PER_NODE is set to 48. The particular one this showed up in is

-compset ICRUCLM45 -res f09_f09

which generates a really weird layout (71 tasks for the land?). I changed this to something more reasonable, but it did not eliminate the problem until I also set the number of threads to 3.

santos

I'm somewhat puzzled. Are you saying that you cannot turn off threading on Mira without changing the MAX_TASKS_PER_NODE, or that a completely pristine, out-of-the-box env_mach_pes is not working for some configuration? I have no particular insight into the performance implications of MAX_TASKS_PER_NODE, but I run out-of-the-box test cases pretty regularly on Mira and Cetus and have not had any issues.

Sean Patrick Santos

CESM Software Engineering Group

worleyph@...

I'm saying that I have one example where an out-of-the-box env_mach_pes.xml failed when using MAX_TASKS_PER_NODE = 48. This particular PE layout was "unusual", so I modified it to something more reasonable for the Mira node architecture (in my opinion). In both cases I was using 1 thread per task. (This was built with THREADED = .TRUE. for other reasons - a different bug report, but this appears to be irrelevant here.) Both runs failed at job launch where the "system" stated that 48 MPI tasks per node was not supported on Mira.

I changed the number of threads from 1 to 2 (same task requests) and the job launch then failed with an error message that 24 MPI tasks per node is not supported on Mira.

I then changed this to 3 threads per task, and it worked. My conjecture is that the number of MPI tasks per node has to be <= 16 or a multiple of 16, and that setting MAX_TASKS_PER_NODE = 48 is not the best choice for the naive user on Mira, especially if some of the default env_mach_pes.xml configurations are MPI-only.

Perhaps we should communicate directly if I am still not being clear.

santos

I've reproduced the issue now with this I compset. The assumption behind setting MAX_TASKS_PER_NODE to 48 seems to have been that the default thread count should always be a multiple of 3. But for I compsets specifically, threading is disabled for performance reasons, under the assumption that the NTHRDS_* variables can always be changed independently of other variables. This is obviously false on Mira, where the number of MPI tasks has to be a power of 2 (not just a multiple of 16, otherwise 48=3*16 would work).
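
The distinction between "multiple of 16" and "power of 2" can be checked directly. This is a minimal Python sketch for illustration only; the helper name is hypothetical, and the power-of-2 rule is the one stated in this post rather than something taken from the launcher documentation.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...: a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Ranks-per-node values that came up in this thread.
for ranks in (16, 24, 48, 64):
    print(
        ranks,
        "| multiple of 16:", ranks % 16 == 0,
        "| power of two:", is_power_of_two(ranks),
    )
```

Here 48 is a multiple of 16 but not a power of 2, which matches the rejected launch, while 16 and 64 satisfy both conditions.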

I presume that this issue has gone unnoticed because most people running on Mira are interested in the atmosphere, and thus running B or F compsets. If those compsets work, there's rarely any reason why an I compset would fail; this case is obviously one of the rare exceptions. That would also be the reason for the "weird" task counts like 71. This value is part of a layout that was explicitly entered as one of the defaults, I presume based on the empirically determined best count for the scaling of a B compset, where the CLM and CICE are usually concurrent and may share part of a node.

Sean Patrick Santos

CESM Software Engineering Group
