cam5 FV dynamical core

dan.lipsa@...
cam5 FV dynamical core

Hi all,

I am trying to read the grid data on each compute node for the FV dynamical core and write it out, and also to find information about the whole grid. What is the best place to start?

Also, a related question: if I save all of the attributes on the fv grid, do I get the attributes from the physics and chemistry modules? I am using cam-5.3 from cesm1_2_1.

Thank you very much,

Dan

jedwards

Hi Dan,

I'm afraid that your question as written doesn't make a lot of sense to me. Can you explain in more detail what you are trying to do?

CESM Software Engineer

dan.lipsa@...

Hi,

Thanks for your answer.

My goal is to write out (or, more precisely, send to our ParaView Catalyst) the computational data produced by cam5. For this I need to understand and access the data decomposition used by the various computational modules involved.

We are configuring our cam5 using the fv, cam5, and trop_mam3 modules. Does each of these modules have a different grid decomposition?

If we want to look at the physics module (I think most of the variables in the history file come from there), what is the best place to look to understand the grid decomposition?

Thank you very much,

Dan

hannay

I assume you are talking about the grid for CAM (the atmospheric model).

The grid is set when you invoke the create_newcase command.

create_newcase -case [case name] \
-mach [machine name] \
-compset [compset name] \
-res [resolution]

Information about the supported grids is in:
http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/modelnl/grid.html

For CAM, you have the choice between different dynamical cores and resolutions.
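For example, an FV CAM5 case at roughly 2-degree resolution might be created with something like the following (the case, machine, compset, and resolution names here are only illustrative; check the table at the link above for what your CESM version actually supports):

create_newcase -case my_fv_case \
-mach yellowstone \
-compset F_2000_CAM5 \
-res f19_f19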

If you want to see the grid once you have created a case, you could look at the initial condition file.
Look for the variable "ncdata" in the CAM namelist (atm_in); it will point you to the initial file.
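In atm_in the entry looks something like the following (the path below is just a placeholder; yours will point into your inputdata directory):

&cam_inparm
 ncdata = '/path/to/inputdata/atm/cam/inic/fv/your_fv_initial_file.nc'
/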

santos

We are configuring our cam5 using the fv, cam5, and trop_mam3 modules. Does each of these modules have a different grid decomposition?

Just to clarify for this specific question, the answer is almost always no. For any specific run, it is safe to say that only one grid is used by the whole atmosphere model, including physics, chemistry, and dynamics. This will probably not hold forever, since the dynamics and column physics grids may differ in the future, and since new and proposed features involving subcolumns and SPCAM require a somewhat expanded concept of what the "grid" means. But for standard runs right now, you can assume that essentially all of CAM works on the same grid specified when you create the case, which should be the same as the grid in the initial conditions file (ncdata).

(Also, "cam5" is not really a distinct module or package per se. It is more like a shorter way of saying "CAM with trop_mam3 as the aerosol/chemistry and RRTMG as the radiation and TKE as the PBL scheme and Park's macrophysics and MG microphysics and ZM deep convection and UW shallow convection and..." This standard combination is the most common and supported one, and is thus defined in the user interface, but it is often possible, though not supported by us, to mix and match pieces between CAM4, CAM5, and physics packages that are not part of any standard configuration.)

Sean Patrick Santos

CESM Software Engineering Group

dan.lipsa@...

Sean, thank you for your detailed answer.

I think I am a little clearer now about where I have to get the data from. I am looking in the physics module, and I am trying to understand the grid decomposition across MPI nodes and the chunking that goes on there.

I am looking at physics_types::physics_state and I don't really understand how the chunks are formed: for a 2D field such as surface pressure (ps) the chunked array is one-dimensional, while for a 3D field (such as t) it is two-dimensional. I also tried to compare the chunked data with the data stored in the history file. Is there a document that describes this?
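For reference, the fields I am looking at are roughly the following (abridged from physics_types.F90; pcols and pver come from the ppgrid module, and the real type has many more members):

type physics_state
   integer  :: lchnk            ! chunk index
   integer  :: ncol             ! number of active columns in this chunk
   real(r8) :: ps(pcols)        ! surface pressure: one value per column
   real(r8) :: t(pcols,pver)    ! temperature: column x vertical level
   ! ... many more fields
end type physics_state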


Thank you

Dan

jedwards

CESM Software Engineer

eaton

You are correct that most of the data written for analysis comes from the physics decomposition.  The Worley article is a great reference for understanding the design of that decomposition.  It is implemented in the phys_grid module, which gets its information about how the grid is defined from the dyn_grid module; the dynamical core is always responsible for the definition of the grid.  Then, depending on the phys_loadbalance setting, phys_grid decides how to set up the chunks that comprise the physics decomposition.

The phys_grid module also contains the query functions that return the locations of each column in a chunk.  The columns in a chunk are not required to be located near one another, which makes writing the data for a global field to disk directly from the chunk data structure a performance nightmare.  Consequently we use the pio layer to rearrange the data into the "io" decomposition, which is much more amenable to disk writes.  It may also be advantageous to use this decomposition for communication with ParaView Catalyst.
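As a rough, untested sketch of what those query functions look like (subroutine names and argument order as I remember them; check phys_grid.F90 in your source tree), a loop over the chunks owned by one MPI task might look like this:

subroutine print_chunk_columns(phys_state)
   ! Walk the physics decomposition on this task and print where each
   ! column actually sits on the globe, plus a couple of state fields.
   use shr_kind_mod,  only: r8 => shr_kind_r8
   use ppgrid,        only: begchunk, endchunk, pcols
   use phys_grid,     only: get_ncols_p, get_rlat_all_p, get_rlon_all_p
   use physics_types, only: physics_state

   type(physics_state), intent(in) :: phys_state(begchunk:endchunk)
   integer  :: lchnk, ncol, i
   real(r8) :: rlat(pcols), rlon(pcols)

   do lchnk = begchunk, endchunk
      ncol = get_ncols_p(lchnk)                ! active columns in this chunk
      call get_rlat_all_p(lchnk, ncol, rlat)   ! column latitudes (radians)
      call get_rlon_all_p(lchnk, ncol, rlon)   ! column longitudes (radians)
      do i = 1, ncol
         write(*,*) lchnk, i, rlat(i), rlon(i), &
                    phys_state(lchnk)%ps(i), phys_state(lchnk)%t(i,1)
      end do
   end do
end subroutine print_chunk_columns

Printing the latitudes and longitudes this way makes the scattered layout of the columns within each chunk obvious.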

dan.lipsa@...
Thank you very much for the link and the explanation. This helped explain the seemingly random rearrangement I was seeing. In our prototype we are going to send the data from each node to ParaView Catalyst as an unstructured grid; we do not want to rearrange the data. Another idea we explored was to send the data as a collection of structured grids (columns); indeed, the problem is that the columns are not next to each other. A possible optimization may be to build larger blocks out of all the chunks stored on an MPI node. Are you aware of any work trying this? Thank you.

Dan

jedwards

Hi Dan,

 ... build larger blocks out of all the chunks stored on an MPI node.

The upcoming pio2 library will do this.  

CESM Software Engineer

dan.lipsa@...
We would definitely want to try this! Do you have an approximate release date for pio2? Thank you.

Dan

jedwards

Hopefully by the end of the year.  If I understand what you are trying to do (and I'm not sure yet that I do), it would involve modifying the API a bit so that you can intercept the data between the rearrangement and the write steps.  We can discuss details via email if you like.  - Jim

CESM Software Engineer

dan.lipsa@...

Hi Jim,

The goal of ParaView Catalyst is to enable post-processing on the compute nodes. This contrasts with the traditional post-processing workflow: save derived data, move the data to a different machine, and create visualizations with it. For this we link a slimmed-down version of ParaView to the simulation code, pass the data on each compute node to the ParaView instance running on that node, and perform post-processing using any VTK/ParaView pipeline. The pipeline is written in Python. We can save processed datasets or visualizations, or we can send processed data to a remote ParaView.

The challenge for cam5 (fv, cam5, trop_mam3) is that the computational load balancing (chunking) performed by the physics module seems to imply spatial non-coherence between the data stored on different compute nodes (the columns on one compute node are not next to each other).

Our current project is limited in goals (and funds), so I will not pursue the PIO route further at the moment.

Just out of curiosity though, why do you want to build larger blocks on the compute nodes? This will help us for visualization, but the physics module does not seem to need that.

Thank you,

Dan

santos

spatial non-coherence between data stored on different compute nodes

I should point out that this is something of an understatement; the most common load balancing strategies deliberately try to minimize spatial coherence, in order to improve the chances that each task gets a mix of columns from "expensive" regions and from "cheap" regions (e.g., daytime is more expensive than nighttime, and cloudy sky is more expensive than clear sky).

Sean Patrick Santos

CESM Software Engineering Group
