
cam5 FV dynamical core

eaton

CSEG and Liaisons
You are correct that most of the data written for analysis comes from the physics decomposition. The Worley article is a great reference for understanding the design of that decomposition. It is implemented in the phys_grid module, which gets its information about how the grid is defined from the dyn_grid module; the dynamical core is always responsible for defining the grid. Then, depending on the phys_loadbalance setting, phys_grid decides how to set up the chunks that comprise the physics decomposition. The phys_grid module also contains the query functions that return the location of each column in a chunk. The columns in a chunk are not required to be located near one another, which makes writing a global field to disk directly from the chunk data structure a performance nightmare. Consequently we use the pio layer to rearrange the data into the "io" decomposition, which is much more amenable to disk writes. It may also be advantageous to use this decomposition for communication with ParaView Catalyst.
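To make that concrete, here is a rough sketch of the idea in plain Python (not the actual CAM Fortran or pio code; the chunk contents and the two io ranges below are made up purely for illustration). It shows why writing straight from chunks means scattered offsets, and what rearranging into contiguous io pieces buys you:

```python
import numpy as np

# Pretend global grid: 16 columns, one field value per column.
nglobal = 16
field_global_truth = np.arange(nglobal, dtype=float)

# Physics decomposition: each chunk holds an arbitrary, non-contiguous
# set of global column indices (chosen here at random to mimic load
# balancing; real chunk contents depend on phys_loadbalance).
rng = np.random.default_rng(0)
perm = rng.permutation(nglobal)
chunks = [perm[0:5], perm[5:11], perm[11:16]]          # global column ids per chunk
chunk_data = [field_global_truth[c] for c in chunks]    # field values per chunk

# Writing directly from chunks would mean many small, scattered writes:
# each chunk touches file offsets all over the global array.
for cols in chunks:
    print("chunk writes to scattered offsets:", sorted(cols))

# "io" decomposition: each io task owns one contiguous slice of the
# global index space, so its write is a single contiguous block.
io_ranges = [(0, 8), (8, 16)]
io_buffers = [np.empty(hi - lo) for lo, hi in io_ranges]

# The rearrangement step (conceptually what the pio layer does): route
# every (global column id, value) pair to the io task whose contiguous
# range contains that column.
for cols, vals in zip(chunks, chunk_data):
    for gcol, val in zip(cols, vals):
        for itask, (lo, hi) in enumerate(io_ranges):
            if lo <= gcol < hi:
                io_buffers[itask][gcol - lo] = val

# Each io buffer is now a contiguous piece of the global field.
assert np.allclose(np.concatenate(io_buffers), field_global_truth)
```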
 

Thank you very much for the link and the explanation. This helped explain the seemingly random rearrangement I was seeing. In our prototype we are going send data from each node to ParaView Catalyst as an unstructured grid. We do not want to rearrange the data. Other idea we explored was to send the data as a collection of structured grids (columns). Indeed the problem is that the columns are not next to each other. A possible optimization may be to build larger blocks out of all the chunks stored in a MPI node. Are you aware of any work trying this? Thank you.
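For what it's worth, the node-level aggregation we have in mind looks roughly like the Python sketch below (this is not our actual prototype: mpi4py is assumed, local_chunks fabricates placeholder column data, and send_to_catalyst is a hypothetical call standing in for the Catalyst adaptor):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Node-local communicator: all ranks that share a node.
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED, key=comm.rank)

# Placeholder for the chunks this rank owns: each chunk carries the
# coordinates and field values of its (scattered) columns.
def local_chunks(rank):
    rng = np.random.default_rng(rank)
    ncol = 4
    return [{
        "lon":  rng.uniform(0.0, 360.0, ncol),
        "lat":  rng.uniform(-90.0, 90.0, ncol),
        "vals": rng.standard_normal(ncol),
    } for _ in range(3)]

chunks = local_chunks(comm.rank)

# Concatenate this rank's chunks into one block (no global rearrangement,
# just stacking whatever columns happen to live here).
rank_block = {k: np.concatenate([c[k] for c in chunks])
              for k in ("lon", "lat", "vals")}

# Gather the per-rank blocks onto one rank per node and stack them into a
# single node-level block, e.g. the point set of one unstructured grid.
gathered = node_comm.gather(rank_block, root=0)
if node_comm.rank == 0:
    node_block = {k: np.concatenate([b[k] for b in gathered])
                  for k in ("lon", "lat", "vals")}
    # node_block would then be handed to Catalyst as one unstructured grid,
    # via a hypothetical call such as: send_to_catalyst(node_block)
    print("node block has", node_block["vals"].size, "columns")
```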
 

jedwards

CSEG and Liaisons
Staff member
Hi Dan, regarding "... build larger blocks out of all the chunks stored on an MPI node": the upcoming pio2 library will do this.
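(For readers following along: the snippet below is not the pio2 API, just a rough Python sketch of the underlying idea. Each task describes its scattered columns with a map into the global index space, and aggregating those maps at the node level turns many short runs into a few large contiguous ones. The task maps here are invented for illustration.)

```python
import numpy as np

# Each compute task describes its local data by a map of global offsets,
# in whatever scattered order its physics chunks happen to use.
task_maps = {
    0: np.array([12, 3, 7, 0]),
    1: np.array([5, 14, 1, 9]),
    2: np.array([2, 11, 6, 15]),
    3: np.array([8, 13, 4, 10]),
}

def contiguous_segments(offsets):
    """Count how many contiguous runs a set of offsets breaks into."""
    s = np.sort(offsets)
    return int(1 + np.count_nonzero(np.diff(s) != 1))

# Written task by task, the data lands in many short, scattered segments.
per_task = sum(contiguous_segments(m) for m in task_maps.values())

# Aggregated at the node level (here: all four tasks on one node), the
# merged map covers larger contiguous ranges, so far fewer write calls.
node_map = np.concatenate(list(task_maps.values()))
per_node = contiguous_segments(node_map)

print("segments if each task writes on its own:", per_task)      # 16
print("segments if the node writes one merged block:", per_node)  # 1
```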
 
