I suggest looking at the domain decomposition section of the POP reference manual. Basically, each block in the parallel decomposition has a halo of two cells that are also computed on neighboring blocks. This lets the model compute baroclinic tendencies without passing MPI messages between tasks; the extra memory and computation cost less than the extra communication would if the model had no halo region.
Edit to add: the halo update is an MPI communication step that is required to make sure the values in the halo region match those in the neighboring blocks. It is still much less communication between nodes than the model would need without any halo cells.
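
To make the idea concrete, here is a rough sketch in Python/NumPy. This is not POP code: it is a 1-D periodic toy problem, the function names (`split_with_halo`, `halo_update`, `laplacian_interior`) are made up for illustration, and the array copies in `halo_update` stand in for what would be MPI messages in a real model. It only shows that a stencil can be evaluated locally once the halos are current, and that the halos go stale after the interior changes.

```python
import numpy as np

HALO = 2  # halo width in cells, matching the two-cell halo described above


def split_with_halo(field, nblocks, halo=HALO):
    """Split a periodic 1-D field into blocks padded with `halo` cells per side.

    Assumes field.size is divisible by nblocks (toy setup, not POP's scheme).
    """
    n = field.size
    size = n // nblocks
    blocks = []
    for b in range(nblocks):
        # Indices wrap around because the toy domain is periodic.
        idx = np.arange(b * size - halo, (b + 1) * size + halo) % n
        blocks.append(field[idx].copy())
    return blocks


def halo_update(blocks, halo=HALO):
    """Refresh every block's halo from its neighbors' interior cells.

    In a real parallel model each of these copies would be an MPI message.
    """
    nb = len(blocks)
    for b, blk in enumerate(blocks):
        left = blocks[(b - 1) % nb]
        right = blocks[(b + 1) % nb]
        blk[:halo] = left[-2 * halo:-halo]   # left neighbor's last interior cells
        blk[-halo:] = right[halo:2 * halo]   # right neighbor's first interior cells


def laplacian_interior(blk, halo=HALO):
    """3-point stencil on interior cells; needs no data from other blocks."""
    n = blk.size
    center = blk[halo:n - halo]
    west = blk[halo - 1:n - halo - 1]
    east = blk[halo + 1:n - halo + 1]
    return west - 2.0 * center + east


field = np.arange(16, dtype=float) ** 2
blocks = split_with_halo(field, nblocks=4)

# Purely local computation: no communication needed while halos are current.
local = np.concatenate([laplacian_interior(b) for b in blocks])

# After the interior changes, halos are stale until halo_update is called.
for b in blocks:
    b[HALO:-HALO] *= 3.0
halo_update(blocks)
updated = np.concatenate([laplacian_interior(b) for b in blocks])
```

The point is that the communication happens once per update step rather than every time a stencil reaches across a block boundary.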