bdobbins@gmail_com
Hi guys,
In case this is relevant to anyone else, the issue I was facing with extremely slow parallel I/O speeds on CESM appears to have a very simple solution - the addition of the -D_USE_FLOW_CONTROL flag to CPPDEFS in the Macros file.
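For reference, the change was literally one line appended to the machine's Macros file - something like the following (the exact set of existing CPPDEFS varies by machine, so treat this as a sketch), followed by a clean rebuild of the case so the flag gets picked up:

    CPPDEFS += -D_USE_FLOW_CONTROL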
This flag is a default for some of the larger systems (Franklin, Hopper, Kraken, etc.), but is not present in the 'generic' files, or even in some large, site-specific ones like Pleiades. This is true in both the CESM 1.0.3 and 1.0.4 releases. Unless there is some potential harm from doing so (Jim?), I'd recommend this flag be added to the generic systems' CPPDEFS for future releases.
Adding that flag in CESM 1.0.3 reduced the I/O time at the end of each model month from ~800-1600s to a much better 75-110s, basically doubling the effective rate on CAM5 physics runs and improving it by a factor of more than 5x on a CAM4 physics run. In CESM 1.0.4, with its improved PIO capabilities, it's even better, with the average I/O time on these steps down to 45-60s. This is, as yet, without much tuning of the Lustre file system - it's using the default directory settings of stripe-count=4 and stripe-size=1M.
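(For anyone who wants to experiment with the striping themselves, the standard Lustre commands look roughly like the lines below - I'm only pointing at the knobs, not recommending these particular values, and $RUNDIR here just stands for wherever your history files are written:

    lfs getstripe -d $RUNDIR          # show the directory's default stripe settings
    lfs setstripe -c 8 -s 4m $RUNDIR  # example values: 8 stripes of 4 MB for new files

Files created in the directory after that inherit the new settings; existing files keep whatever striping they were written with.)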
I also noticed that the PIO directory from CESM 1.0.4, along with the 'calcdecomp.F90' file from the PIO 1.4.0 source code, can replace the CESM 1.0.3 PIO directory tree (${CESMROOT}/models/utils/pio) in full and give CESM 1.0.3 users the speed benefit of the updated code. This also gets around an issue with older Intel compilers (e.g. release 2011.5.220), which seem to have problems with the nested modules in PIO; later compilers, such as 2011.9.293, don't have this issue. We're going to run a quick validation of CESM 1.0.3 with the 1.0.4-based PIO code, plus the updated calcdecomp.F90 file, but as these are purely I/O-level changes, I don't expect any problems.
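In case it's useful, the swap itself was basically just a copy at the directory level - the source paths below are purely illustrative, so point them at wherever your CESM 1.0.4 download and the PIO 1.4.0 source actually live:

    # keep the original 1.0.3 PIO tree around, just in case
    mv ${CESMROOT}/models/utils/pio ${CESMROOT}/models/utils/pio.cesm103
    # drop in the PIO tree from a CESM 1.0.4 download
    cp -r /path/to/cesm1_0_4/models/utils/pio ${CESMROOT}/models/utils/
    # copy calcdecomp.F90 from the PIO 1.4.0 source into whichever subdirectory
    # of the new tree holds the other PIO .F90 files (adjust paths to match)
    cp /path/to/pio1_4_0/pio/calcdecomp.F90 ${CESMROOT}/models/utils/pio/pio/

Then do a clean build of the case so the replacement PIO actually gets compiled.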
All told, depending on the physics options in use, we now run between 2.5x and 6x faster on 1-degree models, before any tuning of the PE layout.
Hope that helps someone else - it's always these small things that get you! - and thanks again to Jim for his help.
Cheers,
- Brian
(PS. I'd be interested if anyone has insight into why the lack of 'flow control' hit us so hard - is this peculiar to, say, QLogic IB cards, which typically offload some of the message processing to the CPU? Or is it universally important at >=512 cores with few I/O writers?)