Experiences porting CCSM to LLNL IA64 'Thunder' cluster

mirin · Dec 21, 2004

Experiences Porting CCSM to LLNL IA64 Cluster 'Thunder'

Art Mirin - 12/20/04

Present code status: CCSM builds and executes on Thunder, and the restart test succeeds. A case with the FV dycore (2x2.5 grid) and the GX1V3 ocean grid begins to run much more slowly in the sixth month; by the seventh month it is running at one third of its initial speed. This behavior does not occur on LLNL IBM Frost. A perturbation growth test (as with CAM) would be quite useful.

Processor configuration: The configuration for Frost and Cheetah has 16 tasks for atm, 4 for lnd, 4 for cpl, 32 for ocn and 8 for ice, for a total of 64. That configuration was used on Thunder, as well as a larger one (atm=96, lnd=16, cpl=8, ocn=96, ice=16, for a total of 232). Tools to guide a logical choice of processor assignment are not apparent. Furthermore, the ocean and ice codes work with only a limited (and not well publicized) number of processor configurations.
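
As a quick check on the two layouts above, the per-component task counts can be tallied in a few lines of Python. This is only an illustrative sketch; the dictionary and function names are hypothetical and are not part of the CCSM scripts.

```python
# Task counts per component, taken from the two configurations described above.
# The names small_layout / large_layout are illustrative, not CCSM settings.
small_layout = {"atm": 16, "lnd": 4, "cpl": 4, "ocn": 32, "ice": 8}
large_layout = {"atm": 96, "lnd": 16, "cpl": 8, "ocn": 96, "ice": 16}


def total_tasks(layout):
    """Return the total number of MPI tasks across all components."""
    return sum(layout.values())


def task_shares(layout):
    """Return each component's fraction of the total task count."""
    total = total_tasks(layout)
    return {name: count / total for name, count in layout.items()}


if __name__ == "__main__":
    for label, layout in (("small", small_layout), ("large", large_layout)):
        print(label, total_tasks(layout), task_shares(layout))
```

Comparing the shares shows how the larger layout shifts the balance between components, which is the kind of choice a processor-assignment tool would need to guide.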

Standard out: Because multiple tasks of a component model (particularly atm and lnd) write to standard out, some standard out data is lost. It would be useful to have an option that suppresses output from all but the master process by directing the output of the other tasks to /dev/null.
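
The suggested option could look roughly like the following Python sketch. It is not CCSM code: the function names, the log file name and the choice of rank 0 as master are all assumptions made for illustration.

```python
import os


def output_path_for_rank(rank, logfile, master_rank=0):
    """Return the real log file for the master task, the null device otherwise.

    master_rank=0 is an assumption; a component could choose any task.
    """
    return logfile if rank == master_rank else os.devnull


def open_output_for_rank(rank, logfile, master_rank=0):
    """Open the stream a task should write its standard output to."""
    return open(output_path_for_rank(rank, logfile, master_rank), "a")


if __name__ == "__main__":
    # Hypothetical example: task 3 of a component writes nowhere.
    with open_output_for_rank(3, "atm.log") as stream:
        stream.write("this line is discarded\n")
```

The selection has to happen before the first write on each task; otherwise early messages from non-master tasks still reach the shared output.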

OpenMP: The IA64 compiler still does not properly support OpenMP. Both compile-time and run-time bugs have been encountered.

Job control: The LLNL operating system requires a uniform number of tasks per node. In effect this prevents use of OpenMP for atm and lnd, because the other components would have to waste processors.
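
The cost of the uniform tasks-per-node rule can be estimated with a small Python sketch. The number of processors per node (4) and the thread counts are assumptions chosen for illustration, and the function name is hypothetical; only the 64-task layout comes from this post.

```python
import math


def idle_processors(tasks, threads, cpus_per_node):
    """Count processors left idle when every node must hold the same number of tasks.

    tasks:   MPI tasks per component
    threads: OpenMP threads per task for each component
    The tasks-per-node value is set by the most heavily threaded component.
    """
    tasks_per_node = cpus_per_node // max(threads.values())
    nodes = math.ceil(sum(tasks.values()) / tasks_per_node)
    busy = sum(tasks[name] * threads[name] for name in tasks)
    return nodes * cpus_per_node - busy


# 64-task layout from this post; thread counts below are assumed.
tasks = {"atm": 16, "lnd": 4, "cpl": 4, "ocn": 32, "ice": 8}
mpi_only = {name: 1 for name in tasks}
hybrid = dict(mpi_only, atm=4, lnd=4)  # OpenMP only in atm and lnd

if __name__ == "__main__":
    print(idle_processors(tasks, mpi_only, cpus_per_node=4))
    print(idle_processors(tasks, hybrid, cpus_per_node=4))
```

Under these assumed numbers the hybrid case allocates 256 processors and leaves 132 of them idle, because cpl, ocn and ice tasks each occupy a whole node while using one processor.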

Code changes: Tar file with modified (and original) files is available on goldhill as /home/mirin/ccsm_ia64_changes_tar. Several changes are not related to IA64 but were made for other reasons or other platforms.

Created batch.linux.thunder, env.linux.thunder and run.linux.thunder and generalized check_machine.

In POP communicate.F, defined MPI_RL to be MPI_REAL4 (as with Cray-X1) instead of MPI_REAL. This is because the real*8 promotion flag was also promoting MPI_REAL.

In POP tavg.F, inlined call to write_array_ncdf for 3D variables due to compiler bug (8.1 compiler).

Generalized Macros.Linux to accommodate Intel compiler. Needed to include flag to interpret binary record lengths to be in bytes. Included -mp option because that option is needed for CAM perturbation growth test to succeed.
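
To illustrate why the record-length flag matters: a record length that the code computes as a byte count is misread if the compiler counts record lengths in larger units. The Python sketch below assumes a 4-byte unit as the alternative to bytes (my understanding of the Intel default for unformatted files); the function names and array sizes are hypothetical.

```python
def record_bytes(n_values, bytes_per_value):
    """Size in bytes of one record holding n_values numbers."""
    return n_values * bytes_per_value


def recl_argument(n_values, bytes_per_value, unit_bytes):
    """Record-length value to pass when lengths are counted in unit_bytes units."""
    total = record_bytes(n_values, bytes_per_value)
    if total % unit_bytes != 0:
        raise ValueError("record size is not a whole number of units")
    return total // unit_bytes


def bytes_implied(recl, unit_bytes):
    """Bytes the runtime will actually use for a given record-length value."""
    return recl * unit_bytes


if __name__ == "__main__":
    # 100 eight-byte reals per record (illustrative sizes).
    as_bytes = recl_argument(100, 8, unit_bytes=1)
    # Same value interpreted with an assumed 4-byte unit.
    print(as_bytes, bytes_implied(as_bytes, unit_bytes=4))
```

With byte-based lengths the code and the runtime agree; without the flag, a byte count would describe records four times too long under the assumed unit.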

Modified esmf.buildlib to default to Intel. This probably needs a machine-dependent option.

Generalized several ESMC files to include stdio.h.

Generalized one CAM timing file to include stdio.h, and another to include a return statement.

Modified CAM runtime.F90 and history.F90 to eliminate Equivalence statements.

Changed write(*,*) to write(6,*) in CAM fv_prints.F90.

Added flushing of standard out buffer in CAM FV stepon.F90 (not an IA64 issue).

Commented out call to sort_chunks in CAM phys_grid.F90 (this is a bug; it has nothing to do with IA64).

Modified cam.template to define precompile flag for FV dycore (not an IA64 issue).

Amended ccsm_msg.F90 to have proper coding for 2D FV decomposition (not an IA64 issue).

Fixed typo in shr_msg_mod.F90 (not an IA64 issue).

Modified a few files for cheetah32, frost and seaborg.

gcarr@ucar_edu · Jan 19, 2005

Sounds like it might be a system issue. However, we do not yet have any IA64 systems on which we have CCSM3 running and validated. Our work on the SGI Altix is close. There have been a number of issues with the compiler and code.

mirin · Jan 29, 2005

The slowdown reported in the original message was due to underflows in the ice model. The 2x2.5_gx1v3 case on 232 processors now runs at about 95 computer minutes per simulated year. That case did run 44 years, and we are seeing questionable ice behavior in the Arctic region, similar to that reported by others with the FV dycore.
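
For readers unfamiliar with the underflow issue: a quantity that keeps shrinking eventually falls below the smallest normalized floating-point value, and arithmetic on such subnormal values can be much slower on some processors. The Python sketch below only illustrates detection and one common remedy (flushing to zero). It is not the ice model code, the function names are hypothetical, and this post does not say which remedy was actually used.

```python
import sys

# Smallest positive normalized double; anything smaller in magnitude
# (and nonzero) is subnormal.
TINY = sys.float_info.min


def is_subnormal(x):
    """True if x is nonzero but below the normalized range."""
    return x != 0.0 and abs(x) < TINY


def flush_to_zero(x):
    """Replace subnormal values by exactly zero; leave other values alone."""
    return 0.0 if is_subnormal(x) else x


def decay(value, factor, steps):
    """Repeatedly scale value by factor, flushing subnormal results to zero."""
    for _ in range(steps):
        value = flush_to_zero(value * factor)
    return value


if __name__ == "__main__":
    print(is_subnormal(1e-310), flush_to_zero(1e-310))
    print(decay(1.0, 1e-10, 40))
```

A slowdown that appears only after several simulated months is consistent with this picture, since values need time to decay into the subnormal range.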

Although I did my best to turn off most standard output from non-master tasks, the code is experiencing a high frequency of fatal errors, which I believe are due to writes from multiple tasks calling cpl_interface_init (which happens before the non-master output is turned off). The Thunder file system is apparently sensitive.

I've created a new tar file with modifications, along with a list of files changed and added (and original versions). Most of the CAM changes are already in the CAM development archive, and I assume you do not archive files relevant to unsupported machines (but be my guest), but there are still a few other relevant changes included. The tar file is on goldhill as ~mirin/ccsm_changes_012805_tar.

The T85_gx1v3 run on Thunder is in progress, but it is running at about 5 computer hours per simulated year.
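
The two timings quoted in this reply can be put on a common footing as simulated years per day. The Python sketch below assumes that "computer minutes" and "computer hours" mean elapsed wall-clock time, which this post does not state explicitly; the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def simulated_years_per_day(minutes_per_simulated_year):
    """Convert a cost in minutes per simulated year to simulated years per day."""
    return MINUTES_PER_DAY / minutes_per_simulated_year


if __name__ == "__main__":
    # 2x2.5_gx1v3 on 232 processors: about 95 minutes per simulated year.
    print(simulated_years_per_day(95))
    # T85_gx1v3: about 5 hours (300 minutes) per simulated year.
    print(simulated_years_per_day(5 * 60))
```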

mirin · Feb 18, 2005

Buried in my previous reply (1/29/05) was the fact that I had placed a new tarfile with ccsm changes (relative to rel04) on goldhill as ~mirin/ccsm_changes_012805_tar. Many (but certainly not all) of the changes relate to the IA64. Within that tarfile is an ascii file named "Listing", which contains a list of changed files and the purpose for the changes. I hope you will consider adopting whichever changes you deem relevant.

If anybody does query you with IA64 issues and you think I could be of use, I would be happy to help.

Regarding simulations: Our T85_gx1v3 (full components) validation test has run through 70 years and appears to be agreeing with NCAR's posted results.

Our FV 2x2.5_gx1v3 run is showing Arctic ice buildup similar to previous NCAR investigations; we (John Taylor) are in contact with NCAR folk regarding how to proceed.
