Experiences Porting CCSM to LLNL IA64 Cluster ‘Thunder’
Art Mirin - 12/20/04
Present code status: CCSM builds and runs on Thunder, and the restart test succeeds. However, a case with the FV dycore (2x2.5 grid) and the GX1V3 ocean grid begins to run much more slowly in the sixth month of the run; by the seventh month it is running at one third of its initial speed. This behavior does not occur on LLNL IBM Frost. A perturbation growth test (as with CAM) would be quite useful.
Processor configuration: The configuration for Frost and Cheetah has 16 tasks for atm, 4 for lnd, 4 for cpl, 32 for ocn and 8 for ice, for a total of 64. That configuration was used on Thunder, as was a larger one (atm=96, lnd=16, cpl=8, ocn=96, ice=16, for a total of 232). No tools to guide a logical choice of processor assignment are apparent. Furthermore, the ocean and ice codes work with only a limited (and not well publicized) set of processor configurations.
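The task counts quoted above can be checked, and the share of tasks given to each component compared, with a few lines of Python. This is only a bookkeeping sketch: the dictionary and function names are illustrative and are not part of the CCSM scripts.

```python
# Task counts per component, copied from the two configurations described above.
# The names used here are illustrative only; CCSM sets these in its own scripts.
FROST_CHEETAH = {"atm": 16, "lnd": 4, "cpl": 4, "ocn": 32, "ice": 8}
THUNDER_LARGE = {"atm": 96, "lnd": 16, "cpl": 8, "ocn": 96, "ice": 16}


def total_tasks(layout):
    """Total number of MPI tasks in a component layout."""
    return sum(layout.values())


def task_shares(layout):
    """Fraction of all tasks assigned to each component."""
    total = total_tasks(layout)
    return {name: count / total for name, count in layout.items()}


if __name__ == "__main__":
    for label, layout in (("64-task", FROST_CHEETAH), ("232-task", THUNDER_LARGE)):
        shares = ", ".join(f"{k}={v:.2f}" for k, v in task_shares(layout).items())
        print(f"{label}: total={total_tasks(layout)} shares: {shares}")
```

Comparing the shares shows that the larger configuration is not a uniform scaling of the smaller one, which is the kind of choice that a guidance tool would need to justify.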
Standard out: Because multiple tasks of a component model (particularly atm and lnd) write to standard out, some standard out data is lost. It would be useful to have an option to suppress output from all but the master process by directing the output of the other processes to /dev/null.
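The suggested option amounts to choosing the output file name by task rank. The following Python sketch illustrates the idea only; it is not CCSM code, and the function names, the rank argument and the convention that rank 0 is the master are all assumptions made for the example.

```python
import os


def stdout_target(rank, log_path, master_rank=0):
    """Return the file name a task should use for its standard output.

    The master task keeps the real log file; every other task is pointed
    at the null device (/dev/null on Unix-like systems).
    """
    return log_path if rank == master_rank else os.devnull


def open_stdout(rank, log_path, master_rank=0):
    """Open the chosen target for writing; non-master output is discarded."""
    return open(stdout_target(rank, log_path, master_rank), "w")
```

In a Fortran component the same effect would presumably be obtained by opening the standard output unit on the selected file name early in initialization.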
OpenMP: The IA64 compiler still does not properly support OpenMP. Both compile-time and run-time bugs have been encountered.
Job control: The LLNL operating system requires a uniform number of tasks per node. In effect, this prevents the use of OpenMP for atm and lnd, since the other components would have to waste processors.
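The cost of the uniform tasks-per-node requirement can be estimated with a small calculation. The Python sketch below is a simplified model, not a description of the LLNL scheduler: it assumes that every task is given the same number of processors, that only the threaded components use more than one of them, and that the node size (4 processors in the example) is a parameter supplied by the user rather than a figure taken from this report.

```python
def idle_cpus(tasks, threaded, threads_per_task, cpus_per_node):
    """Estimate processors left idle when only some components use OpenMP.

    tasks            -- dict mapping component name to its number of MPI tasks
    threaded         -- set of component names that run threads_per_task threads
    threads_per_task -- processors reserved for every task (uniform per node)
    cpus_per_node    -- processors on one node (an assumed machine parameter)
    """
    if cpus_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cpus_per_node")
    tasks_per_node = cpus_per_node // threads_per_task
    total = sum(tasks.values())
    nodes = -(-total // tasks_per_node)  # ceiling division
    allocated = nodes * cpus_per_node
    busy = sum(
        count * (threads_per_task if name in threaded else 1)
        for name, count in tasks.items()
    )
    return allocated - busy


if __name__ == "__main__":
    layout = {"atm": 16, "lnd": 4, "cpl": 4, "ocn": 32, "ice": 8}
    for threads in (1, 2, 4):
        print(threads, idle_cpus(layout, {"atm", "lnd"}, threads, 4))
```

Under these assumptions, running the 64-task configuration with two threads for atm and lnd leaves one processor idle for each of the 44 cpl, ocn and ice tasks.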
Code changes: A tar file with the modified (and original) files is available on goldhill as /home/mirin/ccsm_ia64_changes_tar. Several of the changes are not related to IA64 but were made for other reasons or for other platforms.
Created batch.linux.thunder, env.linux.thunder and run.linux.thunder and generalized check_machine.
In POP communicate.F, defined MPI_RL to be MPI_REAL4 (as with Cray-X1) instead of MPI_REAL. This is because the real*8 promotion flag was also promoting MPI_REAL.
In POP tavg.F, inlined call to write_array_ncdf for 3D variables due to compiler bug (8.1 compiler).
Generalized Macros.Linux to accommodate the Intel compiler. Needed to include the flag that causes binary record lengths to be interpreted in bytes. Included the -mp option because it is needed for the CAM perturbation growth test to succeed.
Modified esmf.buildlib to default to Intel. This probably needs a machine-dependent option.
Generalized several ESMC files to include stdio.h.
Generalized one CAM timing file to include stdio.h, and another to include a return statement.
Modified CAM runtime.F90 and history.F90 to eliminate EQUIVALENCE statements.
Changed write(*,*) to write(6,*) in CAM fv_prints.F90.
Added flushing of standard out buffer in CAM FV stepon.F90 (not an IA64 issue).
Commented out call to sort_chunks in CAM phys_grid.F90 (this is a bug; it has nothing to do with IA64).
Modified cam.template to define precompile flag for FV dycore (not an IA64 issue).
Amended ccsm_msg.F90 to have proper coding for 2D FV decomposition (not an IA64 issue).
Fixed typo in shr_msg_mod.F90 (not an IA64 issue).
Modified a few files for cheetah32, frost and seaborg.
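The record-length flag mentioned for Macros.Linux matters because the same record needs a different RECL value depending on the unit the compiler assumes. The Python sketch below shows the arithmetic only. It assumes a default unit of 4-byte words for unformatted files when no byte-record-length option is given, which is how the Intel Fortran compiler is understood to behave; the function name and the example sizes are illustrative.

```python
def recl_value(n_values, bytes_per_value, unit_bytes):
    """RECL needed for one record of n_values items.

    unit_bytes is the size of the unit in which the compiler measures RECL:
    1 when record lengths are interpreted in bytes, 4 when they are
    interpreted in 4-byte words (the assumed default in this sketch).
    """
    record_bytes = n_values * bytes_per_value
    return -(-record_bytes // unit_bytes)  # round up to a whole unit


if __name__ == "__main__":
    # A record of 100 real*8 values.
    print("RECL in bytes:       ", recl_value(100, 8, 1))
    print("RECL in 4-byte words:", recl_value(100, 8, 4))
```

Code written with byte counts in mind would therefore request records four times too long on a compiler that counts in words, which is why the flag is needed for files to be read and written consistently across platforms.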