porting CCSM3 code from yellowstone to cheyenne

xiaoyihf@...

Hi CESM Software Working Group,

I'm trying to port CCSM3 from yellowstone to cheyenne and I can get the model configured, but the model fails to build.

I documented what I did below.  It seems that additional changes need to be made in file modules.cheyenne. I would like to get some advice on this. 

 

The version of CCSM3 I'm trying to port is at /glade/p/cesm/cseg/releases/ccsm3_0_1_beta33

I was following the instructions in section 6.10, "Adding a new machine to $CCSMROOT/", of the CCSM guide below:

http://www.cesm.ucar.edu/models/ccsm3.0/ccsm/doc/UsersGuide/UsersGuide/n...

Here is what I did:

 

  • Edit $CCSMROOT/ccsm_utils/Tools/check_machine and add cheyenne to the list named ``resok''.

  • In $CCSMROOT/scripts/ccsm_utils/Machines/, copy the yellowstone-specific files to cheyenne-specific files:
    • cd $CCSMROOT/scripts/ccsm_utils/Machines/
    • cp env.linux.yellowstone env.linux.cheyenne
    • cp run.linux.yellowstone run.linux.cheyenne
    • cp batch.linux.yellowstone batch.linux.cheyenne, then edit the copy to "set mach = cheyenne"
    • cp l_archive.linux.yellowstone l_archive.linux.cheyenne, then edit the copy to "set mach = cheyenne"
    • For the modules, I copied the file from cesm1_2_2_1 (cesm1_2_2 for cheyenne):
    • cp $CCSMROOT_of_cesm1_2_2_1/scripts/ccsm_utils/Machines/env_mach_specific.cheyenne         modules.cheyenne
    • I also needed to revise modules.cheyenne to get it working as far as possible (the revised file is attached below).
    • I also copied the file "Macros" from one of my cheyenne cesm1_2_2_1 simulations to $CCSMROOT/models/bld/Macros.Linux (also attached below):
    • cp $My_cesm1_2_1_1_CASEROOT/Macros           $CCSMROOT/models/bld/Macros.Linux
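
The copy steps above can be sketched as a small shell function. This is only an illustration under assumptions: it is written in POSIX sh rather than the csh used by the CCSM3 scripts, the function name `clone_machine_files` is hypothetical, and it assumes the batch and l_archive files contain a literal `set mach = yellowstone` setting, which the steps imply but do not show.

```shell
#!/bin/sh
# Sketch: clone the machine files of an existing machine for a new one.
# clone_machine_files is a hypothetical helper, not part of CCSM3.
# Usage: clone_machine_files <Machines dir> <old machine> <new machine>
clone_machine_files() {
    machdir=$1
    old=$2
    new=$3

    for kind in env run batch l_archive; do
        src="$machdir/$kind.linux.$old"
        dst="$machdir/$kind.linux.$new"
        if [ ! -f "$src" ]; then
            echo "missing $src" >&2
            return 1
        fi
        # Plain copy first, so the new file keeps the mode of the original.
        cp -p "$src" "$dst" || return 1
        case $kind in
            batch|l_archive)
                # Assumption: the machine name appears as "set mach = <old>".
                sed "s/\(set mach = \)$old/\1$new/" "$src" > "$dst" || return 1
                ;;
        esac
    done
}
```

It could be called as `clone_machine_files $CCSMROOT/scripts/ccsm_utils/Machines yellowstone cheyenne`. The modules and Macros files are left out on purpose, since they come from a different source tree and need hand editing.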

With all these modifications, I can configure CCSM3. When I build the model, esmf builds but mph fails:

> b30.CHE.cheyenne.build

sourcing modules.cheyenne

-------------------------------------------------------------------------

 Preparing component models for execution 

-------------------------------------------------------------------------

 - Create execution directories for atm,cpl,lnd,ice,ocn

 - If a restart run then copy restart files into executable directory 

ccsm_getrestart: get /glade/scratch/fenghe/b30.CHE restarts from /glade/scratch/fenghe/archive/b30.CHE/restart

 - Check validity of configuration

 - Determine if build must happen (env variable BLDTYPE)

 - Build flag (BLDTYPE) is TRUE

 - Build Libraries: esmf, mph, mct

Thu May 25 19:00:28 MDT 2017 esmf.buildlib.170525-190028

Thu May 25 19:00:33 MDT 2017 mph.buildlib.170525-190028

ERROR: mph.buildlib failed, see mph.buildlib.170525-190028

 

ERROR: cat /glade/scratch/fenghe/b30.CHE/mph/mph.buildlib.170525-190028

 

The error message for mph is copied below

 

Thu May 25 19:00:33 MDT 2017 mph.buildlib.170525-190028

icc -E  -DFORTRANUNDERSCORE -DNO_R16 -DLINUX -DCPRINTEL    mph.F > mph.f 

ifort  -no-opt-dynamic-align  -fp-model source -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs  -O2  -fixed -132    mph.f  

ifort: command line remark #10411: option '-no-opt-dynamic-align' is deprecated and will be removed in a future release. Please use the replacement option '-qno-opt-dynamic-align'

/glade/u/apps/opt/intel/2017u1/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64_lin/for_main.o: In function `main':

for_main.c:(.text+0x2a): undefined reference to `MAIN__'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_components_':

mph.f:(.text+0xa06): undefined reference to `mpi_init_'

mph.f:(.text+0xa1c): undefined reference to `mpi_comm_dup_'

mph.f:(.text+0xa32): undefined reference to `mpi_comm_rank_'

mph.f:(.text+0xa48): undefined reference to `mpi_comm_size_'

mph.f:(.text+0x1119): undefined reference to `mpi_type_struct_'

mph.f:(.text+0x112a): undefined reference to `mpi_type_commit_'

mph.f:(.text+0x11be): undefined reference to `mpi_comm_split_'

mph.f:(.text+0x13c6): undefined reference to `mpi_allgatherv_'

mph.f:(.text+0x13fa): undefined reference to `mpi_bcast_'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_local_':

mph.f:(.text+0x1da7): undefined reference to `mpi_comm_split_'

mph.f:(.text+0x1dca): undefined reference to `mpi_comm_rank_'

mph.f:(.text+0x1ded): undefined reference to `mpi_comm_size_'

mph.f:(.text+0x1f36): undefined reference to `mpi_gather_'

mph.f:(.text+0x222e): undefined reference to `mpi_comm_split_'

mph.f:(.text+0x2245): undefined reference to `mpi_comm_rank_'

mph.f:(.text+0x2287): undefined reference to `mpi_comm_dup_'

mph.f:(.text+0x22aa): undefined reference to `mpi_comm_rank_'

mph.f:(.text+0x22cd): undefined reference to `mpi_comm_size_'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_global_':

mph.f:(.text+0x298f): undefined reference to `mpi_comm_split_'

mph.f:(.text+0x2b92): undefined reference to `mpi_allgatherv_'

mph.f:(.text+0x2bc6): undefined reference to `mpi_bcast_'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_comm_join_':

mph.f:(.text+0x3952): undefined reference to `mpi_comm_split_'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_timer_':

mph.f:(.text+0x4dc5): undefined reference to `mpi_wtime_'

/glade/scratch/fenghe/ifortrjZdEe.o: In function `mph_module_mp_mph_init_':

mph.f:(.text+0x72d6): undefined reference to `mpi_init_'

mph.f:(.text+0x72ec): undefined reference to `mpi_comm_dup_'

mph.f:(.text+0x7302): undefined reference to `mpi_comm_rank_'

mph.f:(.text+0x7318): undefined reference to `mpi_comm_size_'

mph.f:(.text+0x79f8): undefined reference to `mpi_type_struct_'

mph.f:(.text+0x7a09): undefined reference to `mpi_type_commit_'

Makefile:35: recipe for target 'mph.o' failed

 

gmake: *** [mph.o] Error 1

 

 

 

jedwards

Your build should be using MPIFC and MPICC so that the MPI libraries are linked in.
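
One way to read the mph log above: apart from `MAIN__`, every unresolved symbol is an MPI routine, which is what a link without the MPI libraries looks like. A minimal sketch for summarizing such a log follows. The function name `list_undefined_mpi` is hypothetical, and the only assumption is the linker message format shown in the log.

```shell
#!/bin/sh
# Sketch: list the distinct MPI symbols a linker could not resolve.
# list_undefined_mpi is a hypothetical helper name.
# Usage: list_undefined_mpi <build log>
list_undefined_mpi() {
    # Match messages such as:  undefined reference to `mpi_init_'
    # and keep only the symbol name, one per line, without duplicates.
    sed -n "s/.*undefined reference to \`\(mpi_[a-z0-9_]*\)'.*/\1/p" "$1" |
        sort -u
}
```

If the list is non-empty after switching to the MPI compiler wrappers, the wrappers are probably not the ones actually being invoked by the build.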

xiaoyihf@...

Thanks, jedwards.

I want to give you an update on porting the CCSM3 code from yellowstone (/glade/p/cesm/cseg/releases/ccsm3_0_1_beta33) to cheyenne.

In short, the good news is that I can get CCSM3 to build on cheyenne. The bad news is that when I submit the job on cheyenne, it hangs and I don't get any model output.

Here is how I got the CCSM3 code to build on cheyenne:

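I needed to change the following three lines in models/bld/Macros.Linux.

I made the 1st change because the CCSM3 configuration I use doesn't need both libraries (lapack and blas). I made the 2nd and 3rd changes because some lines of the ocean and sea ice code are longer than 72 characters. In the diff below, the lines marked "<" are my new settings and the lines marked ">" are the ones they replace.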

27c27

< SLIBS      := -L$(LIB_NETCDF) -lnetcdf

> SLIBS      := -L$(LIB_NETCDF) -lnetcdf  -llapack -lblas

47c47

<    FIXEDFLAGS := -fixed -132

>    FIXEDFLAGS := -byteswapio

52c52

<    FIXEDFLAGS := -fixed -132

>    FIXEDFLAGS := -byteswapio

 

 

 Here is how I changed the CCSM3 batch run script for cheyenne:

 

On yellowstone, the code is submitted through a command file (poe.cmdfile) in the run script:

mpirun.lsf -cmdfile poe.cmdfile

On cheyenne, I followed the command file (cmdfile) example from the PBS Pro job script examples below:

https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne/runn...

I replaced that line in the yellowstone run script with the following two lines:

setenv MPI_SHEPHERD true

mpiexec_mpt launch_cf.sh poe.cmdfile >&! ccsm3.log.$LID

But when I submit the run script on cheyenne, I don't get any model output. It seems that I didn't set up the command file in the run script correctly. Could you please suggest what I should change to get the CCSM3 job running on cheyenne? Thanks!

 

 

 

 

jedwards

First, you do not need the MPI_SHEPHERD line; that is only used for running serial codes on cheyenne.

Second, poe.cmdfile is a format specific to the old IBM poe environment, so you'll have to find a PBS equivalent. I have no idea what launch_cf.sh is, but I bet it's also specific to yellowstone and will need to be reformatted to work on cheyenne.
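
As a rough sketch of what such a translation could look like: assuming poe.cmdfile lists one command per MPI task (an assumption about the file's layout, which this thread does not show), consecutive identical lines can be collapsed into the colon-separated `-np N command` segments used by the mpirun and mpiexec_mpt calls later in this thread. The function name `cmdfile_to_mpmd` is hypothetical.

```shell
#!/bin/sh
# Sketch: turn a one-command-per-task file into MPMD launcher arguments.
# cmdfile_to_mpmd is a hypothetical helper name.
# Usage: cmdfile_to_mpmd <command file>
cmdfile_to_mpmd() {
    # Drop blank lines, then count runs of identical consecutive commands.
    grep -v '^[[:space:]]*$' "$1" | uniq -c | awk '
        {
            count = $1
            # Remove the leading count added by uniq -c, keep the command.
            sub(/^[ \t]*[0-9]+[ \t]+/, "")
            if (NR > 1) printf " : "
            printf "-np %d %s", count, $0
        }
        END { if (NR > 0) printf "\n" }
    '
}
```

Given a file with the lines `cpl`, `cam`, `cam`, the function prints `-np 1 cpl : -np 2 cam`. The result still has to be checked against the task layout the job actually requested.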

heavens

Hi,

I'm having the same problem figuring out how to run once I've built the CCSM3 executables. This isn't a trivial issue. launch_cf.sh is something mentioned in the Cheyenne documentation. It has nothing to do with Yellowstone. 

Nicholas Heavens

Research Assistant Professor of Planetary Science

Hampton University

 

heavens

I have made some progress by noticing that the settings for tempest are effectively SGI PBS settings using mpirun. I'm still not sure of the proper translation into mpiexec_mpt, though.

 

# -------------------------------------------------------------------------

# Create processor count input files

# -------------------------------------------------------------------------

 

cd $EXEROOT/all

@ PROC = 0      # counts total number of tasks

foreach n (1 2 3 4 5)

   set comp  = $COMPONENTS[$n]

   set model = $MODELS[$n]          

   set nthrd = $NTHRDS[$n]          

   set ntask = $NTASKS[$n]

   @ M = 0

   while ( $M < $ntask )

      @ M++  

      @ PROC++

   end

   ln -s $EXEROOT/$model/$comp  $EXEROOT/all/.  # link binaries into all dir

end

 

# -------------------------------------------------------------------------

# Run the model

# -------------------------------------------------------------------------

 

env | egrep '(MP_|LOADL|XLS|FPE|DSM|OMP|MPC)' # document env vars

 

cd $EXEROOT

echo "`date` -- CSM EXECUTION BEGINS HERE" 

mpirun -v -d $EXEROOT/all  \

   -np $NTASKS[1] "env OMP_NUM_THREADS=$NTHRDS[1] $COMPONENTS[1]" : \

   -np $NTASKS[2] "env OMP_NUM_THREADS=$NTHRDS[2] $COMPONENTS[2]" : \

   -np $NTASKS[3] "env OMP_NUM_THREADS=$NTHRDS[3] $COMPONENTS[3]" : \

   -np $NTASKS[4] "env OMP_NUM_THREADS=$NTHRDS[4] $COMPONENTS[4]" : \

   -np $NTASKS[5] "env OMP_NUM_THREADS=$NTHRDS[5] $COMPONENTS[5]"   &

wait

echo "`date` -- CSM EXECUTION HAS FINISHED" 

 
xiaoyihf@...

Dear CESM Software Working Group,

Due to the popularity and the large user base of CCSM3, I'm wondering whether the CESM Software Working Group can port the CCSM3 code from yellowstone to cheyenne for all the users, as the working group did during the transition from Bluefire to Yellowstone. The large CCSM3 user base will really appreciate this effort from the CESM Software Working Group. Otherwise, I assume many users will make the same efforts to try (and fail?) to port the code from yellowstone to cheyenne again and again for years to come. 

Thank you very much for the consideration of my suggestion. 

heavens

The most progress I have been able to make is to use something like this in the run script:

mpiexec_mpt -v \

   -np $NTASKS[1] omplace $EXEROOT/all/$COMPONENTS[1]  : \

   -np $NTASKS[2] omplace $EXEROOT/all/$COMPONENTS[2]  : \

   -np $NTASKS[3] omplace $EXEROOT/all/$COMPONENTS[3] : \

   -np $NTASKS[4] omplace $EXEROOT/all/$COMPONENTS[4] : \

   -np $NTASKS[5] omplace $EXEROOT/all/$COMPONENTS[5]   &

 

The challenge is that you may encounter the error, "MPT ERROR: could not run executable. If this is a non-MPT application,

you may need to set MPI_SHEPHERD=true."

 

This is deceptive. Here it is not caused by "a bad node", as you may find by searching the forums. The issue is that MPT does not recognize the various CCSM3 executables as valid MPI programs. I have found that this can be partly solved by ensuring that the code is compiled with the MPT versions of the MPI compilers, but I still end up with segmentation faults:

"MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).

Process ID: 25767, Host: r6i4n5, Program: /glade2/scratch2/heavens/Isabel1_mapgenerator/cpl/cpl

MPT Version: SGI MPT 2.15  12/18/16 02:58:06"

I'm trying to see if there are any useful hints in the tracebacks.

Nicholas Heavens

Research Assistant Professor of Planetary Science

Hampton University

heavens

Dear all,

I have managed to successfully port CCSM3 to cheyenne. That is, I have built and run my very kludgey deep time version of CCSM3 for five days. I give no guarantee that any of this is going to work for you. However, I'll give you the key tips.

1. Set up the simulation as a yellowstone experiment.

2. Bypass the Machine proxies in the Buildnml scripts with your own standard proxy file (attached) in the top level of the case directory.

3. Use the modules file attached. It is essential that CCSM3 executables are recognized as MPT applications. If they are not, MPT will say so at run time and give you a deceptive error message that you need to set MPI_SHEPHERD=true. Setting it is unnecessary.

4. Use something like the attached run script. The mpiexec_mpt call is especially critical and should give you guidance.

5. Copy the CCSM3 /models/utils onto your scratch. You enable its use by changing UTILROOT in env_run. You then need to edit mph.F: comment out "include mpif.h" and put "use mpi" BEFORE the implicit statement. Failure to do so will result in segmentation faults associated with the initiation of coupler communication. I'm attaching a typical example (Isabel1_mapgenerator.o5535144.txt) as well as an example of when things work as they should (Isabel1_mapgenerator.o5535470.txt).
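
Before making the edit in step 5 by hand, it can help to list where the relevant statements sit in the file. A minimal sketch, assuming the include is written as `include 'mpif.h'` or `include "mpif.h"` (the exact spelling in mph.F is not shown in this thread); the function name `find_mpi_include` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the line numbers of mpif.h includes and implicit statements.
# find_mpi_include is a hypothetical helper name.
# Usage: find_mpi_include <Fortran source file>
find_mpi_include() {
    # -i because Fortran is case-insensitive; -n prefixes each hit with its
    # line number. Assumption: the statements start their own line.
    grep -n -i -E "^[[:space:]]*(include[[:space:]]+['\"]mpif\.h['\"]|implicit[[:space:]])" "$1"
}
```

Each `include` hit is a line to comment out, and the `implicit` line of the same program unit marks the point above which `use mpi` has to go, since Fortran requires `use` statements to come before `implicit` statements.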

At some point soon, I will need to work out the long-term archiving scripts and attempt a longer integration. I will update you with my progress.

 

Best regards to the CCSM3 community and the NCAR software engineers (who are right that I should have migrated this project to CESM two years ago, but I have my reasons...),

 

Nicholas Heavens

Research Assistant Professor of Planetary Science

Hampton University

 

 

 
