CESM won't run on Ranger with nonzero rootpe

bitz@...

I am able to run CESM (1-degree CCSM4 physics with the cesm1_0 code base) on Ranger with 128 and 256 processors, provided the ROOTPE values are all zero. When I try to have the ocean run concurrently with the other components (ROOTPE_OCN=448), the run gets to the ocn initialization and dies without a word in any of the log files. The environment variables are all set correctly in the run script, so it produces the correct summary (see below). Happy to provide more info if you tell me what is useful.

Here is some information I thought might be relevant:

Ranger has Opteron chips, and I am using PGI and MVAPICH:
/opt/apps/pgi7_2/mvapich/1.0.1/bin/mpif90

The run command is "ibrun ./ccsm.exe"

/share/sge6.2/default/pe_scripts/getmode.sh
mvapich1_ssh

Do I need MPI-2?

set MODELS = ( cpl atm lnd ice ocn glc )
set COMPONENTS = ( cpl cam clm cice pop2 sglc )
set NTASKS = ( 320 448 128 320 64 1 )
set NTHRDS = ( 1 1 1 1 1 1 )
set ROOTPE = ( 0 0 320 0 448 0 )
set PSTRID = ( 1 1 1 1 1 1 )
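Reading the settings above: a component with stride PSTRID occupies PEs ROOTPE, ROOTPE + PSTRID, ..., ROOTPE + (NTASKS - 1) * PSTRID. The minimal Python sketch below (the helper names `pe_ranges` and `total_pes` are hypothetical, not part of CESM) tabulates those ranges for this layout and the smallest job size that covers them. It ignores threading, which is harmless here because every NTHRDS is 1.

```python
# PE layout copied from the settings above (all NTHRDS = 1, PSTRID = 1).
MODELS = ["cpl", "atm", "lnd", "ice", "ocn", "glc"]
NTASKS = [320, 448, 128, 320, 64, 1]
ROOTPE = [0, 0, 320, 0, 448, 0]
PSTRID = [1, 1, 1, 1, 1, 1]


def pe_ranges(models, ntasks, rootpe, pstrid):
    """Return {model: (first_pe, last_pe)} for each component.

    Assumes a component occupies ROOTPE, ROOTPE + PSTRID, ...,
    ROOTPE + (NTASKS - 1) * PSTRID.
    """
    return {
        m: (r, r + (n - 1) * s)
        for m, n, r, s in zip(models, ntasks, rootpe, pstrid)
    }


def total_pes(ranges):
    """Smallest MPI job size that covers every component's last PE."""
    return max(last for _, last in ranges.values()) + 1


ranges = pe_ranges(MODELS, NTASKS, ROOTPE, PSTRID)
for model, (first, last) in ranges.items():
    print(f"{model}: PEs {first}-{last}")
print("total PEs needed:", total_pes(ranges))
```

With these values, atm spans PEs 0-447 (lnd sits on 320-447 beside cpl/ice on 0-319) and ocn spans 448-511, so the layout needs 512 MPI tasks and the ocean shares no PEs with atm, lnd, or ice.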

Cecilia Bitz, Atmospheric Sciences, University of Washington

eaton

You don't need MPI-2.

The PE layout you describe looks like it is designed for 512 processors, but you only said that you could run with 128 and 256 processors with all the root PEs set to zero. Can you also run with 512 processors and all the root PEs set to 0? Can you run with a PE layout appropriate for 256 processors, with the ocn running concurrently with atm/lnd/ice?
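One way to build a 256-processor layout with a concurrent ocean is to halve every NTASKS and ROOTPE value from the 512-processor layout. The Python sketch below does that and checks that the ocean shares no PE with atm, lnd, or ice. The halved numbers are a hypothetical illustration only; they have not been tested, and POP and CICE decompositions may constrain which task counts actually work.

```python
# Hypothetical 256-PE layout: every NTASKS and ROOTPE value from the
# 512-PE layout in the first post divided by two. Not a tested configuration.
LAYOUT_256 = {
    # model: (NTASKS, ROOTPE), with PSTRID = 1 and NTHRDS = 1 throughout
    "cpl": (160, 0),
    "atm": (224, 0),
    "lnd": (64, 160),
    "ice": (160, 0),
    "ocn": (32, 224),
    "glc": (1, 0),
}


def pe_set(ntasks, rootpe):
    """Set of PEs used by a component laid out with stride 1."""
    return set(range(rootpe, rootpe + ntasks))


def ocn_runs_concurrently(layout):
    """True if the ocean shares no PE with atm, lnd, or ice."""
    ocn = pe_set(*layout["ocn"])
    return all(ocn.isdisjoint(pe_set(*layout[m])) for m in ("atm", "lnd", "ice"))


def total_pes(layout):
    """Smallest job size covering every component (stride 1)."""
    return max(rootpe + ntasks for ntasks, rootpe in layout.values())


print("total PEs:", total_pes(LAYOUT_256))
print("ocn concurrent with atm/lnd/ice:", ocn_runs_concurrently(LAYOUT_256))
```

Here atm covers PEs 0-223 and ocn covers 224-255, so the job size is 256 and the ocean runs on its own PEs, the same structure as the 512-processor layout at half the size.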

bitz@...

Thanks. I'll try your suggestion, but it will take several days because the queue waits on Ranger are so long.

Cecilia Bitz, Atmospheric Sciences, University of Washington

bitz@...

I have confirmed that I can run with 512 processors with all ROOTPE values set to 0. I cannot run with a nonzero ROOTPE with either 256 or 512 processors. I also confirmed that MPI-2 did not help. In all cases with a nonzero ROOTPE for the ocn, the model fails just after ocn.log says "Initializing diagnostic BSF variables ..." and ccsm.log says "(init_tavg) tavg_streams ...". Then ccsm.log reports "MPI process terminated unexpectedly". I didn't add any statements to flush the output buffer, so the last log lines may be a red herring.

I'm not going to pursue this further. I'm satisfied to just be able to run at this point.

Cecilia Bitz, Atmospheric Sciences, University of Washington

