Need help about Running ccsm on Linux Xeon Cluster

Anonymous

- the machine you are running on (eg. name, cpu, etc.)
I selected Jazz as the machine in the CCSM scripts;
2 x Xeon 2.8 GHz, 16 nodes
- the mpich and network type (eg. mpich-1.2.6 or other, ethernet/myrinet/other)
mpich-1.2.5, myrinet
- the compiler and version (eg. pgi 5.1-3 or other)
pgi 5.1-6
- the CCSM version (eg. release ccsm3.0)
ccsm3.0
- my mail: nothingtolose_999@hotmail.com. I would be glad to receive any suggestions.

Detailed messages:

Machine: Jazz, comptest B.
I am running without PBS; I just wrote the mpirun file by hand:
mnode1 0
mnode1 1 /user/TestB/all/cpl
mnode2 1 /user/TestB/all/csim
mnode2 1 /user/TestB/all/csim
.....

Output of TestB.jazz.run:
> .....
> ... ...
> ... ...
>
> (cpl_control_readNList) ------------------------------------------------------------
> (cpl_control_readNList) orbit based on orb_year = 1990
> (shr_orb_params) Calculate characteristics of the orbit:
> (shr_orb_params) CVS revision: $Revision: 1.2 $
> (shr_orb_params) CVS Tag : $Name: ccsm3_0_rel04 $
> (shr_orb_params) Calculate orbit for year: 1990
> (shr_orb_params) ------ Computed Orbital Parameters ------
> (shr_orb_params) Eccentricity = 1.670772E-02
> (shr_orb_params) Obliquity (deg) = 2.344107E+01
> (shr_orb_params) Obliquity (rad) = 4.091238E-01
> (shr_orb_params) Long of perh(deg) = 1.027242E+02
> (shr_orb_params) Long of perh(rad) = 4.934468E+00
> (shr_orb_params) Long at v.e.(rad) = -3.250364E-02
> (shr_orb_params) -----------------------------------------
> (main) -------------------------------------------------------------------------
> (main) get simulation start date
> (main) -------------------------------------------------------------------------
> (restart_readDate) restart type = initial => start date specified by input namelist
> (main) simulation start date is 19890101
> (main) -------------------------------------------------------------------------
> (main) contract init: establishes domains & routers (excluding lnd)
> (main) -------------------------------------------------------------------------
> (cpl_contract_init) cpl-
##### Then the program exited.

##### The content of cpl.log:
> (shr_orb_params) -----------------------------------------
> (cpl_control_readNList) ------------------------------------------------------------
> (cpl_control_readNList) orbit based on orb_year = 1990
> (shr_orb_params) Calculate characteristics of the orbit:
> (shr_orb_params) CVS revision: $Revision: 1.2 $
> (shr_orb_params) CVS Tag : $Name: ccsm3_0_rel04 $
> (shr_orb_params) Calculate orbit for year: 1990
> (shr_orb_params) ------ Computed Orbital Parameters ------
> (shr_orb_params) Eccentricity = 1.670772E-02
> (shr_orb_params) Obliquity (deg) = 2.344107E+01
> (shr_orb_params) Obliquity (rad) = 4.091238E-01
> (shr_orb_params) Long of perh(deg) = 1.027242E+02
> (shr_orb_params) Long of perh(rad) = 4.934468E+00
> (shr_orb_params) Long at v.e.(rad) = -3.250364E-02
> (shr_orb_params) -----------------------------------------
> (main) -------------------------------------------------------------------------
> (main) get simulation start date
> (main) -------------------------------------------------------------------------
> (restart_readDate) restart type = initial => start date specified by input namelist
> (main) simulation start date is 19890101
> (main) -------------------------------------------------------------------------
> (main) contract init: establishes domains & routers (excluding lnd)
> (main) -------------------------------------------------------------------------
> (cpl_contract_init) cpl-recv-atm
> [22] MPI Abort by user Aborting program !
> [22] Aborting program!
> [23] MPI Abort by user Aborting program !
> [23] Aborting program!
> [20] MPI Abort by user Aborting program !
> [20] Aborting program!
> [26] MPI Abort by user Aborting program !
> [26] Aborting program!
> [25] MPI Abort by user Aborting program !
> [25] Aborting program!
> Sun Mar 27 17:24:19 CST 2005 -- CSM EXECUTION HAS FINISHED

> Model did not complete - see cpl.log.
#### Then I found that ice.stdin and cpl.stdin both have size 0.

#### Would you please tell me what I should do? Thank you very much!
### The tail of every run log file is shown below.
==> ./cpl/cpl.log.050328-215150 <==
DECOMP_OI = 1,
DECOMP_R = 1,
BFBFLAG = F
/
(cpl_control_readNList) ------------------------------------------------------------
(cpl_control_readNList) ------------------------------------------------------------
(cpl_control_readNList) Namelist values AFTER reading file...
&INPARM
CASE_NAME = TestB ,
CASE_DESC = TestB TestB
==> ./atm/atm.log.050328-215150 <==
Filename specifier for tape 4 = %c.cam2.h%t.%y-%m-%d-%s.nc
Filename specifier for tape 5 = %c.cam2.h%t.%y-%m-%d-%s.nc
Filename specifier for tape 6 = %c.cam2.h%t.%y-%m-%d-%s.nc
AEROSOL_SETOPTS: prognostic sulfur aerosols are off
AEROSOL_SETOPTS: feedback of prognostic sulfur aerosols is disabled
AEROSOL_SETOPTS: prognostic carbon aerosols are off
AEROSOL_SETOPTS: feedback of prognostic carbon aerosols is disabled
AEROSOL_SETOPTS: prognostic sea salt aerosols are off
AEROSOL_SETOPTS: feedback of prognostic sea salt aerosols is disabled
ENDRUN:INIT_FILEPATHS: Cannot find LOGNAME environment variable

==> ./lnd/lnd.log.050328-215150 <==
CLM MODEL version 2.1

Attempting to initialize the land model .....
Preset Fortran unit numbers:
unit 5 = standard input
unit 6 = standard output

Attempting to initialize run control settings .....
error: logname not defined
ENDRUN: called without a message string

==> ./ocn/ocn.log.050328-215150 <==
************************************************************************
------------------------------------------------------------------------

Problem and domain sizes

------------------------------------------------------------------------
Global problem size: 320 x 384 x 40
Using 10 processors in a 5 x 2 Cartesian decomposition
Local array size is: 68 x 196
Physical domain is (approximately): 64 x 192

==> ./ice/ice.log.050328-215150 <==
Albedo, albsnowi: 0.6800000000000001

read_global 11 1 -1.377987973285832
1.570400871576175
read_global 11 2 1.2689836592016945E-004
6.283119165197860
read_global 11 3 496577.9790663405
12507466.10015631
read_global 11 4 2860216.231103214
7213601.
###

gcarr@...

Hmmmm. I'm wondering if you are having a problem with the mpich or configuration of mpich you are running. Do the component log files give additional information that the batch output does not?
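
One quick way to answer that question is to scan every component log for fatal messages. The following is only a sketch: it assumes the logs sit in ./<component>/<component>.log.* as in the listing above, and the function name scan_logs and the search pattern are illustrative choices, not part of the CCSM scripts.

```shell
# Print matching lines from each component log that mentions a fatal condition.
# Assumed layout: <rundir>/<component>/<component>.log.<timestamp>
scan_logs() {
    rundir=${1:-.}
    pattern='ENDRUN|abort|error'   # illustrative pattern, adjust as needed
    for f in "$rundir"/*/*.log.*; do
        [ -f "$f" ] || continue
        if grep -i -E "$pattern" "$f" >/dev/null 2>&1; then
            echo "==> $f <=="
            grep -i -n -E "$pattern" "$f"
        fi
    done
}
```

Running scan_logs from the run directory should list only the logs that contain such a message, which makes it easier to see which component stopped first.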

If you are using "mpich-1.2.5", then you may really be running across your ethernet and not the Myrinet. The modifications needed to run CCSM3 over an ethernet are not yet in a release baseline. If you really meant "mpich-gm 1.2.5" (I have used mpich-gm 1.2.5..10 and mpich-gm 1.2.5..12) then I wonder if it is incorrectly configured for ccsm.

Following is cut/paste from private email:

In version mpich-1.2.5..12 they added --enable-sharedlib to their mpich configure line and I had been using their suggested build for this version. It works ok for cam but not ccsm. I just confirmed removing --enable-sharedlib makes it work for ccsm.

If you use mpich-gm 1.2.5..10 then the default configuration parameters should work. With 1.2.5..12 you need to change the default per above.

Your atm log file shows
Cannot find LOGNAME environment variable
(the lnd log reports a similar "logname not defined" error).

$LOGNAME is an environment variable that the scripts use to identify a few things, including the directory in which the execution (EXEROOT) should take place. Most systems (Linux, IBM, others) provide this. Type "echo $LOGNAME" to test your environment. If your system does not have this, then you will need to modify the CCSM scripts to use something else. If the scripts cannot get to this directory, they will fail.
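
As a rough illustration of the check described above, the sketch below tests whether LOGNAME is set and, if it is not, falls back to the current user name. The helper name ensure_logname is hypothetical, and the fallback to "id -un" is an assumption about what a sensible substitute would be; the CCSM scripts themselves are not shown here and may need a different fix. Note also that processes started on other nodes may not inherit the variable from your login shell, so it may have to be set wherever the components are actually launched.

```shell
# Make sure LOGNAME is defined before the run scripts are started.
# Assumption: the login name reported by "id -un" is an acceptable value.
ensure_logname() {
    if [ -z "${LOGNAME:-}" ]; then
        LOGNAME=$(id -un 2>/dev/null || whoami)
        export LOGNAME
        echo "LOGNAME was not set; using $LOGNAME"
    else
        echo "LOGNAME is already set to $LOGNAME"
    fi
}

ensure_logname
```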

Thanks for using the bulletin board.

George R Carr Jr
NCAR/CGD
gcarr@ucar.edu
