Fail to run default CESM1.2.2.1 compset=E_1850_CN with 2 nodes on Cheyenne

tienyiah@...

I want to speed up a CESM1.2.2.1 run with the E_1850_CN compset on Cheyenne by using multiple nodes.

I created the case with:


~/ucar_models/cesm1_2_2_1/scripts/create_newcase \
-case      test_64cores                  \
-compset   E_1850_CN                     \
-res       f45_g37                       \
-mach      cheyenne


In env_run.xml, I set


STOP_N="12780"
DOCN_SOM_FILENAME="pop_frc.gx3v7.110128.nc"

In env_mach_pes.xml, I set

<?xml version="1.0"?>

<config_definition>

<entry id="NTASKS_ATM"   value="64"  />   
<entry id="NTHRDS_ATM"   value="1"  />   
<entry id="ROOTPE_ATM"   value="0"  />   
<entry id="NINST_ATM"   value="1"  />   
<entry id="NINST_ATM_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_LND"   value="64"  />   
<entry id="NTHRDS_LND"   value="1"  />   
<entry id="ROOTPE_LND"   value="0"  />   
<entry id="NINST_LND"   value="1"  />   
<entry id="NINST_LND_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_ICE"   value="64"  />   
<entry id="NTHRDS_ICE"   value="1"  />   
<entry id="ROOTPE_ICE"   value="0"  />   
<entry id="NINST_ICE"   value="1"  />   
<entry id="NINST_ICE_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_OCN"   value="64"  />   
<entry id="NTHRDS_OCN"   value="1"  />   
<entry id="ROOTPE_OCN"   value="0"  />   
<entry id="NINST_OCN"   value="1"  />   
<entry id="NINST_OCN_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_CPL"   value="64"  />   
<entry id="NTHRDS_CPL"   value="1"  />   
<entry id="ROOTPE_CPL"   value="0"  />   

<entry id="NTASKS_GLC"   value="64"  />   
<entry id="NTHRDS_GLC"   value="1"  />   
<entry id="ROOTPE_GLC"   value="0"  />   
<entry id="NINST_GLC"   value="1"  />   
<entry id="NINST_GLC_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_ROF"   value="64"  />   
<entry id="NTHRDS_ROF"   value="1"  />   
<entry id="ROOTPE_ROF"   value="0"  />   
<entry id="NINST_ROF"   value="1"  />   
<entry id="NINST_ROF_LAYOUT"   value="concurrent"  />   

<entry id="NTASKS_WAV"   value="64"  />   
<entry id="NTHRDS_WAV"   value="1"  />   
<entry id="ROOTPE_WAV"   value="0"  />   
<entry id="NINST_WAV"   value="1"  />   
<entry id="NINST_WAV_LAYOUT"   value="concurrent"  />   

<entry id="PSTRID_ATM"   value="1"  />   
<entry id="PSTRID_LND"   value="1"  />   
<entry id="PSTRID_ICE"   value="1"  />   
<entry id="PSTRID_OCN"   value="1"  />   
<entry id="PSTRID_CPL"   value="1"  />   
<entry id="PSTRID_GLC"   value="1"  />   
<entry id="PSTRID_ROF"   value="1"  />   
<entry id="PSTRID_WAV"   value="1"  />   

<entry id="TOTALPES"   value="64"  />   
<entry id="PES_LEVEL"   value="1r"  />   
<entry id="MAX_TASKS_PER_NODE"   value="36"  />   
<entry id="PES_PER_NODE"   value="36"  />   
<entry id="COST_PES"   value="0"  />   
<entry id="CCSM_PCOST"   value="0"  />   
<entry id="CCSM_TCOST"   value="0"  />   
<entry id="CCSM_ESTCOST"   value="3"  />   

</config_definition>
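To double-check how many MPI tasks and nodes a layout like this implies, here is a minimal sketch that reads the `NTASKS_*`, `ROOTPE_*` and `PSTRID_*` entries plus `MAX_TASKS_PER_NODE` from an env_mach_pes.xml-style fragment. The helper names (`read_entries`, `required_tasks`, `required_nodes`) are hypothetical, not part of the CESM scripts, and the rank arithmetic is a simplification (single-threaded tasks, each component occupying ranks ROOTPE, ROOTPE+PSTRID, ...), not CESM's own TOTALPES calculation.

```python
import math
import xml.etree.ElementTree as ET

COMPONENTS = ["ATM", "LND", "ICE", "OCN", "CPL", "GLC", "ROF", "WAV"]


def read_entries(xml_text):
    """Return {id: value} for every <entry id=... value=.../> element."""
    root = ET.fromstring(xml_text)
    return {e.get("id"): e.get("value") for e in root.iter("entry")}


def required_tasks(entries):
    """Highest MPI rank used by any component, plus one.

    Simplifying assumption: a component with NTASKS tasks, root ROOTPE and
    stride PSTRID occupies ranks ROOTPE, ROOTPE+PSTRID, ..., so its last
    rank is ROOTPE + (NTASKS - 1) * PSTRID.
    """
    top = 0
    for comp in COMPONENTS:
        ntasks = int(entries[f"NTASKS_{comp}"])
        rootpe = int(entries[f"ROOTPE_{comp}"])
        pstrid = int(entries.get(f"PSTRID_{comp}", "1"))
        top = max(top, rootpe + (ntasks - 1) * pstrid + 1)
    return top


def required_nodes(entries):
    """Nodes needed, assuming single-threaded tasks (NTHRDS_* = 1)."""
    per_node = int(entries["MAX_TASKS_PER_NODE"])
    return math.ceil(required_tasks(entries) / per_node)


# Rebuild the relevant part of the layout above: every component on
# 64 tasks starting at rank 0 with stride 1, 36 tasks per node.
layout = "<config_definition>\n" + "".join(
    f'<entry id="{k}_{c}" value="{v}" />\n'
    for c in COMPONENTS
    for k, v in (("NTASKS", 64), ("ROOTPE", 0), ("PSTRID", 1))
) + '<entry id="MAX_TASKS_PER_NODE" value="36" />\n</config_definition>'

entries = read_entries(layout)
print(required_tasks(entries), required_nodes(entries))  # 64 2
```

For the layout in this post it reports 64 tasks on 2 nodes, which agrees with the `select=2` request in the run script.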

The header of test_64cores.run is:

#!/bin/csh -f
###PBS -A
#PBS -N test_64cores
#PBS -q regular
#PBS -l select=2:ncpus=36:mpiprocs=36:ompthreads=1
#PBS -l walltime=08:00:00
#PBS -j oe
#PBS -S /bin/csh -V

However, the model hangs: it neither terminates nor makes further progress until the job hits the wall-clock limit. The last few lines of each component's log are:

CESM

19: BalanceCheck: soil balance error nstep =    424571 point =  1109 imbalance =   -0.000002 W/m2
19: BalanceCheck: soil balance error nstep =    424572 point =  1109 imbalance =   -0.000002 W/m2
1: Opened file ./test_64cores.rtm.h0.0025-03.nc to write      458752
1: Opened file ./test_64cores.clm2.h0.0025-03.nc to write      458752
1: Opened file test_64cores.cam.h0.0025-03.nc to write      458752
2: BalanceCheck: soil balance error nstep =    424837 point =   141 imbalance =   -0.000000 W/m2
2: BalanceCheck: soil balance error nstep =    424838 point =   141 imbalance =   -0.000000 W/m2
3: BalanceCheck: soil balance error nstep =    424883 point =   204 imbalance =   -0.000000 W/m2
3: BalanceCheck: soil balance error nstep =    424884 point =   204 imbalance =   -0.000000 W/m2
53:  filew failed, worst i, j, qtmp, q =            1          30
53: -8.211380995416666E-009  1.859680624937745E-041
20: QNEG3 from TPHYSBCb:m=  3 lat/lchnk=    148 Min. mixing ratio violated at    2 points.  Reset to  0.0E+00 Worst =-1.2E-12 at i,k=  10 23
1: BalanceCheck: soil balance error nstep =    424935 point =    97 imbalance =   -0.000000 W/m2
1: BalanceCheck: soil balance error nstep =    424936 point =    97 imbalance =   -0.000000 W/m2
18: BalanceCheck: soil balance error nstep =    424959 point =  1057 imbalance =   -0.000002 W/m2
18: BalanceCheck: soil balance error nstep =    424960 point =  1057 imbalance =   -0.000002 W/m2
20: BalanceCheck: soil balance error nstep =    425003 point =  1159 imbalance =   -0.000001 W/m2
19: BalanceCheck: soil balance error nstep =    425003 point =  1110 imbalance =   -0.000000 W/m2
19: BalanceCheck: soil balance error nstep =    425004 point =  1110 imbalance =   -0.000000 W/m2
20: BalanceCheck: soil balance error nstep =    425004 point =  1159 imbalance =   -0.000001 W/m2
19: BalanceCheck: soil balance error nstep =    425095 point =  1109 imbalance =   -0.000001 W/m2
19: BalanceCheck: soil balance error nstep =    425096 point =  1109 imbalance =   -0.000001 W/m2
38: BalanceCheck: soil balance error nstep =    425101 point =  2236 imbalance =   -0.000001 W/m2
38: BalanceCheck: soil balance error nstep =    425102 point =  2236 imbalance =   -0.000001 W/m2
53:  filew failed, worst i, j, qtmp, q =            1          30
53: -1.490387661098496E-016  7.093350521850853E-047
31: BalanceCheck: soil balance error nstep =    425131 point =  1865 imbalance =   -0.000001 W/m2
31: BalanceCheck: soil balance error nstep =    425132 point =  1865 imbalance =   -0.000001 W/m2
60: BalanceCheck: soil balance error nstep =    425145 point =  3617 imbalance =   -0.000003 W/m2
60: BalanceCheck: soil balance error nstep =    425146 point =  3617 imbalance =   -0.000003 W/m2
23: BalanceCheck: soil balance error nstep =    425169 point =  1357 imbalance =   -0.000004 W/m2
23: BalanceCheck: soil balance error nstep =    425170 point =  1357 imbalance =   -0.000004 W/m2
6: BalanceCheck: soil balance error nstep =    425189 point =   359 imbalance =   -0.000001 W/m2
6: BalanceCheck: soil balance error nstep =    425190 point =   359 imbalance =   -0.000001 W/m2
51: BalanceCheck: soil balance error nstep =    425283 point =  3049 imbalance =   -0.000001 W/m2
51: BalanceCheck: soil balance error nstep =    425284 point =  3049 imbalance =   -0.000001 W/m2

CPL

 
(seq_diag_print_mct) NET AREA BUDGET (m2/m2): period =  monthly: date =    250401     0
                       atm            lnd            ocn         ice nh         ice sh        *SUM*  
        area    -1.00000000     0.29324025     0.64718132     0.04019648     0.01938454     0.00000258
  
(seq_diag_print_mct) NET HEAT BUDGET (W/m2): period =  monthly: date =    250401     0
                       atm            lnd            rof            ocn         ice nh         ice sh        *SUM*  
     hfreeze     0.00000000     0.00000000     0.00000000     0.07910454    -0.02622360    -0.05286458     0.00001636
       hmelt     0.00000000     0.00000000     0.00000000    -0.74348212     0.42499467     0.31831747    -0.00016998
      hnetsw  -163.05330953    38.32536366     0.00000000   123.34557208     0.88798738     0.48733692    -0.00704949
       hlwdn  -322.74136669    81.85235878     0.00000000   230.18081722     6.71059416     3.99699017    -0.00060636
       hlwup   382.22905060  -100.23357861     0.00000000  -268.95603266    -8.26359495    -4.77674509    -0.00090071
     hlatvap    77.82491763   -11.88317259     0.00000000   -65.87975725    -0.04217840    -0.02029389    -0.00048449
     hlatfus     0.79383088    -0.34822528     0.00000000    -0.26460311    -0.10267895    -0.07831654     0.00000700
      hiroff     0.00000000     0.03837988     0.00003043     0.00000000     0.00000000     0.00000000     0.03841031
        hsen    18.41571865    -7.80060996     0.00000000   -10.74228657     0.15480912    -0.02895466    -0.00132343
       *SUM*    -6.53115846    -0.04948413     0.00003043     7.01933214    -0.25629057    -0.15453020     0.02789921
  
(seq_diag_print_mct) NET WATER BUDGET (kg/m2s*1e6): period =  monthly: date =    250401     0
                       atm            lnd            rof            ocn         ice nh         ice sh        *SUM*  
     wfreeze     0.00000000     0.00000000     0.00000000    -0.20972689     0.06952565     0.14015786    -0.00004338
       wmelt     0.00000000     0.00000000     0.00000000    -0.28424494     0.50163837    -0.21658632     0.00080711
       wrain   -28.88472229     6.75483040     0.00000000    22.03273135     0.06024532     0.03685768    -0.00005755
       wsnow    -2.37887588     1.04352795     0.00000000     0.79293710     0.30769837     0.23469147    -0.00002099
       wevap    31.10445695    -4.74133316     0.00000000   -26.34136635    -0.01483905    -0.00711210    -0.00019372
     wrunoff     0.00000000    -2.80489561     0.12314988     0.00000000     0.00000000     0.00000000    -2.68174572
     wfrzrof     0.00000000    -0.11501312    -0.00009119     0.00000000     0.00000000     0.00000000    -0.11510431
       *SUM*    -0.15914122     0.13711647     0.12305869    -4.00966972     0.92426866     0.18800857    -2.79635855
  
 tStamp_write: model date =   250401       0 wall clock = 2019-04-18 01:38:04 avg dt =     2.38 dt =     2.84
 memory_write: model date =   250401       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250402       0 wall clock = 2019-04-18 01:38:06 avg dt =     2.38 dt =     2.38
 memory_write: model date =   250402       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250403       0 wall clock = 2019-04-18 01:38:09 avg dt =     2.38 dt =     2.37
 memory_write: model date =   250403       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250404       0 wall clock = 2019-04-18 01:38:11 avg dt =     2.38 dt =     2.39
 memory_write: model date =   250404       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250405       0 wall clock = 2019-04-18 01:38:13 avg dt =     2.38 dt =     2.36
 memory_write: model date =   250405       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250406       0 wall clock = 2019-04-18 01:38:16 avg dt =     2.38 dt =     2.43
 memory_write: model date =   250406       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250407       0 wall clock = 2019-04-18 01:38:18 avg dt =     2.38 dt =     2.40
 memory_write: model date =   250407       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250408       0 wall clock = 2019-04-18 01:38:21 avg dt =     2.38 dt =     2.39
 memory_write: model date =   250408       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250409       0 wall clock = 2019-04-18 01:38:23 avg dt =     2.38 dt =     2.37
 memory_write: model date =   250409       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250410       0 wall clock = 2019-04-18 01:38:25 avg dt =     2.38 dt =     2.38
 memory_write: model date =   250410       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250411       0 wall clock = 2019-04-18 01:38:28 avg dt =     2.38 dt =     2.36
 memory_write: model date =   250411       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
 tStamp_write: model date =   250412       0 wall clock = 2019-04-18 01:38:30 avg dt =     2.38 dt =     2.36
 memory_write: model date =   250412       0 memory =     146.74 MB (highwater)         -0.00 MB (usage)  (pe=    0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
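To see exactly where the coupler stops, a small script can pull the `tStamp_write` lines out of the coupler log and report the last model date reached, its wall-clock time and the per-day cost. This is a hypothetical helper (`last_timestamp` is my own name, not a CESM tool) and it assumes the line format shown above: the model date as yymmdd, then seconds, then `wall clock = YYYY-MM-DD HH:MM:SS`, `avg dt` and `dt`.

```python
import re
from datetime import datetime

# Matches lines like:
#  tStamp_write: model date =   250412       0 wall clock = 2019-04-18 01:38:30 avg dt =     2.38 dt =     2.36
TSTAMP = re.compile(
    r"tStamp_write: model date =\s*(\d+)\s+(\d+)\s+"
    r"wall clock = (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"avg dt =\s*([\d.]+)\s+dt =\s*([\d.]+)"
)


def last_timestamp(log_text):
    """Return (model_date, model_sec, wall_clock, dt) for the last
    tStamp_write line in log_text, or None if there is none."""
    last = None
    for m in TSTAMP.finditer(log_text):
        last = (
            int(m.group(1)),
            int(m.group(2)),
            datetime.strptime(m.group(3), "%Y-%m-%d %H:%M:%S"),
            float(m.group(5)),
        )
    return last


sample = """\
 tStamp_write: model date =   250411       0 wall clock = 2019-04-18 01:38:28 avg dt =     2.38 dt =     2.36
 memory_write: model date =   250411       0 memory =     146.74 MB (highwater)
 tStamp_write: model date =   250412       0 wall clock = 2019-04-18 01:38:30 avg dt =     2.38 dt =     2.36
"""
date, sec, wall, dt = last_timestamp(sample)
print(date, sec, wall, dt)  # 250412 0 2019-04-18 01:38:30 2.36
```

Comparing the returned wall-clock time with the time the job was killed shows how long the model sat without advancing.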

ATM

 nstep, te   425343   0.32921039180626612E+10   0.32921041332761240E+10   0.11929028825786629E-04   0.98505215837519107E+05
 nstep, te   425344   0.32921264972353024E+10   0.32921250316163564E+10  -0.81237513104121090E-04   0.98505239159271499E+05
 nstep, te   425345   0.32921471968084836E+10   0.32921456711939802E+10  -0.84562986199844059E-04   0.98505249864586105E+05
 nstep, te   425346   0.32921716055284190E+10   0.32921694904318500E+10  -0.11723724872707561E-03   0.98505269089986512E+05
 nstep, te   425347   0.32921957574583359E+10   0.32921931562663460E+10  -0.14418092821176325E-03   0.98505270934110289E+05
 nstep, te   425348   0.32922196046240435E+10   0.32922173836605086E+10  -0.12310530945831881E-03   0.98505282935014067E+05
 nstep, te   425349   0.32922440268539519E+10   0.32922415843710651E+10  -0.13538384641695618E-03   0.98505291279411394E+05
 nstep, te   425350   0.32922652208534374E+10   0.32922635071687074E+10  -0.94987447426059490E-04   0.98505297846811067E+05
 nstep, te   425351   0.32922871265105324E+10   0.32922853150614657E+10  -0.10040639742915476E-03   0.98505305220259645E+05
  
      120000      250412
 Total Mass=   985.053077818377      (mb), Dry Mass=   982.880000093488      (mb
 )
 Total Precipitable Water =   22.1603331466039      (kg/m**2)
 PS max =    1037.62475230829       min =    567.014606950163     
 U  max =    66.1332969844372       min =   -47.1004067702001     
 V  max =    33.7656336834553       min =   -34.9560846633157     
 T  max =    310.242788710309       min =    183.428902372904     
 W (mb/day) max =    380.038874494323       min =   -407.924193252446     
 Average Height (geopotential units) =    582.988339352997     
 PRECC max =    41.2052734358385       min =   0.000000000000000E+000
 PRECL max =    22.8397352784598       min =   0.000000000000000E+000
 Total precp=   2.73879696830441       CON=   2.06939326576035       LS=
  0.669403702544058     
  
 nstep, te   425352   0.32923056843956847E+10   0.32923045000447469E+10  -0.65647116472378241E-04   0.98505307781837677E+05
 nstep, te   425353   0.32923244978453741E+10   0.32923233300704021E+10  -0.64728328066199621E-04   0.98505312330263710E+05
 nstep, te   425354   0.32923392900191512E+10   0.32923387833463516E+10  -0.28084249955686805E-04   0.98505309301401896E+05
 nstep, te   425355   0.32923543066691456E+10   0.32923538723877988E+10  -0.24071680734112838E-04   0.98505309512877124E+05
 nstep, te   425356   0.32923643985989852E+10   0.32923649802449522E+10   0.32239920973220952E-04   0.98505307527387660E+05
 nstep, te   425357   0.32923754230899429E+10   0.32923760318558717E+10   0.33743144592790234E-04   0.98505314875918775E+05
 nstep, te   425358   0.32923790140984435E+10   0.32923805358942785E+10   0.84351278844188780E-04   0.98505300128855233E+05
 nstep, te   425359   0.32923836379697976E+10   0.32923848794635139E+10   0.68814485489876364E-04   0.98505286270728568E+05

OCN

(docn_comp_run) ocn: model date   250412       0s
(docn_comp_run) ocn: model date   250412    1800s
(docn_comp_run) ocn: model date   250412    3600s
(docn_comp_run) ocn: model date   250412    5400s
(docn_comp_run) ocn: model date   250412    7200s
(docn_comp_run) ocn: model date   250412    9000s
(docn_comp_run) ocn: model date   250412   10800s
(docn_comp_run) ocn: model date   250412   12600s
(docn_comp_run) ocn: model date   250412   14400s
(docn_comp_run) ocn: model date   250412   16200s
(docn_comp_run) ocn: model date   250412   18000s
(docn_comp_run) ocn: model date   250412   19800s
(docn_comp_run) ocn: model date   250412   21600s
(docn_comp_run) ocn: model date   250412   23400s
(docn_comp_run) ocn: model date   250412   25200s
(docn_comp_run) ocn: model date   250412   27000s
(docn_comp_run) ocn: model date   250412   28800s
(docn_comp_run) ocn: model date   250412   30600s
(docn_comp_run) ocn: model date   250412   32400s
(docn_comp_run) ocn: model date   250412   34200s
(docn_comp_run) ocn: model date   250412   36000s
(docn_comp_run) ocn: model date   250412   37800s
(docn_comp_run) ocn: model date   250412   39600s
(docn_comp_run) ocn: model date   250412   41400s
(docn_comp_run) ocn: model date   250412   43200s
(docn_comp_run) ocn: model date   250412   45000s
(docn_comp_run) ocn: model date   250412   46800s
(docn_comp_run) ocn: model date   250412   48600s
(docn_comp_run) ocn: model date   250412   50400s
(docn_comp_run) ocn: model date   250412   52200s
(docn_comp_run) ocn: model date   250412   54000s

LND

 clm2: completed timestep       425329
 clm2: completed timestep       425330
 clm2: completed timestep       425331
 clm2: completed timestep       425332
 clm2: completed timestep       425333
 clm2: completed timestep       425334
 clm2: completed timestep       425335
 clm2: completed timestep       425336
 clm2: completed timestep       425337
 clm2: completed timestep       425338
 clm2: completed timestep       425339
 clm2: completed timestep       425340
 clm2: completed timestep       425341
 clm2: completed timestep       425342
 clm2: completed timestep       425343
 clm2: completed timestep       425344
 clm2: completed timestep       425345
 clm2: completed timestep       425346
 clm2: completed timestep       425347
 clm2: completed timestep       425348
 clm2: completed timestep       425349
 clm2: completed timestep       425350
 clm2: completed timestep       425351
 clm2: completed timestep       425352
 clm2: completed timestep       425353
 clm2: completed timestep       425354
 clm2: completed timestep       425355
 clm2: completed timestep       425356
 clm2: completed timestep       425357
 clm2: completed timestep       425358

ICE

 aero:            3  faero-fsoot   :    964238.712753577     
   6965.40539540224     
 aero:            3  aerotot       :    11632605283.1229     
   154355202.439234     
 aero:            3  aerotot change:    964238.703186035     
   6965.40536493063     
 aero:            3  aeromax agg:   2.635567523154776E-003
  7.296360311253620E-005
                                             Arctic                 Antarctic
total ice area  (km^2) =    2.00945086160375476E+07   1.35309127651979420E+07
total ice extent(km^2) =    2.20291190032451749E+07   1.47384636229330283E+07
total ice volume (m^3) =    6.88024434738068750E+13   2.34776381742008086E+13
total snw volume (m^3) =    6.66084391472641406E+12   5.72946900102083301E+12
tot kinetic energy (J) =    1.74521771242845000E+14   3.04268053722771375E+14
rms ice speed    (m/s) =        0.07311672633334403       0.16119528570787445
average albedo         =        0.72837027208873206       0.72235144764844084
max ice volume     (m) =       11.36771336308637892       8.15412867414148046
max ice speed    (m/s) =        0.37128519369046487       0.33065583035276547
max strength    (kN/m) =      918.44658971979947637     176.50532225058483959
 ----------------------------
arwt rain h2o kg in dt =    2.52054563698879585E+10   3.20214785572211838E+10
arwt snow h2o kg in dt =    2.80552510902386841E+11   3.04370879343336914E+11
arwt evap h2o kg in dt =   -4.19320814133738632E+10   3.49570613782832861E+09
arwt frzl h2o kg in dt =    1.43208999101032887E+10   2.21800242156592621E+11
arwt frsh h2o kg in dt =   -9.30511758540998383E+10  -1.14381027288429443E+12
arwt ice mass (kg)     =    6.30918406654809040E+16   2.15289942057421400E+16
arwt snw mass (kg)     =    2.19807849185971675E+15   1.89072477033687500E+15
arwt tot mass (kg)     =    6.52899191573406240E+16   2.34197189760790160E+16
arwt tot mass chng(kg) =    3.71197961616000000E+11   1.70549857907200000E+12
arwt water flux        =    3.71197961623104004E+11   1.70549857907927344E+12
 (=rain+snow+evap+frzl-fresh)  
water flux error       =    1.08807056249070093E-16   3.10568948646613347E-16
 ----------------------------
arwt atm heat flux (W) =   -1.61052913247670625E+14  -4.09693210590881625E+14
arwt ocn heat flux (W) =   -1.86565098595157125E+14  -1.04561653219656187E+14
arwt frzl heat flux(W) =    2.65493572222303711E+12   4.11193004486971953E+13
arwt tot energy    (J) =   -2.27898341317840061E+22  -7.70622817393212444E+21
arwt net heat      (J) =    4.11430493254742320E+16  -6.23251544075860736E+17
arwt tot energy chng(J)=    4.11458421611560960E+16  -6.23250774161358848E+17
arwt heat error        =    1.22547433461525362E-10   9.99080853188842203E-11
 ----------------------------
arwt salt mass (kg)    =    2.52367362661923625E+14   8.61159768229685625E+13
arwt salt mass chng(kg)=    9.59008541255586863E+08   5.35289761172562981E+09
arwt salt flx in dt(kg)=   -9.59008541289906263E+08  -5.35289761175689507E+09
arwt salt flx error    =   -1.35989853938951434E-16  -3.63059909932104165E-16
 ----------------------------

ROF

 (Rtmrun) model date is      250326           0
  
 (Rtmrun) model date is      250327           0
  
 (Rtmrun) model date is      250328           0
  
 (Rtmrun) model date is      250329           0
  
 (Rtmrun) model date is      250330           0
  
 (Rtmrun) model date is      250331           0
  
 (Rtmrun) model date is      250401           0
 hist_htapes_wrapup : Creating history file ./test_64cores.rtm.h0.0025-03.nc
  at nstep =        70800
 calling htape_create for file t =            1
 htape_create : Opening netcdf htape ./test_64cores.rtm.h0.0025-03.nc
 htape_create : Successfully defined netcdf history file            1
 
 hist_htapes_wrapup : Writing current time sample to local history file 
 ./test_64cores.rtm.h0.0025-03.nc at nstep =        70800 
  for history time interval beginning at    8819.00000000000       and ending at
     8850.00000000000     
 
 
 hist_htapes_wrapup : Closing local history file 
 ./test_64cores.rtm.h0.0025-03.nc at nstep =        70800
 
  
 (Rtmrun) model date is      250402           0
  
 (Rtmrun) model date is      250403           0
  
 (Rtmrun) model date is      250404           0
  
 (Rtmrun) model date is      250405           0
  
 (Rtmrun) model date is      250406           0
  
 (Rtmrun) model date is      250407           0
  
 (Rtmrun) model date is      250408           0
  
 (Rtmrun) model date is      250409           0
  
 (Rtmrun) model date is      250410           0
  
 (Rtmrun) model date is      250411           0
  
 (Rtmrun) model date is      250412           0

 

##################

To summarize: the model hangs, neither terminating nor progressing, until the job reaches the wall-clock limit. I am using 2 nodes. In env_mach_pes.xml I also tried 72 and 68 cores, and none of these layouts work.
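For reference, the PBS header above requests `select=2:ncpus=36:mpiprocs=36`, i.e. 2 × 36 = 72 MPI slots, while the layout uses 64 tasks (68 in one of the other attempts). I have not confirmed whether such a mismatch matters for how the CESM1.2 run script starts the executable on Cheyenne. The hypothetical helper below (`pbs_select` is my own name, not a PBS or CESM tool) only does the arithmetic for a request whose MPI slot count matches the task count where possible, assuming 36-core nodes and one OpenMP thread per task.

```python
import math


def pbs_select(ntasks, cores_per_node=36):
    """Build a PBS select string for `ntasks` single-threaded MPI tasks.

    Uses as few nodes as possible and splits the tasks as evenly as
    possible across them. Returns (select_string, mpi_slots_requested);
    the slot count exceeds ntasks only when the split is not exact.
    """
    nodes = math.ceil(ntasks / cores_per_node)
    per_node = math.ceil(ntasks / nodes)
    select = (f"select={nodes}:ncpus={cores_per_node}"
              f":mpiprocs={per_node}:ompthreads=1")
    return select, nodes * per_node


for n in (64, 68, 72):
    print(n, *pbs_select(n))
# 64 select=2:ncpus=36:mpiprocs=32:ompthreads=1 64
# 68 select=2:ncpus=36:mpiprocs=34:ompthreads=1 68
# 72 select=2:ncpus=36:mpiprocs=36:ompthreads=1 72
```

With an odd count such as 65 the split is not exact: the helper asks for 2 × 33 = 66 slots, so one slot would remain unused.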

Does anyone know whether I am setting this up correctly?
