tienyiah@uci_edu
New Member
I want to speed up the CESM1.2.2.1 E_1850_CN compset on Cheyenne by using multiple nodes. I created the case with
In env_run.xml I set
In env_mach_pes.xml, I set
In test_64cores.run,
However, the program freezes, neither terminating nor making further progress, until it hits the wall-clock limit. The last few lines of each component log are:
CESM
CPL
ATM
OCN
LND
ICE
ROF
Again: the program freezes, neither terminating nor making further progress, until it hits the wall-clock limit. I am now using 2 nodes. In env_mach_pes.xml I also tried 72 and 68 cores, and neither worked. Does anyone know whether I am setting this up correctly?
Code:
~/ucar_models/cesm1_2_2_1/scripts/create_newcase \
    -case test_64cores \
    -compset E_1850_CN \
    -res f45_g37 \
    -mach cheyenne
Code:
STOP_N="12780"
DOCN_SOM_FILENAME="pop_frc.gx3v7.110128.nc"
In test_64cores.run,
Code:
#!/bin/csh -f
###PBS -A
#PBS -N test_64cores
#PBS -q regular
#PBS -l select=2:ncpus=36:mpiprocs=36:ompthreads=1
#PBS -l walltime=08:00:00
#PBS -j oe
#PBS -S /bin/csh -V
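For comparison with the task counts in env_mach_pes.xml, the PBS request above can be totalled up. This is only a minimal Python sketch, assuming the usual `select=N:key=value` chunk syntax; the helper name `pbs_select_resources` is made up for illustration and is not part of CESM or PBS.

```python
import re


def pbs_select_resources(directive):
    """Return (nodes, total_mpi_ranks, threads_per_rank) for a '#PBS -l select=...' line."""
    match = re.search(r"select=(\d+)((?::\w+=\w+)*)", directive)
    if match is None:
        raise ValueError("no select= specification found")
    nodes = int(match.group(1))
    # The chunk options look like ':ncpus=36:mpiprocs=36:ompthreads=1'.
    fields = dict(item.split("=") for item in match.group(2).split(":") if item)
    mpiprocs = int(fields.get("mpiprocs", 1))
    threads = int(fields.get("ompthreads", 1))
    return nodes, nodes * mpiprocs, threads


line = "#PBS -l select=2:ncpus=36:mpiprocs=36:ompthreads=1"
nodes, ranks, threads = pbs_select_resources(line)
print(f"{nodes} nodes, {ranks} MPI ranks, {threads} thread(s) per rank")
```

For the directive above this gives 2 nodes and 72 MPI ranks. If the layout really uses 64 tasks, as the case name suggests, 8 of the requested cores would sit idle.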
Code:
19: BalanceCheck: soil balance error nstep = 424571 point = 1109 imbalance = -0.000002 W/m2
19: BalanceCheck: soil balance error nstep = 424572 point = 1109 imbalance = -0.000002 W/m2
1: Opened file ./test_64cores.rtm.h0.0025-03.nc to write 458752
1: Opened file ./test_64cores.clm2.h0.0025-03.nc to write 458752
1: Opened file test_64cores.cam.h0.0025-03.nc to write 458752
2: BalanceCheck: soil balance error nstep = 424837 point = 141 imbalance = -0.000000 W/m2
2: BalanceCheck: soil balance error nstep = 424838 point = 141 imbalance = -0.000000 W/m2
3: BalanceCheck: soil balance error nstep = 424883 point = 204 imbalance = -0.000000 W/m2
3: BalanceCheck: soil balance error nstep = 424884 point = 204 imbalance = -0.000000 W/m2
53: filew failed, worst i, j, qtmp, q = 1 30
53: -8.211380995416666E-009 1.859680624937745E-041
20: QNEG3 from TPHYSBCb:m= 3 lat/lchnk= 148 Min. mixing ratio violated at 2 points. Reset to 0.0E+00 Worst =-1.2E-12 at i,k= 10 23
1: BalanceCheck: soil balance error nstep = 424935 point = 97 imbalance = -0.000000 W/m2
1: BalanceCheck: soil balance error nstep = 424936 point = 97 imbalance = -0.000000 W/m2
18: BalanceCheck: soil balance error nstep = 424959 point = 1057 imbalance = -0.000002 W/m2
18: BalanceCheck: soil balance error nstep = 424960 point = 1057 imbalance = -0.000002 W/m2
20: BalanceCheck: soil balance error nstep = 425003 point = 1159 imbalance = -0.000001 W/m2
19: BalanceCheck: soil balance error nstep = 425003 point = 1110 imbalance = -0.000000 W/m2
19: BalanceCheck: soil balance error nstep = 425004 point = 1110 imbalance = -0.000000 W/m2
20: BalanceCheck: soil balance error nstep = 425004 point = 1159 imbalance = -0.000001 W/m2
19: BalanceCheck: soil balance error nstep = 425095 point = 1109 imbalance = -0.000001 W/m2
19: BalanceCheck: soil balance error nstep = 425096 point = 1109 imbalance = -0.000001 W/m2
38: BalanceCheck: soil balance error nstep = 425101 point = 2236 imbalance = -0.000001 W/m2
38: BalanceCheck: soil balance error nstep = 425102 point = 2236 imbalance = -0.000001 W/m2
53: filew failed, worst i, j, qtmp, q = 1 30
53: -1.490387661098496E-016 7.093350521850853E-047
31: BalanceCheck: soil balance error nstep = 425131 point = 1865 imbalance = -0.000001 W/m2
31: BalanceCheck: soil balance error nstep = 425132 point = 1865 imbalance = -0.000001 W/m2
60: BalanceCheck: soil balance error nstep = 425145 point = 3617 imbalance = -0.000003 W/m2
60: BalanceCheck: soil balance error nstep = 425146 point = 3617 imbalance = -0.000003 W/m2
23: BalanceCheck: soil balance error nstep = 425169 point = 1357 imbalance = -0.000004 W/m2
23: BalanceCheck: soil balance error nstep = 425170 point = 1357 imbalance = -0.000004 W/m2
6: BalanceCheck: soil balance error nstep = 425189 point = 359 imbalance = -0.000001 W/m2
6: BalanceCheck: soil balance error nstep = 425190 point = 359 imbalance = -0.000001 W/m2
51: BalanceCheck: soil balance error nstep = 425283 point = 3049 imbalance = -0.000001 W/m2
51: BalanceCheck: soil balance error nstep = 425284 point = 3049 imbalance = -0.000001 W/m2
Code:
(seq_diag_print_mct) NET AREA BUDGET (m2/m2): period = monthly: date = 250401 0
atm lnd ocn ice nh ice sh *SUM*
area -1.00000000 0.29324025 0.64718132 0.04019648 0.01938454 0.00000258
(seq_diag_print_mct) NET HEAT BUDGET (W/m2): period = monthly: date = 250401 0
atm lnd rof ocn ice nh ice sh *SUM*
hfreeze 0.00000000 0.00000000 0.00000000 0.07910454 -0.02622360 -0.05286458 0.00001636
hmelt 0.00000000 0.00000000 0.00000000 -0.74348212 0.42499467 0.31831747 -0.00016998
hnetsw -163.05330953 38.32536366 0.00000000 123.34557208 0.88798738 0.48733692 -0.00704949
hlwdn -322.74136669 81.85235878 0.00000000 230.18081722 6.71059416 3.99699017 -0.00060636
hlwup 382.22905060 -100.23357861 0.00000000 -268.95603266 -8.26359495 -4.77674509 -0.00090071
hlatvap 77.82491763 -11.88317259 0.00000000 -65.87975725 -0.04217840 -0.02029389 -0.00048449
hlatfus 0.79383088 -0.34822528 0.00000000 -0.26460311 -0.10267895 -0.07831654 0.00000700
hiroff 0.00000000 0.03837988 0.00003043 0.00000000 0.00000000 0.00000000 0.03841031
hsen 18.41571865 -7.80060996 0.00000000 -10.74228657 0.15480912 -0.02895466 -0.00132343
*SUM* -6.53115846 -0.04948413 0.00003043 7.01933214 -0.25629057 -0.15453020 0.02789921
(seq_diag_print_mct) NET WATER BUDGET (kg/m2s*1e6): period = monthly: date = 250401 0
atm lnd rof ocn ice nh ice sh *SUM*
wfreeze 0.00000000 0.00000000 0.00000000 -0.20972689 0.06952565 0.14015786 -0.00004338
wmelt 0.00000000 0.00000000 0.00000000 -0.28424494 0.50163837 -0.21658632 0.00080711
wrain -28.88472229 6.75483040 0.00000000 22.03273135 0.06024532 0.03685768 -0.00005755
wsnow -2.37887588 1.04352795 0.00000000 0.79293710 0.30769837 0.23469147 -0.00002099
wevap 31.10445695 -4.74133316 0.00000000 -26.34136635 -0.01483905 -0.00711210 -0.00019372
wrunoff 0.00000000 -2.80489561 0.12314988 0.00000000 0.00000000 0.00000000 -2.68174572
wfrzrof 0.00000000 -0.11501312 -0.00009119 0.00000000 0.00000000 0.00000000 -0.11510431
*SUM* -0.15914122 0.13711647 0.12305869 -4.00966972 0.92426866 0.18800857 -2.79635855
tStamp_write: model date = 250401 0 wall clock = 2019-04-18 01:38:04 avg dt = 2.38 dt = 2.84
memory_write: model date = 250401 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250402 0 wall clock = 2019-04-18 01:38:06 avg dt = 2.38 dt = 2.38
memory_write: model date = 250402 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250403 0 wall clock = 2019-04-18 01:38:09 avg dt = 2.38 dt = 2.37
memory_write: model date = 250403 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250404 0 wall clock = 2019-04-18 01:38:11 avg dt = 2.38 dt = 2.39
memory_write: model date = 250404 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250405 0 wall clock = 2019-04-18 01:38:13 avg dt = 2.38 dt = 2.36
memory_write: model date = 250405 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250406 0 wall clock = 2019-04-18 01:38:16 avg dt = 2.38 dt = 2.43
memory_write: model date = 250406 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250407 0 wall clock = 2019-04-18 01:38:18 avg dt = 2.38 dt = 2.40
memory_write: model date = 250407 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250408 0 wall clock = 2019-04-18 01:38:21 avg dt = 2.38 dt = 2.39
memory_write: model date = 250408 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250409 0 wall clock = 2019-04-18 01:38:23 avg dt = 2.38 dt = 2.37
memory_write: model date = 250409 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250410 0 wall clock = 2019-04-18 01:38:25 avg dt = 2.38 dt = 2.38
memory_write: model date = 250410 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250411 0 wall clock = 2019-04-18 01:38:28 avg dt = 2.38 dt = 2.36
memory_write: model date = 250411 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
tStamp_write: model date = 250412 0 wall clock = 2019-04-18 01:38:30 avg dt = 2.38 dt = 2.36
memory_write: model date = 250412 0 memory = 146.74 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
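To pin down when the coupler stopped advancing, the last tStamp_write entry can be pulled out of the cpl log. A rough Python sketch, assuming the line layout shown above; `last_coupler_stamp` is a hypothetical helper, not something CESM provides.

```python
import re
from datetime import datetime

# Matches e.g. "tStamp_write: model date = 250412 0 wall clock = 2019-04-18 01:38:30 ..."
STAMP = re.compile(
    r"tStamp_write: model date =\s*(\d+)\s+(\d+)\s+wall clock =\s*"
    r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})"
)


def last_coupler_stamp(lines):
    """Return (model_date, model_seconds, wall_clock) of the last stamp, or None."""
    last = None
    for line in lines:
        m = STAMP.search(line)
        if m:
            wall = datetime.strptime(f"{m.group(3)} {m.group(4)}", "%Y-%m-%d %H:%M:%S")
            last = (int(m.group(1)), int(m.group(2)), wall)
    return last


sample = [
    "tStamp_write: model date = 250411 0 wall clock = 2019-04-18 01:38:28 avg dt = 2.38 dt = 2.36",
    "memory_write: model date = 250411 0 memory = 146.74 MB (highwater)",
    "tStamp_write: model date = 250412 0 wall clock = 2019-04-18 01:38:30 avg dt = 2.38 dt = 2.36",
]
print(last_coupler_stamp(sample))
```

On the cpl log above, the last stamp is model date 250412 at wall clock 2019-04-18 01:38:30. Comparing that wall-clock time with the time the job was killed shows how long the run sat idle.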
Code:
nstep, te 425343 0.32921039180626612E+10 0.32921041332761240E+10 0.11929028825786629E-04 0.98505215837519107E+05
nstep, te 425344 0.32921264972353024E+10 0.32921250316163564E+10 -0.81237513104121090E-04 0.98505239159271499E+05
nstep, te 425345 0.32921471968084836E+10 0.32921456711939802E+10 -0.84562986199844059E-04 0.98505249864586105E+05
nstep, te 425346 0.32921716055284190E+10 0.32921694904318500E+10 -0.11723724872707561E-03 0.98505269089986512E+05
nstep, te 425347 0.32921957574583359E+10 0.32921931562663460E+10 -0.14418092821176325E-03 0.98505270934110289E+05
nstep, te 425348 0.32922196046240435E+10 0.32922173836605086E+10 -0.12310530945831881E-03 0.98505282935014067E+05
nstep, te 425349 0.32922440268539519E+10 0.32922415843710651E+10 -0.13538384641695618E-03 0.98505291279411394E+05
nstep, te 425350 0.32922652208534374E+10 0.32922635071687074E+10 -0.94987447426059490E-04 0.98505297846811067E+05
nstep, te 425351 0.32922871265105324E+10 0.32922853150614657E+10 -0.10040639742915476E-03 0.98505305220259645E+05
120000 250412
Total Mass= 985.053077818377 (mb), Dry Mass= 982.880000093488 (mb
)
Total Precipitable Water = 22.1603331466039 (kg/m**2)
PS max = 1037.62475230829 min = 567.014606950163
U max = 66.1332969844372 min = -47.1004067702001
V max = 33.7656336834553 min = -34.9560846633157
T max = 310.242788710309 min = 183.428902372904
W (mb/day) max = 380.038874494323 min = -407.924193252446
Average Height (geopotential units) = 582.988339352997
PRECC max = 41.2052734358385 min = 0.000000000000000E+000
PRECL max = 22.8397352784598 min = 0.000000000000000E+000
Total precp= 2.73879696830441 CON= 2.06939326576035 LS=
0.669403702544058
nstep, te 425352 0.32923056843956847E+10 0.32923045000447469E+10 -0.65647116472378241E-04 0.98505307781837677E+05
nstep, te 425353 0.32923244978453741E+10 0.32923233300704021E+10 -0.64728328066199621E-04 0.98505312330263710E+05
nstep, te 425354 0.32923392900191512E+10 0.32923387833463516E+10 -0.28084249955686805E-04 0.98505309301401896E+05
nstep, te 425355 0.32923543066691456E+10 0.32923538723877988E+10 -0.24071680734112838E-04 0.98505309512877124E+05
nstep, te 425356 0.32923643985989852E+10 0.32923649802449522E+10 0.32239920973220952E-04 0.98505307527387660E+05
nstep, te 425357 0.32923754230899429E+10 0.32923760318558717E+10 0.33743144592790234E-04 0.98505314875918775E+05
nstep, te 425358 0.32923790140984435E+10 0.32923805358942785E+10 0.84351278844188780E-04 0.98505300128855233E+05
nstep, te 425359 0.32923836379697976E+10 0.32923848794635139E+10 0.68814485489876364E-04 0.98505286270728568E+05
Code:
(docn_comp_run) ocn: model date 250412 0s
(docn_comp_run) ocn: model date 250412 1800s
(docn_comp_run) ocn: model date 250412 3600s
(docn_comp_run) ocn: model date 250412 5400s
(docn_comp_run) ocn: model date 250412 7200s
(docn_comp_run) ocn: model date 250412 9000s
(docn_comp_run) ocn: model date 250412 10800s
(docn_comp_run) ocn: model date 250412 12600s
(docn_comp_run) ocn: model date 250412 14400s
(docn_comp_run) ocn: model date 250412 16200s
(docn_comp_run) ocn: model date 250412 18000s
(docn_comp_run) ocn: model date 250412 19800s
(docn_comp_run) ocn: model date 250412 21600s
(docn_comp_run) ocn: model date 250412 23400s
(docn_comp_run) ocn: model date 250412 25200s
(docn_comp_run) ocn: model date 250412 27000s
(docn_comp_run) ocn: model date 250412 28800s
(docn_comp_run) ocn: model date 250412 30600s
(docn_comp_run) ocn: model date 250412 32400s
(docn_comp_run) ocn: model date 250412 34200s
(docn_comp_run) ocn: model date 250412 36000s
(docn_comp_run) ocn: model date 250412 37800s
(docn_comp_run) ocn: model date 250412 39600s
(docn_comp_run) ocn: model date 250412 41400s
(docn_comp_run) ocn: model date 250412 43200s
(docn_comp_run) ocn: model date 250412 45000s
(docn_comp_run) ocn: model date 250412 46800s
(docn_comp_run) ocn: model date 250412 48600s
(docn_comp_run) ocn: model date 250412 50400s
(docn_comp_run) ocn: model date 250412 52200s
(docn_comp_run) ocn: model date 250412 54000s
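The ocn log gives the finest-grained hint about where the run stalled within the model day. A small Python sketch that turns the last docn line into a readable model time, assuming the model date is YYYYMMDD with leading zeros dropped (consistent with the 0025-03 history file names) and that the trailing number is seconds into the day; `last_docn_time` is a hypothetical name.

```python
import re

# Matches e.g. "(docn_comp_run) ocn: model date 250412 54000s"
DOCN = re.compile(r"\(docn_comp_run\) ocn: model date\s+(\d+)\s+(\d+)s")


def last_docn_time(lines):
    """Return the last docn model time as 'YYYY-MM-DD HH:MM:SS', or None."""
    last = None
    for line in lines:
        m = DOCN.search(line)
        if m:
            ymd, sec = int(m.group(1)), int(m.group(2))
            year, month, day = ymd // 10000, ymd // 100 % 100, ymd % 100
            hh, mm, ss = sec // 3600, sec % 3600 // 60, sec % 60
            last = f"{year:04d}-{month:02d}-{day:02d} {hh:02d}:{mm:02d}:{ss:02d}"
    return last


sample = [
    "(docn_comp_run) ocn: model date 250412 52200s",
    "(docn_comp_run) ocn: model date 250412 54000s",
]
print(last_docn_time(sample))
```

For the log above this gives 0025-04-12 15:00:00 as the last ocean step written before the hang.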
Code:
clm2: completed timestep 425329
clm2: completed timestep 425330
clm2: completed timestep 425331
clm2: completed timestep 425332
clm2: completed timestep 425333
clm2: completed timestep 425334
clm2: completed timestep 425335
clm2: completed timestep 425336
clm2: completed timestep 425337
clm2: completed timestep 425338
clm2: completed timestep 425339
clm2: completed timestep 425340
clm2: completed timestep 425341
clm2: completed timestep 425342
clm2: completed timestep 425343
clm2: completed timestep 425344
clm2: completed timestep 425345
clm2: completed timestep 425346
clm2: completed timestep 425347
clm2: completed timestep 425348
clm2: completed timestep 425349
clm2: completed timestep 425350
clm2: completed timestep 425351
clm2: completed timestep 425352
clm2: completed timestep 425353
clm2: completed timestep 425354
clm2: completed timestep 425355
clm2: completed timestep 425356
clm2: completed timestep 425357
clm2: completed timestep 425358
Code:
aero: 3 faero-fsoot : 964238.712753577
6965.40539540224
aero: 3 aerotot : 11632605283.1229
154355202.439234
aero: 3 aerotot change: 964238.703186035
6965.40536493063
aero: 3 aeromax agg: 2.635567523154776E-003
7.296360311253620E-005
Arctic Antarctic
total ice area (km^2) = 2.00945086160375476E+07 1.35309127651979420E+07
total ice extent(km^2) = 2.20291190032451749E+07 1.47384636229330283E+07
total ice volume (m^3) = 6.88024434738068750E+13 2.34776381742008086E+13
total snw volume (m^3) = 6.66084391472641406E+12 5.72946900102083301E+12
tot kinetic energy (J) = 1.74521771242845000E+14 3.04268053722771375E+14
rms ice speed (m/s) = 0.07311672633334403 0.16119528570787445
average albedo = 0.72837027208873206 0.72235144764844084
max ice volume (m) = 11.36771336308637892 8.15412867414148046
max ice speed (m/s) = 0.37128519369046487 0.33065583035276547
max strength (kN/m) = 918.44658971979947637 176.50532225058483959
----------------------------
arwt rain h2o kg in dt = 2.52054563698879585E+10 3.20214785572211838E+10
arwt snow h2o kg in dt = 2.80552510902386841E+11 3.04370879343336914E+11
arwt evap h2o kg in dt = -4.19320814133738632E+10 3.49570613782832861E+09
arwt frzl h2o kg in dt = 1.43208999101032887E+10 2.21800242156592621E+11
arwt frsh h2o kg in dt = -9.30511758540998383E+10 -1.14381027288429443E+12
arwt ice mass (kg) = 6.30918406654809040E+16 2.15289942057421400E+16
arwt snw mass (kg) = 2.19807849185971675E+15 1.89072477033687500E+15
arwt tot mass (kg) = 6.52899191573406240E+16 2.34197189760790160E+16
arwt tot mass chng(kg) = 3.71197961616000000E+11 1.70549857907200000E+12
arwt water flux = 3.71197961623104004E+11 1.70549857907927344E+12
(=rain+snow+evap+frzl-fresh)
water flux error = 1.08807056249070093E-16 3.10568948646613347E-16
----------------------------
arwt atm heat flux (W) = -1.61052913247670625E+14 -4.09693210590881625E+14
arwt ocn heat flux (W) = -1.86565098595157125E+14 -1.04561653219656187E+14
arwt frzl heat flux(W) = 2.65493572222303711E+12 4.11193004486971953E+13
arwt tot energy (J) = -2.27898341317840061E+22 -7.70622817393212444E+21
arwt net heat (J) = 4.11430493254742320E+16 -6.23251544075860736E+17
arwt tot energy chng(J)= 4.11458421611560960E+16 -6.23250774161358848E+17
arwt heat error = 1.22547433461525362E-10 9.99080853188842203E-11
----------------------------
arwt salt mass (kg) = 2.52367362661923625E+14 8.61159768229685625E+13
arwt salt mass chng(kg)= 9.59008541255586863E+08 5.35289761172562981E+09
arwt salt flx in dt(kg)= -9.59008541289906263E+08 -5.35289761175689507E+09
arwt salt flx error = -1.35989853938951434E-16 -3.63059909932104165E-16
----------------------------
Code:
(Rtmrun) model date is 250326 0
(Rtmrun) model date is 250327 0
(Rtmrun) model date is 250328 0
(Rtmrun) model date is 250329 0
(Rtmrun) model date is 250330 0
(Rtmrun) model date is 250331 0
(Rtmrun) model date is 250401 0
hist_htapes_wrapup : Creating history file ./test_64cores.rtm.h0.0025-03.nc
at nstep = 70800
calling htape_create for file t = 1
htape_create : Opening netcdf htape ./test_64cores.rtm.h0.0025-03.nc
htape_create : Successfully defined netcdf history file 1
hist_htapes_wrapup : Writing current time sample to local history file
./test_64cores.rtm.h0.0025-03.nc at nstep = 70800
for history time interval beginning at 8819.00000000000 and ending at
8850.00000000000
hist_htapes_wrapup : Closing local history file
./test_64cores.rtm.h0.0025-03.nc at nstep = 70800
(Rtmrun) model date is 250402 0
(Rtmrun) model date is 250403 0
(Rtmrun) model date is 250404 0
(Rtmrun) model date is 250405 0
(Rtmrun) model date is 250406 0
(Rtmrun) model date is 250407 0
(Rtmrun) model date is 250408 0
(Rtmrun) model date is 250409 0
(Rtmrun) model date is 250410 0
(Rtmrun) model date is 250411 0
(Rtmrun) model date is 250412 0