ep2764@columbia_edu
Hi Everyone,
I am trying to create a faster PES layout for the BWCN compset on Yellowstone. The default layout (env_mach_pes.xml) and timing are:
timing: component comp_pes root_pe tasks x threads instances (stride)
--------- ------ ------- ------ ------ --------- ------
cpl = cpl 360 0 180 x 2 1 (1 )
glc = sglc 360 0 180 x 2 1 (1 )
wav = swav 360 0 180 x 2 1 (1 )
lnd = clm 120 0 60 x 2 1 (1 )
rof = rtm 120 0 60 x 2 1 (1 )
ice = cice 240 60 120 x 2 1 (1 )
atm = cam 360 0 180 x 2 1 (1 )
ocn = pop2 60 180 30 x 2 1 (1 )
total pes active : 420
pes per node : 16
pe count for cost estimate : 224
Overall Metrics:
Model Cost: 1263.13 pe-hrs/simulated_year
Model Throughput: 4.26 simulated_years/day
Init Time : 72.006 seconds
Run Time : 40600.688 seconds 55.617 seconds/day
Final Time : 0.059 seconds
Actual Ocn Init Wait Time : 0.000 seconds
Estimated Ocn Init Run Time : 0.000 seconds
Estimated Run Time Correction : 0.000 seconds
(This correction has been applied to the ocean and total run times)
Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components
TOT Run Time: 40600.688 seconds 55.617 seconds/mday 4.26 myears/wday
LND Run Time: 584.937 seconds 0.801 seconds/mday 295.42 myears/wday
ROF Run Time: 21.871 seconds 0.030 seconds/mday 7900.87 myears/wday
ICE Run Time: 2406.515 seconds 3.297 seconds/mday 71.81 myears/wday
ATM Run Time: 37422.500 seconds 51.264 seconds/mday 4.62 myears/wday
OCN Run Time: 11552.612 seconds 15.825 seconds/mday 14.96 myears/wday
GLC Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
WAV Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
CPL Run Time: 2230.241 seconds 3.055 seconds/mday 77.48 myears/wday
CPL COMM Time: 29009.397 seconds 39.739 seconds/mday 5.96 myears/wday
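For reference, the throughput and cost figures in this timing summary can be reproduced from the seconds per model day. This is a minimal sketch, assuming a 365-day model year; the function names are made up for illustration and are not part of any CESM tool.

```python
# Reproduce the summary metrics from seconds per model day.
# Assumption: a 365-day (no-leap) model year.
SECONDS_PER_WALL_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365.0


def throughput_myears_per_wday(sec_per_mday):
    """Simulated years per wall-clock day."""
    return SECONDS_PER_WALL_DAY / (DAYS_PER_MODEL_YEAR * sec_per_mday)


def cost_pe_hours_per_myear(pe_count, sec_per_mday):
    """PE-hours needed to simulate one model year."""
    return pe_count * sec_per_mday * DAYS_PER_MODEL_YEAR / 3600.0


# Values copied from the default-layout timing above.
tot_throughput = throughput_myears_per_wday(55.617)
atm_throughput = throughput_myears_per_wday(51.264)
model_cost = cost_pe_hours_per_myear(224, 55.617)

print(f"TOT throughput: {tot_throughput:.2f} myears/wday")
print(f"ATM throughput: {atm_throughput:.2f} myears/wday")
print(f"Model cost:     {model_cost:.2f} pe-hrs/simulated_year")
```

The results agree with the reported 4.26 and 4.62 myears/wday and, to within rounding, with the reported 1263.13 pe-hrs/simulated_year.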
As you can see, the ATM run time is 51.264 seconds per model day. I also noticed that the PES_LEVEL is '2rp'. I tried ramping up the number of cores to get ATM to run faster, but it actually ran slower (63.889 seconds per model day; see timing below):
timing: component comp_pes root_pe tasks x threads instances (stride)
--------- ------ ------- ------ ------ --------- ------
cpl = cpl 480 0 240 x 2 1 (1 )
glc = sglc 360 0 180 x 2 1 (1 )
wav = swav 360 0 180 x 2 1 (1 )
lnd = clm 60 0 30 x 2 1 (1 )
rof = rtm 120 0 60 x 2 1 (1 )
ice = cice 480 60 240 x 2 1 (1 )
atm = cam 1280 0 640 x 2 1 (1 )
ocn = pop2 60 180 30 x 2 1 (1 )
total pes active : 1280
pes per node : 16
pe count for cost estimate : 688
Overall Metrics:
Model Cost: 5022.65 pe-hrs/simulated_year
Model Throughput: 3.29 simulated_years/day
Init Time : 71.674 seconds
Run Time : 360.018 seconds 72.004 seconds/day
Final Time : 0.080 seconds
Actual Ocn Init Wait Time : 0.000 seconds
Estimated Ocn Init Run Time : 16.866 seconds
Estimated Run Time Correction : 16.866 seconds
(This correction has been applied to the ocean and total run times)
Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components
TOT Run Time: 360.018 seconds 72.004 seconds/mday 3.29 myears/wday
LND Run Time: 10.600 seconds 2.120 seconds/mday 111.66 myears/wday
ROF Run Time: 0.539 seconds 0.108 seconds/mday 2195.85 myears/wday
ICE Run Time: 15.844 seconds 3.169 seconds/mday 74.70 myears/wday
ATM Run Time: 319.446 seconds 63.889 seconds/mday 3.71 myears/wday
OCN Run Time: 84.330 seconds 16.866 seconds/mday 14.03 myears/wday
GLC Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
WAV Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
CPL Run Time: 20.090 seconds 4.018 seconds/mday 58.91 myears/wday
CPL COMM Time: 24.674 seconds 4.935 seconds/mday 47.97 myears/wday
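To put a number on "ran slower", here is a rough sketch of the ATM speedup and parallel efficiency going from 360 to 1280 comp_pes. The helper names are hypothetical. Note that the two runs differ in length (about 730 versus 5 model days, judging by the Run Time lines), so the comparison is only indicative.

```python
def speedup(base_time, new_time):
    """Speedup relative to the baseline; values below 1 mean a slowdown."""
    return base_time / new_time


def parallel_efficiency(base_pes, base_time, new_pes, new_time):
    """Achieved speedup divided by the ideal (linear) speedup."""
    return speedup(base_time, new_time) / (new_pes / base_pes)


# ATM seconds per model day, copied from the two timing summaries.
atm_speedup = speedup(51.264, 63.889)
atm_efficiency = parallel_efficiency(360, 51.264, 1280, 63.889)

print(f"ATM speedup:    {atm_speedup:.3f}")
print(f"ATM efficiency: {atm_efficiency:.3f}")
```

A speedup below 1 with about 3.6 times the PEs suggests the extra cores are not helping CAM at all in this configuration.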
So then I compared to an optimized B compset for 1850 with WACCM that someone else is using, and it uses a PES_LEVEL of '3rcm'. I have no idea what PES_LEVEL is, since the only description I can find is 'pes level determined by automated initialization (DO NOT EDIT)'.
Like a scientist, I edited it in my simulation (along with halving my PES_PER_NODE to 16), and now it runs a little faster. See the following timing (note that I used far fewer PEs for ATM):
component comp_pes root_pe tasks x threads instances (stride)
--------- ------ ------- ------ ------ --------- ------
cpl = cpl 240 0 240 x 1 1 (1 )
glc = sglc 1 0 1 x 1 1 (1 )
wav = swav 1 0 1 x 1 1 (1 )
lnd = clm 48 240 48 x 1 1 (1 )
rof = rtm 30 0 30 x 1 1 (1 )
ice = cice 240 0 240 x 1 1 (1 )
atm = cam 320 0 320 x 1 1 (1 )
ocn = pop2 32 320 32 x 1 1 (1 )
total pes active : 352
pes per node : 16
pe count for cost estimate : 704
Overall Metrics:
Model Cost: 3425.68 pe-hrs/simulated_year
Model Throughput: 4.93 simulated_years/day
Init Time : 54.028 seconds
Run Time : 239.968 seconds 47.994 seconds/day
Final Time : 0.146 seconds
Actual Ocn Init Wait Time : 54.340 seconds
Estimated Ocn Init Run Time : 12.721 seconds
Estimated Run Time Correction : 0.000 seconds
(This correction has been applied to the ocean and total run times)
Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components
TOT Run Time: 239.968 seconds 47.994 seconds/mday 4.93 myears/wday
LND Run Time: 7.713 seconds 1.543 seconds/mday 153.45 myears/wday
ROF Run Time: 0.441 seconds 0.088 seconds/mday 2683.81 myears/wday
ICE Run Time: 9.087 seconds 1.817 seconds/mday 130.25 myears/wday
ATM Run Time: 225.255 seconds 45.051 seconds/mday 5.25 myears/wday
OCN Run Time: 63.605 seconds 12.721 seconds/mday 18.61 myears/wday
GLC Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
WAV Run Time: 0.000 seconds 0.000 seconds/mday 0.00 myears/wday
CPL Run Time: 4.826 seconds 0.965 seconds/mday 245.25 myears/wday
CPL COMM Time: 133.160 seconds 26.632 seconds/mday 8.89 myears/wday
So my question is: what is PES_LEVEL? What else can I do to make this run faster? I know that ATM runs sequentially with LND and ICE, which all in turn run in parallel with OCN, so I am really just trying to reduce the ATM time as much as possible. Thanks for any advice!
- Dr. Ethan D. Peck
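As a rough check on that reasoning, the sketch below estimates the critical path for the three runs as max(ATM + LND + ICE, OCN), using the seconds per model day from the timing summaries. This is a deliberate simplification: it ignores coupler time and assumes LND and ICE never overlap, so it can overestimate when they sit on different PEs. The run labels are my own.

```python
# Seconds per model day, copied from the three timing summaries.
runs = {
    "default":        {"atm": 51.264, "lnd": 0.801, "ice": 3.297, "ocn": 15.825},
    "atm_1280":       {"atm": 63.889, "lnd": 2.120, "ice": 3.169, "ocn": 16.866},
    "pes_level_3rcm": {"atm": 45.051, "lnd": 1.543, "ice": 1.817, "ocn": 12.721},
}


def critical_path(times):
    """Return (path, sequential) where sequential = ATM + LND + ICE
    and path is the longer of that chain and the concurrent OCN."""
    sequential = times["atm"] + times["lnd"] + times["ice"]
    return max(sequential, times["ocn"]), sequential


summary = {}
for name, times in runs.items():
    path, sequential = critical_path(times)
    summary[name] = {
        "path": path,
        "atm_share": times["atm"] / path,
        "ocn_limited": times["ocn"] > sequential,
    }
    print(f"{name:15s} path={path:7.3f} s/mday  "
          f"ATM share={summary[name]['atm_share']:.1%}")
```

In all three runs ATM accounts for more than 90% of the estimated path and OCN is never the limiting component, which is consistent with focusing on ATM.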