Hi
I have verified my port to Rockfish (a machine at JHU) using the port-validation instructions for the three-member, ultra-fast ensemble; it ran quickly and produced the three expected output files case.cesm_tag.uf.000, case.cesm_tag.uf.001, and case.cesm_tag.uf.002.
After verifying the port, I created a new case with:
create_newcase --case testb1850_01 --compset B1850 --res f09_g17_gl4 --machine rockfish --pesfile ~/.cime/config_pes.xml
and then ran, from the case directory:
./case.setup
./case.build
./check_input_data
./case.submit
As of right now the machine is set up as follows:
config_machines.xml:
<init_path lang="perl">/data/apps/linux-centos8-cascadelake/gcc-9.3.0/lmod-8.3-tbez7qmxvu3dwikqbs3hdafke5vcsbxv/lmod/lmod/init/perl</init_path>
<init_path lang="python">/data/apps/linux-centos8-cascadelake/gcc-9.3.0/lmod-8.3-tbez7qmxvu3dwikqbs3hdafke5vcsbxv/lmod/lmod/init/env_modules_python.py</init_path>
<cmd_path lang="sh">module</cmd_path>
<cmd_path lang="csh">module</cmd_path>
<cmd_path lang="perl">/data/apps/linux-centos8-cascadelake/gcc-9.3.0/lmod-8.3-tbez7qmxvu3dwikqbs3hdafke5vcsbxv/lmod/lmod/libexec/lmod perl</cmd_path>
<cmd_path lang="python">module</cmd_path>
<modules>
<command name="purge"/>
<command name="load">standard/2020.10</command>
<command name="unload">openmpi/3.1.6</command>
</modules>
<modules compiler="gnu">
<command name="load">cesm/2.x</command>
</modules>
<modules compiler="intel">
<command name="load">intel/2022.0</command>
<command name="load">intel-mkl/2022.0</command>
<command name="load">cesm/2.x</command>
</modules>
</module_system>
<environment_variables>
<env name="OMP_STACKSIZE">256M</env>
<env name="NETCDF_PATH">$ENV{NETCDF}</env>
<env name="OMP_NUM_THREADS">1</env>
</environment_variables>
</machine>
</config_machines>
config_pes.xml:
<ntasks_lnd>-1</ntasks_lnd>
<ntasks_rof>-1</ntasks_rof>
<ntasks_ice>-2</ntasks_ice>
<ntasks_ocn>-1</ntasks_ocn>
<ntasks_glc>-1</ntasks_glc>
<ntasks_wav>-1</ntasks_wav>
<ntasks_cpl>-3</ntasks_cpl>
</ntasks>
<nthrds>
<nthrds_atm>1</nthrds_atm>
<nthrds_lnd>1</nthrds_lnd>
<nthrds_rof>1</nthrds_rof>
<nthrds_ice>1</nthrds_ice>
<nthrds_ocn>1</nthrds_ocn>
<nthrds_glc>1</nthrds_glc>
<nthrds_wav>1</nthrds_wav>
<nthrds_cpl>1</nthrds_cpl>
</nthrds>
<rootpe>
<rootpe_atm>0</rootpe_atm>
<rootpe_lnd>0</rootpe_lnd>
<rootpe_rof>0</rootpe_rof>
<rootpe_ice>-1</rootpe_ice>
<rootpe_ocn>-3</rootpe_ocn>
<rootpe_glc>0</rootpe_glc>
<rootpe_wav>0</rootpe_wav>
<rootpe_cpl>0</rootpe_cpl>
</rootpe>
</pes>
</mach>
</grid>
</config_pes>
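For anyone checking the layout above: as I understand CIME, a negative ntasks or rootpe value means "that many whole nodes", i.e. it gets multiplied by the tasks-per-node count (48 on Rockfish, per the timing file below). This little sketch of that conversion is mine, not CIME code, and it omits ntasks_atm because that line is cut off in my snippet:

```python
# Sketch (not CIME code): how negative ntasks/rootpe entries in config_pes.xml
# resolve to actual MPI ranks, assuming 48 MPI tasks per node (Rockfish).

TASKS_PER_NODE = 48

def resolve(value, tasks_per_node=TASKS_PER_NODE):
    """Turn a (possibly negative) ntasks/rootpe entry into a count/rank."""
    return -value * tasks_per_node if value < 0 else value

# Values copied from the config_pes.xml fragment above (atm omitted: truncated).
ntasks = {"lnd": -1, "rof": -1, "ice": -2, "ocn": -1,
          "glc": -1, "wav": -1, "cpl": -3}
rootpe = {"lnd": 0, "rof": 0, "ice": resolve(-1), "ocn": resolve(-3),
          "glc": 0, "wav": 0, "cpl": 0}

# First and last MPI rank occupied by each component.
layout = {c: (rootpe[c], rootpe[c] + resolve(n) - 1) for c, n in ntasks.items()}
total_pes = max(last for _, last in layout.values()) + 1

print(layout["ocn"])   # ocean sits on ranks 144-191
print(total_pes)       # 192, matching "total pes active : 192" below
```

So the coupler spans ranks 0-143, ice 48-143, and the ocean runs concurrently on its own 48 ranks (144-191), which is consistent with the 192 total PEs reported in the timing file.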
I have no experience with optimizing a run for throughput, so I could use any help!
On our current run (testb1850_01), the timing file shows this:
total pes active : 192
mpi tasks per node : 48
pe count for cost estimate : 192
Overall Metrics:
Model Cost: 15651.97 pe-hrs/simulated_year
Model Throughput: 0.29 simulated_years/day
Init Time : 210.770 seconds
Run Time : 4020.197 seconds 804.039 seconds/day
Final Time : 0.191 seconds
Actual Ocn Init Wait Time : 3334.348 seconds
Estimated Ocn Init Run Time : 4.644 seconds
Estimated Run Time Correction : 0.000 seconds
(This correction has been applied to the ocean and total run times)
Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components
TOT Run Time: 4020.197 seconds 804.039 seconds/mday 0.29 myears/wday
CPL Run Time: 208.698 seconds 41.740 seconds/mday 5.67 myears/wday
ATM Run Time: 3365.850 seconds 673.170 seconds/mday 0.35 myears/wday
LND Run Time: 186.269 seconds 37.254 seconds/mday 6.35 myears/wday
ICE Run Time: 37.430 seconds 7.486 seconds/mday 31.62 myears/wday
OCN Run Time: 557.270 seconds 111.454 seconds/mday 2.12 myears/wday
ROF Run Time: 18.114 seconds 3.623 seconds/mday 65.34 myears/wday
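To make sure I am reading these numbers correctly: throughput in myears/wday should just be 86400 wall-clock seconds per day divided by (seconds per model day × 365). Checking that arithmetic against the lines above:

```python
# Checking my reading of the CESM timing file: throughput (myears/wday) is
# 86400 wall-seconds per day / (seconds per model day * 365 model days).

SECONDS_PER_WALL_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365.0

def throughput(seconds_per_mday):
    return SECONDS_PER_WALL_DAY / (seconds_per_mday * DAYS_PER_MODEL_YEAR)

print(round(throughput(804.039), 2))   # TOT: 0.29 myears/wday, as reported
print(round(throughput(673.170), 2))   # ATM: 0.35 -- ATM dominates the run time
print(round(throughput(111.454), 2))   # OCN: 2.12

# To reach 5 myears/wday, the whole model would have to finish a model day in:
print(round(SECONDS_PER_WALL_DAY / (5 * DAYS_PER_MODEL_YEAR), 1))  # ~47 s/mday
```

If that is right, ATM alone (673 s/mday of the 804 s/mday total) is the main thing standing between us and 5 years/day.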
We cannot afford to run this slowly; we need a throughput of at least 5 simulated years per wall-clock day. I am not sure whether that can be accomplished by specifying a different PE layout, by reducing the number of variables we output, or by something else entirely (we are currently using the defaults that come with: create_newcase --case testb1850_01 --compset B1850 --res f09_g17_gl4 --machine rockfish --pesfile ~/.cime/config_pes.xml ).
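To frame the question, these are the kinds of case-level knobs I understand are available; the specific values below are placeholders to show the mechanism, not a recommendation (please correct me if this is the wrong direction):

```shell
# Run from the case directory. Values here are illustrative only.

# Inspect the current PE layout and how the run will be launched:
./pelayout
./preview_run

# Change the PE layout, e.g. give ATM more tasks (negative = whole nodes),
# then reset and rebuild (a clean rebuild may be needed after NTASKS changes):
./xmlchange NTASKS_ATM=-8
./case.setup --reset
./case.build --clean-all
./case.build

# History output volume is controlled via component namelists (e.g. user_nl_cam).
```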
The stats file in the timing directory shows this:
***** GLOBAL STATISTICS ( 192 MPI TASKS) *****
$Id: gptl.c,v 1.157 2011-03-28 20:55:18 rosinski Exp $
'count' is cumulative. All other stats are max/min
'on' indicates whether the timer was active during output, and so stats are lower or upper bounds.
name on processes threads count walltotal wallmax (proc thrd ) wallmin (proc thrd )
"CPL:INIT" - 192 192 1.920000e+02 3.925829e+04 210.770 ( 72 0) 185.592 ( 190 0)
"CPL:cime_pre_init1" - 192 192 1.920000e+02 3.880440e+02 2.035 ( 55 0) 2.008 ( 160 0)
"CPL:ESMF_Initialize" - 192 192 1.920000e+02 4.800000e-02 0.001 ( 96 0) 0.000 ( 0 0)
"CPL:cime_pre_init2" - 192 192 1.920000e+02 6.767000e+00 0.049 ( 144 0) 0.028 ( 48 0)
"CPL:cime_init" - 192 192 1.920000e+02 3.886343e+04 208.707 ( 134 0) 183.534 ( 180 0)
"CPL:init_comps" - 192 192 1.920000e+02 3.237899e+04 168.995 ( 72 0) 167.567 ( 144 0)
"CPL:comp_init_pre_all" - 192 192 1.920000e+02 1.091075e-02 0.000 ( 169 0) 0.000 ( 22 0)
"CPL:comp_init_cc_atm" - 192 192 1.920000e+02 1.584994e+04 110.069 ( 5 0) 0.000 ( 170 0)
"CPL:comp_init_cc_lnd" - 192 192 1.920000e+02 2.919690e+03 20.298 ( 1 0) 0.000 ( 144 0)
"CPL:comp_init_cc_rof" - 192 192 1.920000e+02 1.637530e+02 1.142 ( 40 0) 0.000 ( 146 0)
"CPL:comp_init_cc_ocn" - 192 192 1.920000e+02 1.120859e+04 156.997 ( 158 0) 25.507 ( 121 0)
"CPL:comp_init_cc_ice" - 192 192 1.920000e+02 3.657838e+02 2.541 ( 29 0) 0.000 ( 153 0)
"CPL:comp_init_cc_glc" - 192 192 1.920000e+02 7.810636e+02 5.426 ( 33 0) 0.000 ( 144 0)
"CPL:comp_init_cc_wav" - 192 192 1.920000e+02 3.913213e+01 0.273 ( 2 0) 0.000 ( 152 0)
"CPL:comp_init_cc_esp" - 192 192 1.920000e+02 3.140450e-02 0.000 ( 51 0) 0.000 ( 154 0)
"CPL:comp_init_cc_iac" - 192 192 1.920000e+02 2.992821e-02 0.000 ( 111 0) 0.000 ( 146 0)
"CPL:comp_init_cx_all" - 192 192 1.920000e+02 1.050946e+03 10.647 ( 184 0) 3.762 ( 22 0)
"CPL:comp_list_all" - 192 192 1.920000e+02 3.396273e-03 0.001 ( 0 0) 0.000 ( 151 0)
"CPL:init_maps" - 144 144 1.440000e+02 1.711666e+03 11.890 ( 18 0) 11.884 ( 126 0)
"CPL:init_aream" - 144 144 1.440000e+02 3.489308e+01 0.242 ( 138 0) 0.242 ( 61 0)
"CPL:init_domain_check" - 144 144 1.440000e+02 2.021436e+00 0.014 ( 0 0) 0.014 ( 97 0)
"CPL:init_areacor" - 192 192 1.920000e+02 1.042950e+03 14.538 ( 177 0) 2.397 ( 28 0)
"CPL:init_fracs" - 144 144 1.440000e+02 2.069594e+00 0.031 ( 69 0) 0.002 ( 3 0)
"CPL:init_aoflux" - 144 144 1.440000e+02 4.045177e-02 0.001 ( 132 0) 0.000 ( 106 0)