Hi,
I am trying to find, for a particular run, the optimal pelayout by following the discussion shown below.
I am running the test as follows:
"create_test PFS.f09_g17_gl4.F2000climo.PADUM_intel"
create_test creates the test case, sets it up, and builds the model, but the run phase fails with an error:
Finished RUN for test PFS.f09_g17_gl4.F2000climo.PADUM_intel in 4.965472 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /home/cas/phd/asz198070/scratch/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah
Errors were:
b"submit_jobs case.test\nSubmit job case.test\nERROR: Command: 'qsub -v ARGS_FOR_SCRIPT='--skip-preview-namelist' .case.test' failed with error 'b'qsub: directive error: -P Project Name'' from dir '/scratch/cas/phd/asz198070/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah'"
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
At test-scheduler close, state is:
FAIL PFS.f09_g17_gl4.F2000climo.PADUM_intel (phase RUN)
Case dir: /home/cas/phd/asz198070/scratch/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah
test-scheduler took 308.1711344718933 seconds
The complete run log file is attached here. I also tried passing -p projectname, thinking that a missing or invalid project name was causing the error, but I get the same error.
Please help me get this test case to run so that I can optimize the pelayout accordingly.
Thanks in advance
Narender
I am using the PFS test for the F2000climo compset at f09_g17_gl4 resolution; the machine is PADUM and the compiler is intel.
The default out-of-the-box pelayout is not going to be optimized for your system; you will need to do this yourself.
A good place to start is here:
TOT Run Time: 4020.197 seconds 804.039 seconds/mday 0.29 myears/wday
CPL Run Time: 208.698 seconds 41.740 seconds/mday 5.67 myears/wday
ATM Run Time: 3365.850 seconds 673.170 seconds/mday 0.35 myears/wday
LND Run Time: 186.269 seconds 37.254 seconds/mday 6.35 myears/wday
ICE Run Time: 37.430 seconds 7.486 seconds/mday 31.62 myears/wday
OCN Run Time: 557.270 seconds 111.454 seconds/mday 2.12 myears/wday
ROF Run Time: 18.114 seconds 3.623 seconds/mday 65.34 myears/wday
Let's shoot for a target of 5 ypd. To get that we need to reduce the atm run time by a factor of about 13.
It's currently using 144 tasks; let's increase it to 1824. I usually keep the cpl ntasks the same as atm:
./xmlchange NTASKS_ATM=1824,NTASKS_CPL=1824
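The arithmetic behind this step can be checked with a short sketch. It uses only the ATM figures from the timing table above plus two assumptions that are not stated in the thread: a 365-day model year, and perfect scaling of run time with task count (real scaling is usually worse). The function name `required_speedup` is made up for illustration.

```python
# Figures quoted in the thread for the ATM component.
ATM_SECONDS_PER_MDAY = 673.170
ATM_TASKS_BEFORE = 144
ATM_TASKS_AFTER = 1824
TARGET_YPD = 5.0  # target throughput in model years per wall-clock day

DAYS_PER_MODEL_YEAR = 365      # assumption: no-leap calendar
SECONDS_PER_WALL_DAY = 86400


def required_speedup(seconds_per_mday, target_ypd):
    """Factor by which the per-model-day run time must shrink to hit target_ypd."""
    # Wall-clock seconds we can afford to spend on each model day.
    budget = SECONDS_PER_WALL_DAY / (target_ypd * DAYS_PER_MODEL_YEAR)
    return seconds_per_mday / budget


speedup = required_speedup(ATM_SECONDS_PER_MDAY, TARGET_YPD)
task_ratio = ATM_TASKS_AFTER / ATM_TASKS_BEFORE

print(f"speedup needed for {TARGET_YPD} ypd: {speedup:.1f}x")
print(f"task increase 144 -> 1824: {task_ratio:.1f}x")
```

Under these assumptions the required speedup comes out a little above 14, while the task increase from 144 to 1824 is a little under 13, so the new layout is a first approximation to the 5 ypd target rather than a guarantee.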
Now we need to double the ocn tasks so that it will keep up
./xmlchange NTASKS_OCN=-2 (negative values reflect the number of nodes so this is the same as NTASKS_OCN=96)
and reset the rootpe for the ocn so that it follows the atm tasks
./xmlchange ROOTPE_OCN=1824
Finally, we change the ICE, LND and ROF tasks to use all available:
./xmlchange NTASKS_LND=-19,NTASKS_ROF=-19
./xmlchange ROOTPE_ICE=-19,NTASKS_ICE=-19
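To see how these settings fit together, here is a minimal sketch that expands them into task ranges. It is not part of CIME. It assumes 48 tasks per node (inferred from the remark that NTASKS_OCN=-2 is the same as 96), that a negative ROOTPE is counted in nodes the same way as a negative NTASKS, and that every ROOTPE not set above is 0. The names `to_tasks` and `settings` are made up for illustration.

```python
TASKS_PER_NODE = 48  # assumption: inferred from "-2 ... is the same as NTASKS_OCN=96"


def to_tasks(value, tasks_per_node=TASKS_PER_NODE):
    """Convert an xmlchange value to a task count (negative values count nodes)."""
    return -value * tasks_per_node if value < 0 else value


# (NTASKS, ROOTPE) per component, as set by the xmlchange commands above.
# ROOTPE values of 0 are assumed defaults, not taken from the thread.
settings = {
    "ATM": (1824, 0),
    "CPL": (1824, 0),
    "LND": (-19, 0),
    "ROF": (-19, 0),
    "ICE": (-19, -19),
    "OCN": (-2, 1824),
}

# First and last task index occupied by each component.
ranges = {}
for comp, (ntasks, rootpe) in settings.items():
    first = to_tasks(rootpe)
    last = first + to_tasks(ntasks) - 1
    ranges[comp] = (first, last)

total_tasks = max(last for _, last in ranges.values()) + 1

for comp, (first, last) in ranges.items():
    print(f"{comp}: tasks {first}-{last}")
print(f"total tasks: {total_tasks} ({total_tasks // TASKS_PER_NODE} nodes)")
```

With these assumptions, LND and ROF share the first 19 nodes, ICE takes the next 19, and together they cover the same 1824 tasks as ATM, while OCN runs on its own 2 nodes after them.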
There is a test, PFS, that I use for tuning that you may want to try:
./create_test PFS.f09_g17_gl4.B1850.rockfish_gnu
Also, the latest version of the Intel compiler is available for download and will give much better performance than gnu.
Once you have done this initial run with the new tuning, you can fine-tune by balancing the ice and lnd+rof tasks and then
balancing atm+ice with the ocn. Once you are completely happy with the performance, you can save it to config_pes.xml and make it the default for your system.