For throughput optimisation

knreddy

K Narender Reddy
Member
Hi,
I am trying to find the optimal pelayout for a particular run by following the discussion quoted below.
The default, out-of-the-box pelayout is not going to be optimized for your system; you will need to do this yourself.
A good place to start is here:
TOT Run Time: 4020.197 seconds 804.039 seconds/mday 0.29 myears/wday
CPL Run Time: 208.698 seconds 41.740 seconds/mday 5.67 myears/wday
ATM Run Time: 3365.850 seconds 673.170 seconds/mday 0.35 myears/wday
LND Run Time: 186.269 seconds 37.254 seconds/mday 6.35 myears/wday
ICE Run Time: 37.430 seconds 7.486 seconds/mday 31.62 myears/wday
OCN Run Time: 557.270 seconds 111.454 seconds/mday 2.12 myears/wday
ROF Run Time: 18.114 seconds 3.623 seconds/mday 65.34 myears/wday
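Each line above gives a component's run time, its seconds per model day, and its throughput in model years per wall-clock day (myears/wday). The sketch below ranks the components from slowest to fastest so the bottleneck stands out. It assumes the input lines have exactly this format, for example copied from the timing summary CESM writes under the case's timing/ directory (the exact file name there is an assumption). rank_components is a hypothetical helper.

```shell
#!/usr/bin/env bash
# rank_components (hypothetical helper): reads "XXX Run Time: ..." lines on stdin
# and prints "COMPONENT myears/wday", slowest component first.
# Assumed field layout, matching the lines above: $1 = component, $8 = myears/wday.
# TOT is skipped because it is the whole model, not a component.
rank_components() {
  awk '/ Run Time:/ && $1 != "TOT" { print $1, $8 }' | LC_ALL=C sort -k2,2n
}

# Usage with a few of the lines from the post:
rank_components <<'EOF'
TOT Run Time: 4020.197 seconds 804.039 seconds/mday 0.29 myears/wday
ATM Run Time: 3365.850 seconds 673.170 seconds/mday 0.35 myears/wday
OCN Run Time: 557.270 seconds 111.454 seconds/mday 2.12 myears/wday
ICE Run Time: 37.430 seconds 7.486 seconds/mday 31.62 myears/wday
EOF
```

On these lines ATM comes first at 0.35 myears/wday, followed by OCN. This is why the tuning below starts with the atmosphere.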

Let's shoot for a target of 5 ypd (model years per wall-clock day). To get that, we need to reduce the atm run time by a factor of about 13.
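The factor comes from the ATM timing line. A model year is 365 model days and a wall-clock day is 86400 s, so throughput in years per day is 86400 / (365 × seconds per model day). The following is a back-of-the-envelope sketch that assumes ideal linear scaling with task count (real scaling is usually worse). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# ypd SECONDS_PER_MDAY: model years per wall-clock day, assuming a 365-day model year.
ypd() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", 86400 / (s * 365) }'
}

# speedup_needed SECONDS_PER_MDAY TARGET_YPD: factor by which a component must
# get faster to reach TARGET_YPD on its own (assumes ideal scaling).
speedup_needed() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f\n", t * s * 365 / 86400 }'
}

ypd 673.170                                  # ATM alone, from its timing line
speedup_needed 673.170 5                     # factor needed for 5 ypd
awk 'BEGIN { printf "%.1f\n", 1824 / 144 }'  # ratio of the 1824 and 144 task counts
```

Under these assumptions the atmosphere needs a speedup of a little over 14. Going from 144 to 1824 tasks is a factor of about 12.7, which fits the rough "about 13" figure. Even with perfect scaling, ATM alone would therefore land slightly short of 5 ypd.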
It's currently using 144 tasks; let's increase it to 1824. I usually keep the cpl ntasks the same as atm:
./xmlchange NTASKS_ATM=1824,NTASKS_CPL=1824
Now we need to double the ocn tasks so that it keeps up:
./xmlchange NTASKS_OCN=-2
(negative values give the number of nodes, so this is the same as NTASKS_OCN=96)
and reset the ocn rootpe so that it starts right after the atm tasks:
./xmlchange ROOTPE_OCN=1824
Finally, we change the ICE, LND and ROF tasks to use all the available PEs:
./xmlchange NTASKS_LND=-19,NTASKS_ROF=-19
./xmlchange ROOTPE_ICE=-19,NTASKS_ICE=-19
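Taken together, the negative values above are node counts. This sketch expands them and prints the resulting layout. It makes two assumptions. First, each node has 48 MPI tasks, inferred from NTASKS_OCN=-2 being the same as 96. Second, ATM, CPL, LND and ROF keep ROOTPE=0, since the commands above do not change them. to_tasks and print_layout are hypothetical helpers, not CIME tools.

```shell
#!/usr/bin/env bash
# Assumption: 48 MPI tasks per node (from "NTASKS_OCN=-2 ... same as NTASKS_OCN=96").
TASKS_PER_NODE=48

# to_tasks VALUE: a negative value means a number of nodes, otherwise a task count.
to_tasks() {
  local v=$1
  if (( v < 0 )); then
    echo $(( -v * TASKS_PER_NODE ))
  else
    echo "$v"
  fi
}

# print_layout: reads "COMP NTASKS ROOTPE" lines (xmlchange-style values) on stdin,
# prints each component's PE range, then the total task and node count.
print_layout() {
  local comp nt root last maxpe=-1
  printf '%-4s %6s %6s %6s\n' COMP NTASKS ROOTPE LASTPE
  while read -r comp nt root; do
    nt=$(to_tasks "$nt")
    root=$(to_tasks "$root")
    last=$(( root + nt - 1 ))
    printf '%-4s %6d %6d %6d\n' "$comp" "$nt" "$root" "$last"
    if (( last > maxpe )); then maxpe=$last; fi
  done
  echo "total: $(( maxpe + 1 )) tasks = $(( (maxpe + 1) / TASKS_PER_NODE )) nodes"
}

# The layout from the xmlchange commands above (ROOTPE=0 assumed where unset):
print_layout <<'EOF'
ATM 1824 0
CPL 1824 0
LND -19 0
ROF -19 0
ICE -19 -19
OCN -2 1824
EOF
```

With these assumptions the layout works out as follows. LND and ROF share PEs 0 to 911, and ICE takes 912 to 1823. ATM and CPL span all 1824 of those PEs, while OCN runs alongside on 1824 to 1919. That comes to 1920 tasks on 40 nodes.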

There is a PFS test that I use for tuning that you may want to try:
./create_test PFS.f09_g17_gl4.B1850.rockfish_gnu

Also, the latest version of the Intel compiler is available for download and will give much better performance than GNU.

Once you have done this initial run with the new tuning, you can fine-tune by first balancing the ice tasks against the lnd+rof tasks and then balancing atm+ice against the ocn. Once you are completely happy with the performance, you can save it to config_pes.xml and make it the default for your system.
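The two balancing steps can be expressed as a rough cost model per model day. This simplified sketch assumes the layout above: LND and ROF sit on the same PEs and run one after the other, ICE runs beside them on separate PEs, ATM runs afterwards on all of those PEs, and OCN runs concurrently on its own PEs. Coupler and communication costs are ignored. est_spmd is a hypothetical helper whose inputs are seconds per model day for each component.

```shell
#!/usr/bin/env bash
# est_spmd LND ROF ICE ATM OCN: rough seconds per model day for the whole model
# under the simplified concurrency model described above (CPL cost ignored).
est_spmd() {
  awk -v lnd="$1" -v rof="$2" -v ice="$3" -v atm="$4" -v ocn="$5" 'BEGIN {
    lnd += 0; rof += 0; ice += 0; atm += 0; ocn += 0   # force numeric values
    lr   = lnd + rof                  # LND and ROF share PEs: sequential
    side = (lr > ice) ? lr : ice      # ICE runs beside them: take the slower branch
    seq  = side + atm                 # ATM runs afterwards on those PEs
    tot  = (seq > ocn) ? seq : ocn    # OCN runs concurrently on its own PEs
    printf "%.1f\n", tot
  }'
}

# Hypothetical inputs, not measurements from this thread:
est_spmd 10 2 15 30 40   # ICE (15) outweighs LND+ROF (12), so 15 + 30 = 45 beats OCN's 40
```

If the lnd+rof and ice branches are uneven, the PEs on the faster branch sit idle. If the atm side and ocn differ, one of them waits at the coupling point. The two balancing steps aim to shrink exactly these gaps.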
I am using the PFS test with the F2000climo compset at f09_g17_gl4 resolution, on the machine PADUM with the intel compiler.

I am running the test as follows:
"create_test PFS.f09_g17_gl4.F2000climo.PADUM_intel"

The test creates and sets up the case and builds the model, but the run phase fails with this error:
Finished RUN for test PFS.f09_g17_gl4.F2000climo.PADUM_intel in 4.965472 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /home/cas/phd/asz198070/scratch/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah
Errors were:
b"submit_jobs case.test\nSubmit job case.test\nERROR: Command: 'qsub -v ARGS_FOR_SCRIPT='--skip-preview-namelist' .case.test' failed with error 'b'qsub: directive error: -P Project Name'' from dir '/scratch/cas/phd/asz198070/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah'"

Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
At test-scheduler close, state is:
FAIL PFS.f09_g17_gl4.F2000climo.PADUM_intel (phase RUN)
Case dir: /home/cas/phd/asz198070/scratch/cesm2.1.1_out/cases/PFS.f09_g17_gl4.F2000climo.PADUM_intel.20240125_204716_0nslah
test-scheduler took 308.1711344718933 seconds

The complete run log is attached. I tried passing -p projectname, thinking that the lack of a valid project name was causing the error, but I get the same error.
Please help me get this test case running so that I can optimize the pelayout accordingly.

Thanks in advance
Narender
 

Attachments

  • test.log.txt

jedwards

CSEG and Liaisons
Staff member
The config_machines.xml definition for your machine expects a project number to be set in the environment; it's failing because that number doesn't exist.
Also, the flag for PBS is -A, not -p as you have. It seems you have not run any CESM cases on this system yet; I recommend following the porting procedure rather than jumping directly into running a case. We also strongly recommend updating to version 2.1.5, the latest release in the 2.1.x series.
 

knreddy

K Narender Reddy
Member
I'm afraid that's not the case. I have done many CESM runs, but I set the project manually in the .case.run file. For this test I have no way of setting the project, and -A projectname doesn't work either.
 

jedwards

CSEG and Liaisons
Staff member
Okay, you are setting the project manually in .case.run, and you shouldn't do that. Instead, set the environment variable $PROJECT. If your config_machines.xml and config_batch.xml are configured properly, that value will be used in both .case.run and (as in this case) .case.test.
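Here is a minimal sketch of that approach. It assumes a bash shell and a machine definition that reads PROJECT as described above. The project code is a placeholder, and check_project is a hypothetical pre-flight helper, not a CIME tool. The idea is to export PROJECT before calling create_test, so that one value can reach the directives of both .case.run and .case.test.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail early if PROJECT is empty or unset,
# since the batch directives for this machine expect it.
check_project() {
  if [ -z "${PROJECT:-}" ]; then
    echo "PROJECT is not set; export it before running create_test" >&2
    return 1
  fi
  echo "PROJECT=$PROJECT"
}

# Placeholder value: replace with your real project/allocation code.
export PROJECT=your_project_code
check_project

# Then run the test as before (not executed here); --wait keeps create_test
# running until the test finishes, as the log above notes:
#   ./create_test PFS.f09_g17_gl4.F2000climo.PADUM_intel --wait
```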
 