
Using different queues for case.run and case.st_archive

ceppi

New Member
Dear all,

The system on which I run CESM2.1.3 uses Slurm and has two queues: "standard" and "short", the latter being suited for small jobs shorter than 20 minutes.

1) Is there a way to configure CESM2 (presumably using config_batch.xml) so that any jobs with walltime ≤ 20 minutes are automatically submitted to the short queue, rather than the standard one (without having to manually edit the case.st_archive file)?

2) Additionally, for jobs submitted to the short queue, this system requires an extra sbatch option, "--reservation=shortqos". Again, I wondered if there's a way to set up config_batch.xml so that any jobs submitted to the short queue automatically include that option.

My attempt at making these changes in config_batch.xml is copied below, but I was unsuccessful: short jobs (such as case.st_archive, or the scripts_regression tests) are submitted to the standard queue by default; and if I change the $JOB_QUEUE variable in env_workflow.xml to specify the short queue, the extra sbatch option isn't included and the submission fails.

Any help appreciated – thanks.

XML:
 <batch_system MACH="archer2" type="slurm" >
    <batch_submit>sbatch</batch_submit>
    <submit_args>
      <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
      <arg flag="-q" name="$JOB_QUEUE"/>
      <arg flag="--account" name="$PROJECT"/>
    </submit_args>

    <directives queue="standard">
      <directive>--partition=standard</directive>
      <directive>--qos=standard</directive>
      <directive>--export=none</directive>
    </directives>
    <directives queue="short">
      <directive>--partition=standard</directive>
      <directive>--qos=short</directive>
      <directive>--reservation=shortqos</directive>
      <directive>--export=none</directive>
    </directives>

    <queues>
      <queue walltimemax="24:00:00" nodemin="1" nodemax="2712" default="true" >standard</queue>
      <queue walltimemax="00:20:00" nodemin="1" nodemax="4" >short</queue>
    </queues>
  </batch_system>
 

jedwards

CSEG and Liaisons
Staff member
If you look at cheyenne as an example, we do:

XML:
<queue walltimemax="12:00:00" nodemin="1" nodemax="4032">regular</queue>
<queue default="true" walltimemax="06:00:00" jobmin="1" jobmax="18">share</queue>

so that jobs using less than half a node run in the share queue. You can do the same on your machine by moving default="true" from standard to short and either setting nodemin="5" for standard or nodemax="1" for short.
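For archer2 that might look something like this (a sketch only, reusing the node and walltime limits from your config above):

XML:
 <queues>
   <!-- standard is no longer the default; it is only selected for jobs needing 5 or more nodes -->
   <queue walltimemax="24:00:00" nodemin="5" nodemax="2712">standard</queue>
   <!-- short is now the default, so small jobs such as case.st_archive land here -->
   <queue walltimemax="00:20:00" nodemin="1" nodemax="4" default="true">short</queue>
 </queues>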
 

ceppi

New Member
Thanks for your very quick reply. I tried what you suggested, but now both case.run and case.st_archive are being assigned to the short queue, even though the case.run job has a walltime of 6 hours (and requests more nodes than nodemax for the short queue).
 
I hope it's right to add to this thread rather than open a new one... you will see why I do this.
Archer2 has now gone live with the full system: a choice of 3 partitions and 6 queues.
Running jobs - ARCHER2 User Documentation

Running CESM 2.1.3

With .cime/config_batch.xml specifying the serial queue as default (partition serial), I have the following:
  • Specify "-q short --machine=archer2" when running create_newcase:
    • bad: the .case.run is generated with serial directives (serial is the default queue in .cime/config_batch.xml)
      • the default queue directives are applied to the model and the st_archive job.
    • good: the sbatch commands have the right -q options (I don't specify qos in the config_batch.xml)
      • -q short for the model
      • -q serial for the archive job
I'm working around it by modifying .cime/config/cesm/machines/template.case.run so that, after the CIME-generated batch directives, they are overridden by hard-coded SBATCH directives that work for the short and standard queues:

Code:
#!/usr/bin/env python
# Batch system directives
{{ batchdirectives }}
# hardcoded in template
#SBATCH --partition=standard
#SBATCH --exclusive
#SBATCH --export=NONE
# qos is set from -q in the create case: standard or long or....

Also in the same directory, I modified config_batch.xml to remove the exclusive directive (it's not wanted in the serial queue).
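For illustration, the serial-queue directives block ends up looking roughly like this (a sketch only; the exact contents are in the attached file):

XML:
 <directives queue="serial">
   <directive>--partition=serial</directive>
   <!-- note: the exclusive directive is intentionally left out for the serial queue -->
 </directives>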

I attach what I have for the .cime/config_batch.xml

a) I hope this might be useful to other Archer2 users browsing the forum
b) I ask the CESM specialists to tell me if I am missing something
c) the fact that -q short does not lead to the correct directives seems closely related to the GitHub issue above.
 

Attachments

  • dotcime_config_batch_xml.txt

raeder

Member
This is also related, but not necessarily an answer to the queries above.
Feel free to move it, or ask me to move it, if it doesn't belong here.

On cheyenne the default queue for case.st_archive appears to be "regular" instead of "share", which wastes 35 processors for the duration of the job. I often see a dozen st_archive jobs running or waiting in non-"share" queues.

I believe this is happening, at least partly, because in scripts/lib/CIME/XML/env_batch.py, in select_best_queue, the loop over the queues (listed in config/cesm/machines/config_batch.xml?) stops as soon as it finds a queue that satisfies the spec. Maybe it finds a sufficient queue (regular) rather than the best queue (share) because regular is earlier in the list and is not excluded by queue_meets_spec. But I haven't been able to parse all of the Python and the data flow in the modules.
 

jedwards

CSEG and Liaisons
Staff member
@raeder I just tried this and see that the case.st_archive job will go to the share queue.
You can check this in a case, without submitting, by using the preview_run tool. For my case I see:

Code:
 ./preview_run
CASE INFO:
  nodes: 1
  total tasks: 36
  tasks per node: 36
  thread count: 1
  ngpus per node: 0

BATCH INFO:
  FOR JOB: case.run
    ENV:
      Setting Environment ESMF_RUNTIME_PROFILE=ON
      Setting Environment ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
      Setting Environment MPI_DSM_VERBOSE=true
      Setting Environment MPI_IB_CONGESTED=1
      Setting Environment MPI_TYPE_DEPTH=16
      Setting Environment MPI_USE_ARRAY=None
      Setting Environment OMP_NUM_THREADS=1
      Setting Environment OMP_STACKSIZE=1024M
      Setting Environment OMP_WAIT_POLICY=PASSIVE
      Setting Environment TMPDIR=/glade/scratch/jedwards
      Setting Environment UGCSADDONPATH=/glade/work/turuncu/FV3GFS/addon
      Setting Environment UGCSFIXEDFILEPATH=/glade/work/turuncu/FV3GFS/fix_am
      Setting Environment UGCSINPUTPATH=/glade/work/turuncu/FV3GFS/benchmark-inputs/2012010100/gfs/fcst

    SUBMIT CMD:
      qsub -q regular -l walltime=12:00:00 -A P93300606 -v ARGS_FOR_SCRIPT='--resubmit' .case.run

    MPIRUN (job=case.run):
      mpiexec_mpt -p "%g:"  -np 36  omplace -tm open64 -vv /glade/scratch/jedwards/foofoo/bld/cesm.exe   >> cesm.log.$LID 2>&1

  FOR JOB: case.st_archive
    ENV:
      Setting Environment ESMF_RUNTIME_PROFILE=ON
      Setting Environment ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
      Setting Environment MPI_DSM_VERBOSE=true
      Setting Environment MPI_IB_CONGESTED=1
      Setting Environment MPI_TYPE_DEPTH=16
      Setting Environment MPI_USE_ARRAY=None
      Setting Environment OMP_NUM_THREADS=1
      Setting Environment OMP_STACKSIZE=1024M
      Setting Environment OMP_WAIT_POLICY=PASSIVE
      Setting Environment TMPDIR=/glade/scratch/jedwards
      Setting Environment UGCSADDONPATH=/glade/work/turuncu/FV3GFS/addon
      Setting Environment UGCSFIXEDFILEPATH=/glade/work/turuncu/FV3GFS/fix_am
      Setting Environment UGCSINPUTPATH=/glade/work/turuncu/FV3GFS/benchmark-inputs/2012010100/gfs/fcst

    SUBMIT CMD:
      qsub -q share -l walltime=00:20:00 -A P93300606  -W depend=afterok:0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive

You can also see this by querying the JOB_QUEUE:

Code:
./xmlquery JOB_QUEUE

Results in group case.run
    JOB_QUEUE: regular

Results in group case.st_archive
    JOB_QUEUE: share

If you are seeing JOB_QUEUE=regular for case.st_archive, you might try changing it manually in your case with
Code:
./xmlchange JOB_QUEUE=share --subgroup case.st_archive
 