We're trying to port CESM2 to our local HPC system, which is running SGE.
We have added an entry to the config_batch.xml file:
<batch_system type="sge" >
  <batch_query args="-j">qstat</batch_query>
  <batch_submit>qsub </batch_submit>
  <batch_cancel>qdel</batch_cancel>
  <batch_env>-v</batch_env>
  <batch_directive>#$ </batch_directive>
  <jobid_pattern>(\d+)</jobid_pattern>
  <depend_string> -hold_jid jobid</depend_string>
  <depend_separator> , </depend_separator>
  <walltime_format>%H:%M:%S</walltime_format>
  <batch_mail_flag>-M</batch_mail_flag>
  <batch_mail_type_flag>-m</batch_mail_type_flag>
  <batch_mail_type>, bea, b, e, a, n, bes</batch_mail_type>
  <submit_args>
    <arg flag="-q" name="$JOB_QUEUE"/>
    <arg flag="-P" name="$PROJECT"/>
    <arg flag="-l h_rt=" name="$JOB_WALLCLOCK_TIME"/>
  </submit_args>
  <directives>
    <directive> -N {{ job_id }}</directive>
    <directive> -V </directive>
    <directive> -pe smp.pe {{ tasks_per_node }} </directive>
  </directives>
</batch_system>
<batch_system MACH="csf3" type="sge">
  <queues>
    <queue walltimemax="01:00:00" nodemax="1" default="true">short</queue>
  </queues>
</batch_system>
Unfortunately, this fails when running 'case.submit' with the following error:
ERROR: Command: 'qsub -q short -l h_rt=0:20:00 -hold_jid 4512947 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive' failed with error
Unable to run job: Parallel job within smp.pe: number of slots must be at least 2.
Job has been rejected.
Presumably the case.st_archive job only requests a single task, so the directive expands to '-pe smp.pe 1', which the smp.pe parallel environment rejects. I think we need to make the '-pe smp.pe' directive conditional, so that it is only emitted when the number of tasks is greater than 1, but I can't work out how to add such a conditional. As an aside, can someone suggest a better task indicator than 'tasks_per_node'? It is going to fail us once we use more than one node.
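To make the intent concrete, here is a rough Python sketch of the logic I am after. This is not CIME code: `build_directives` and its `total_tasks` argument are made-up names for illustration, and I don't know which CIME variable should supply the task count. It only shows the behaviour I want from the generated job script, assuming the '#$ ' prefix and the directives from our entry above.

```python
def build_directives(job_id, total_tasks, pe_name="smp.pe"):
    """Return the '#$' directive lines for one job script.

    Hypothetical helper, not part of CIME: it mirrors the three
    directives in our config_batch.xml entry, but skips the
    parallel-environment request for single-task jobs.
    """
    directives = [f"-N {job_id}", "-V"]
    # Our smp.pe environment rejects requests for fewer than 2 slots,
    # so only ask for it when the job really is parallel.
    if total_tasks > 1:
        directives.append(f"-pe {pe_name} {total_tasks}")
    return ["#$ " + d for d in directives]


if __name__ == "__main__":
    # Single-task job (like case.st_archive): no -pe line at all.
    print("\n".join(build_directives("case.st_archive", 1)))
    # Multi-task job: the -pe line is included.
    print("\n".join(build_directives("case.run", 32)))
```

So a serial job such as case.st_archive would get only the '-N' and '-V' lines, while a parallel job would also get the '-pe smp.pe' line.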
What can we do to fix this problem? And should this fix be in the config_batch.xml file, or another configuration file?