
How to submit a job to the batch system in CLM5.0?

Dear all, I have a problem submitting a job to the batch system in CLM5.0. When I run ./case.submit, no cesm.logxxxx is created in $RUNDIR and no cesm.stdoutxxxx in $CASEROOT. However, when I run bsub -W 23:59 < .case.run directly, both cesm.logxxx and cesm.stdoutxxx are generated. I suspect a wrong setting in config_batch.xml or another file, but I cannot find the exact cause. Can anybody give me some suggestions? I would really appreciate it!

Here is some information:

./create_newcase --case test_case2 --res f19_g16 --compset X -mach cern --run-unsupported

./preview_run

CASE INFO:
nodes: 13
total tasks: 312
tasks per node: 24
thread count: 1

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
bsub "all, ARGS_FOR_SCRIPT=--resubmit" < .case.run

MPIRUN (job=case.run):
mpijob.intelmpi /work2/cern1426/clm5/test_case2/bld/cesm.exe >> cesm.log.$LID 2>&1

FOR JOB: case.st_archive
ENV:
Setting Environment OMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
bsub -w 'done(0)' "all, ARGS_FOR_SCRIPT=--resubmit" < case.st_archive

config_batch.xml:

<batch_system type="lsf">
<batch_query args=" -w" >bjobs</batch_query>
<batch_submit>bsub</batch_submit>
<batch_cancel>bkill</batch_cancel>
<batch_redirect>&lt;</batch_redirect>
<batch_env> </batch_env>
<batch_directive>#BSUB</batch_directive>
<jobid_pattern>&lt;(\d+)&gt;</jobid_pattern>
<depend_string> -w 'done(jobid)'</depend_string>
<depend_allow_string> -w 'ended(jobid)'</depend_allow_string>
<depend_separator>&amp;&amp;</depend_separator>
<directives>
<directive > -J {{ job_id }} </directive>
<directive > -n {{ total_tasks }} </directive>
<directive > -W $JOB_WALLCLOCK_TIME </directive>
<directive default="cesm.stdout" > -o {{ job_id }}.%J </directive>
<directive default="cesm.stderr" > -e {{ job_id }}.%J </directive>
<directive > -R "span[ptile={{ tasks_per_node }}]"</directive>
</directives>
<queues>
<queue walltimemax="23:59">cpuII</queue>
</queues>

</batch_system>
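
To make it easier to see what header these directives should produce, here is a minimal Python sketch that fills in the placeholders. It is not CIME's own template code: `render_directives` is a hypothetical helper, the `job_id` value is made up for illustration, and the other values are taken from the preview_run output and the queue's walltimemax above.

```python
import re

# Directive templates copied from the config_batch.xml above.
DIRECTIVES = [
    "-J {{ job_id }}",
    "-n {{ total_tasks }}",
    "-W $JOB_WALLCLOCK_TIME",
    "-o {{ job_id }}.%J",
    "-e {{ job_id }}.%J",
    '-R "span[ptile={{ tasks_per_node }}]"',
]


def render_directives(templates, values, prefix="#BSUB"):
    """Hypothetical renderer: fill {{ name }} and $NAME placeholders."""
    def fill(text):
        # Replace {{ name }} placeholders first, then $NAME variables.
        text = re.sub(r"\{\{\s*(\w+)\s*\}\}",
                      lambda m: str(values[m.group(1)]), text)
        return re.sub(r"\$(\w+)", lambda m: str(values[m.group(1)]), text)
    return [f"{prefix} {fill(t)}" for t in templates]


values = {
    "job_id": "run.test_case2",      # illustrative name, an assumption
    "total_tasks": 312,              # from preview_run
    "tasks_per_node": 24,            # from preview_run
    "JOB_WALLCLOCK_TIME": "23:59",   # walltimemax of queue cpuII
}

header = render_directives(DIRECTIVES, values)
print("\n".join(header))
```

Note that none of the rendered lines carries a `-q` option, because the `<directives>` list above does not define one.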
 
Below are the main batch parameters in case.run of CLM4.5, which can be submitted to the queue and runs normally.

#BSUB -W 1440
#BSUB -n 312
#BSUB -R "span[ptile=24]"
#BSUB -J CN-china_128x72_I20TRCRUCLM45BGC
#BSUB -q "cpuII"
#BSUB -o %J.out
#BSUB -e %J.err

mpijob.intelmpi $EXEROOT/cesm.exe >&! cesm.log.$LID
 

jedwards

CSEG and Liaisons
Staff member
We do not have access to an LSF system, so I am guessing a little, but try adding the following to config_batch.xml:
```
<submit_args>
<arg flag="-q" name="$JOB_QUEUE"/>
<arg flag="-W" name="$JOB_WALLCLOCK_TIME"/>
<arg flag="-P" name="$PROJECT"/>
</submit_args>
```
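
As a rough illustration of what this block is meant to do, the sketch below turns the flag/variable pairs into extra bsub options. `build_submit_args` is a hypothetical helper, not a CIME function, and skipping unset variables is an assumption made for the example; the queue and walltime values come from the `<queue>` entry in the config_batch.xml posted earlier.

```python
# Flag/variable pairs copied from the <submit_args> suggestion above.
SUBMIT_ARGS = [
    ("-q", "JOB_QUEUE"),
    ("-W", "JOB_WALLCLOCK_TIME"),
    ("-P", "PROJECT"),
]


def build_submit_args(pairs, env):
    """Hypothetical helper: emit 'flag value' for every variable that is set."""
    parts = []
    for flag, name in pairs:
        value = env.get(name)
        if value:  # assumption: unset or empty variables are skipped
            parts.append(f"{flag} {value}")
    return " ".join(parts)


# PROJECT is deliberately left unset in this example.
env = {"JOB_QUEUE": "cpuII", "JOB_WALLCLOCK_TIME": "23:59"}
submit_cmd = "bsub " + build_submit_args(SUBMIT_ARGS, env)
print(submit_cmd)
```

Under these assumptions the queue and wall-clock limit are passed on the bsub command line rather than through `#BSUB` lines in the script.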
 
Thanks for the reply. I will try it. I also have another question: how can I remove the case.st_archive job and submit only case.run to the queue?
 
Thanks. I think I more or less know how to set up the batch system now. After modifying env_batch.xml and running preview_run, the screen shows:

CASE INFO:
nodes: 13
total tasks: 312
tasks per node: 24
thread count: 1

BATCH INFO:
FOR JOB: case.run
ENV:
Setting Environment OMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
bsub -q cpuII -W 23:59 "all, ARGS_FOR_SCRIPT=--resubmit" < .case.run

MPIRUN (job=case.run):
mpijob.intelmpi /work2/cern1426/clm5/test_case3/bld/cesm.exe >> cesm.log.$LID 2>&1

One more question: if I want to resubmit the job many times, is this batch setting right?
 

jedwards

It looks right; just set RESUBMIT to the number of times you want to resubmit.
All of these issues are addressed by using cime/scripts/tests/scripts_regression_tests.py.
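
To illustrate how the RESUBMIT count relates to the number of runs, here is a small Python model of the cycle. It is a sketch, not CIME code: `count_runs` is a hypothetical function, and it assumes the counter is reduced by one after each completed run until it reaches zero.

```python
def count_runs(resubmit):
    """Hypothetical model of the resubmit cycle, not CIME code."""
    runs = 0
    while True:
        runs += 1          # one model run completes
        if resubmit <= 0:  # nothing left to resubmit, so stop
            break
        resubmit -= 1      # consume one resubmission and run again
    return runs


for n in (0, 1, 3):
    print(f"RESUBMIT={n}: {count_runs(n)} run(s) in total")
```

Under that assumption, RESUBMIT=0 gives a single run with no resubmission, and RESUBMIT=N gives N+1 runs in total.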
 
Hi jedwards. It still didn't work after modifying env_batch.xml. I think the reason may be a wrong parameter format in the following command:

bsub -q cpuII -W 23:59 "all, ARGS_FOR_SCRIPT=--resubmit" < .case.run

When I run the above command directly, errors occur. However, when I change the command to

bsub -q cpuII -W 23:59 --resubmit < .case.run

the job is submitted to the queue and the std.outxxx log is generated.

So I guess that the format of "all, ARGS_FOR_SCRIPT=--resubmit" may be wrong, but I have no idea how to change it in case.submit or other code. Can you give me some suggestions? Thanks a lot in advance.
 

jedwards

This is hardcoded in cime/scripts/lib/CIME/XML/env_batch.py at line 591; try commenting out the section specific to LSF:
Code:
        elif batch_system == "lsf":
            return "{} \"all, ARGS_FOR_SCRIPT={}\"".format(batch_env_flag, run_args_str)
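
To show what that branch contributes to the submit command, here is a standalone Python sketch. Only the format string is taken from the snippet above; `lsf_env_arg` and `submit_command` are hypothetical helpers for illustration, and the sketch assumes `batch_env_flag` is empty because `<batch_env>` is blank in the config_batch.xml posted earlier.

```python
def lsf_env_arg(batch_env_flag, run_args_str):
    # Same format string as the LSF branch quoted above.
    return "{} \"all, ARGS_FOR_SCRIPT={}\"".format(batch_env_flag, run_args_str)


def submit_command(extra_args, env_arg, script=".case.run"):
    """Hypothetical assembly of the final command line, for illustration only."""
    parts = ["bsub", extra_args, env_arg.strip(), "<", script]
    return " ".join(p for p in parts if p)


# With the LSF branch active: the quoted string has no option in front of it.
with_env = submit_command("-q cpuII -W 23:59", lsf_env_arg("", "--resubmit"))

# With the LSF branch commented out: nothing is added before the redirect.
without_env = submit_command("-q cpuII -W 23:59", "")

print(with_env)
print(without_env)
```

The first line reproduces the command shown by preview_run, where the quoted string reaches bsub as a bare argument; the second is the command with that string dropped.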
 
Thanks, jedwards. It works now. However, std.err showed

Code:
 check for resubmit

and std.out showed
Code:
2020-06-19 23:06:30 MODEL EXECUTION BEGINS HERE
run command is mpijob.intelmpi  /work2/cern1426/clm5/test_case3/bld/cesm.exe  >> cesm.log.$LID 2>&1
2020-06-19 23:07:46 MODEL EXECUTION HAS FINISHED

Is this OK for a successful run? By the way, RESUBMIT=0 in env_run.xml.
 