Porting CESM 2.0 to Blue Waters - Error at subprocess.Popen When Calling qsub

rmokos@...

Hi,

I'm helping a user get CESM 2.0 working on Blue Waters (Cray XE6).

We check out the code like this:

--------------------
git clone -b release-clm5.0 https://github.com/ESCOMP/ctsm.git clm5.0

cd clm5.0
./manage_externals/checkout_externals
--------------------

After some adjustments to parameters and modules, we can get the code to build, but it fails when submitting the case with qsub. The subprocess.Popen call in clm5.0/cime/scripts/lib/CIME/utils.py returns an empty string in "output," which leads to the following error:

--------------------
...
Check case OK
submit_jobs case.run
job is case.run
Submit job case.run
Submitting job script qsub    -q normal -l walltime=24:00:00 -A fyy case.run
ERROR: Couldn't match jobid_pattern '^(\S+)$' within submit output:
 ''
--------------------
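
To illustrate why an empty submit output produces this error, here is a minimal sketch. It assumes the pattern is applied with a plain re.search; the helper name extract_job_id is hypothetical and is not taken from CIME.

```python
import re

JOBID_PATTERN = r'^(\S+)$'  # the pattern quoted in the error message above


def extract_job_id(submit_output, pattern=JOBID_PATTERN):
    """Return the first captured group, or None if the pattern does not match.

    Hypothetical helper for illustration only; not CIME's implementation.
    """
    match = re.search(pattern, submit_output)
    return match.group(1) if match else None


# \S+ needs at least one non-whitespace character, so '' can never match.
print(extract_job_id(''))            # None
# A bare job id on a single line does match.
print(extract_job_id('8612426.bw'))  # 8612426.bw
```

So the error message is a symptom of qsub returning no output at all, rather than of a wrong pattern.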

 

I added the following logger statement to utils.py to inspect the arguments passed to Popen:

--------------------
if (verbose != False and (verbose or logger.isEnabledFor(logging.DEBUG))):
    logger.info("   arg_stdout=%s arg_stderr=%s stdin=%s from_dir=%s env=%s"%(arg_stdout,arg_stderr,stdin,from_dir,env))
--------------------

I then ran the submit script with --debug, but the output gives no further insight:

--------------------
...
Check case OK
RUN: /mnt/bwpy/single/usr/bin/xmllint --format --output /mnt/a/u/staff/rmokos/tickets/BWAPPS-3553_Cannot_Locate_XML_in_INC_CESM_2.0/clm5.0/cime/scripts/testI/env_run.xml -
  arg_stdout=-1 arg_stderr=-1 stdin=-1 from_dir=None env=None
RUN: /mnt/bwpy/single/usr/bin/xmllint --format --output /mnt/a/u/staff/rmokos/tickets/BWAPPS-3553_Cannot_Locate_XML_in_INC_CESM_2.0/clm5.0/cime/scripts/testI/env_batch.xml -
  arg_stdout=-1 arg_stderr=-1 stdin=-1 from_dir=None env=None
submit_jobs case.run
job is case.run
Submit job case.run
Submitting job script qsub    -q normal -l walltime=24:00:00 -A fyy case.run
RUN: qsub    -q normal -l walltime=24:00:00 -A fyy case.run
  arg_stdout=-1 arg_stderr=-2 stdin=None from_dir=None env=None
> /mnt/a/u/staff/rmokos/tickets/BWAPPS-3553_Cannot_Locate_XML_in_INC_CESM_2.0/clm5.0/cime/scripts/lib/CIME/utils.py(49)expect()
-> raise exc_type("{} {}".format(error_prefix, error_msg))
(Pdb)
--------------------
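
For reference, the -1 and -2 in the debug lines are the integer values of subprocess.PIPE and subprocess.STDOUT, so in the failing qsub call stderr is merged into stdout. One possible next diagnostic step, sketched below, is to capture the two streams separately and report the exit status. The function name run_and_report is made up for this sketch, and the echo command is only a stand-in for the real qsub line.

```python
import subprocess


def run_and_report(cmd, cwd=None, env=None):
    """Run cmd through the shell; return exit status, stdout and stderr separately.

    Hypothetical diagnostic helper; not part of CIME.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # keep stderr apart instead of merging it
        cwd=cwd,
        env=env,                  # None means: inherit the caller's environment
        universal_newlines=True,  # return text rather than bytes on Python 3
    )
    out, err = proc.communicate()
    return {"stat": proc.returncode, "stdout": out.strip(), "stderr": err.strip()}


# Stand-in command; replace with the qsub line from the log to reproduce.
result = run_and_report("echo 12345.bw")
print("stat={stat} stdout={stdout!r} stderr={stderr!r}".format(**result))
```

A non-zero exit status with empty output would point in a different direction than a zero status with empty output.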


The user wrote a test script that runs the same qsub command. Unlike the CESM code, it succeeds.

Test script:

--------------------
import os, sys
import subprocess

sys.path.append(os.path.abspath("/u/staff/rmokos/tickets/BWAPPS-3553_Cannot_Locate_XML_in_INC_CESM_2.0/clm5.0/cime/scripts/lib"))
from CIME.utils import run_cmd_no_fail

cmd='qsub    -q normal -l walltime=24:00:00 -A fyy case.run'
arg_stdout=subprocess.PIPE
arg_stderr=subprocess.STDOUT
stdin=None
from_dir=None
env=None
input_str=None

print 'arg_stdout=%s arg_stderr=%s stdin=%s from_dir=%s env=%s'%(arg_stdout,arg_stderr,stdin,from_dir,env)

print '*****direct call subprocess.Popen******'
proc= subprocess.Popen(cmd, shell=True, stdout=arg_stdout, stderr=arg_stderr, stdin=stdin, cwd=from_dir, env=env)
output, errput = proc.communicate(input_str)
output = output.strip() if output is not None else output
errput = errput.strip() if errput is not None else errput
stat = proc.wait()
print 'output=%s'%output
print 'errput=%s'%errput
print 'stat=%s'%stat

print '*****call CIME.utils.run_cmd_no_fail******'
output = run_cmd_no_fail(cmd, combine_output=True)
print 'output=%s'%output
--------------------

Output:

--------------------
> python test.py
arg_stdout=-1 arg_stderr=-2 stdin=None from_dir=None env=None
*****direct call subprocess.Popen******
output=INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.
INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.
INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.

WARNING: Job script does not invoke any 'aprun'/'ccmrun' command.
Job will be submitted as usual, but please ensure your job script eventually
invokes 'aprun'/'ccmrun' command to execute tasks on allocated compute nodes.
Please contact help+bw@ncsa.illinois.edu if you need any assistance.

INFO: Job submitted to account: fyy
8612426.bw
errput=None
stat=0
*****call CIME.utils.run_cmd_no_fail******
output=INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.
INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.
INFO: The qsub '-V' option is deprecated. Please include your environment variables directly in your job script.

WARNING: Job script does not invoke any 'aprun'/'ccmrun' command.
Job will be submitted as usual, but please ensure your job script eventually
invokes 'aprun'/'ccmrun' command to execute tasks on allocated compute nodes.
Please contact help+bw@ncsa.illinois.edu if you need any assistance.


INFO: Job submitted to account: fyy
8612427.bw
--------------------

Two jobs are submitted, which is the expected behavior.

We're using Python 2.7.14. Note also that Blue Waters uses its own Python environment (details here: https://bluewaters.ncsa.illinois.edu/python). Because the version of Perl in the bwpy module needs updating, we use the system default Perl instead. We therefore issue the following commands to enter the Python environment before building and submitting the case:

--------------------
module load bwpy/0.3.2
module load /u/staff/rmokos/tickets/BWAPPS-3553_Cannot_Locate_XML_in_INC_CESM_2.0/CESM-ENV  # sets CESMDATAROOT
export PERL5LIB=/usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi:$PERL5LIB
export PATH=~/bin:$PATH
bwpy-environ
--------------------

 

The same is also done before running the test script.
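
Since env=None makes Popen inherit the calling process's environment, one thing worth ruling out is a difference between the environment seen by the submit script and the one seen by the test script. A rough sketch of such a comparison follows; diff_env and the sample values are hypothetical, and in practice each dictionary would come from dumping dict(os.environ) in the respective context.

```python
import os


def diff_env(env_a, env_b):
    """Return {name: (value_in_a, value_in_b)} for variables that differ.

    A variable missing from one side is reported with None on that side.
    """
    names = set(env_a) | set(env_b)
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in sorted(names)
        if env_a.get(name) != env_b.get(name)
    }


# Illustrative values only, not taken from the Blue Waters session.
submit_env = {"PATH": "/usr/bin", "PERL5LIB": "/example/perl5"}
script_env = {"PATH": "/usr/bin"}
print(diff_env(submit_env, script_env))

# Comparing the current environment with itself yields no differences.
print(diff_env(dict(os.environ), dict(os.environ)))
```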

 

Any help would be appreciated.

Ryan


rmokos@...

For the solution to this problem, see Jim's response to GitHub issue 2612: https://github.com/ESMCI/cime/issues/2612
