
case.st_archive running before actual simulation

sur23beeb

Member
I am in the midst of doing a long restart run on my university cluster. Here are some relevant details:

resolution: TL319_t13
compset: 2000_DATM%JRA_SLND_CICE_POP2_DROF%JRA_SGLC_SWAV
cesmtag: cesm2.1.4-rc.08

Recently, my university computing center started requiring explicit project allocation numbers to submit batch jobs (they were "free" for a short while). I added the following line to my env_batch.xml file; the actual value of $PROJECT is set in another XML file.

Code:
<arg flag="--account" name="$PROJECT"/>
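To illustrate what an <arg> entry like this is meant to produce, here is a simplified Python model. It is an assumption, not CIME's actual code: it reads the flag and name attributes and substitutes $PROJECT from a dictionary of case variables. The allocation number abc123 is hypothetical. In this model an undefined PROJECT leaves the literal string $PROJECT in the command, which is one thing worth ruling out when a submit line looks wrong.

```python
import xml.etree.ElementTree as ET
from string import Template

# The <arg> element quoted above, from env_batch.xml.
ARG_XML = '<arg flag="--account" name="$PROJECT"/>'


def expand_arg(arg_xml, case_vars):
    """Return [flag, value] for an <arg> element, expanding $VARS.

    Simplified model (assumption): the flag is followed by the value of
    the `name` attribute after $VAR substitution. safe_substitute leaves
    unknown variables untouched instead of raising.
    """
    elem = ET.fromstring(arg_xml)
    value = Template(elem.get("name", "")).safe_substitute(case_vars)
    return [elem.get("flag"), value]


# "abc123" is a hypothetical allocation number.
print(expand_arg(ARG_XML, {"PROJECT": "abc123"}))  # ['--account', 'abc123']
print(expand_arg(ARG_XML, {}))                     # ['--account', '$PROJECT']
```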

When I execute ./case.submit, I see two jobs in the queue: the actual simulation and the dependent archiving job, which is supposed to wait until the simulation is over. That was also the behavior before I made the above modification, and I have made many successful restarts on the same cluster. Since the modification, however, the archiving job starts running before the simulation even starts. I am not even sure what it is archiving, as the simulation has not generated any new output yet. There is even a block in my XML file that appears to specify that the archiving job depends on case.run or case.test finishing first:

Code:
<group id="case.st_archive">
    <entry id="prereq" value="$DOUT_S">
      <type>char</type>
    </entry>
    <entry id="dependency" value="case.run or case.test">
      <type>char</type>
    </entry>
    <!-- other entries in this group omitted -->
</group>


Any ideas on what might be going on here? Thanks.
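The dependency entry only helps if the submitting script knows the job id of case.run at the moment it submits case.st_archive. The sketch below models that hand-off under stated assumptions: it reads the id from sbatch's usual "Submitted batch job N" message, and the hypothetical helper archive_submit_cmd leaves out the --dependency flag when no id was found. It is not CIME's implementation, but it shows why an archive job without a usable dependency would be free to start immediately.

```python
import re
from typing import Optional


def parse_job_id(sbatch_output: str) -> Optional[str]:
    """Return the id from sbatch's 'Submitted batch job N' message, if any."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return match.group(1) if match else None


def archive_submit_cmd(run_job_id: Optional[str], account: str) -> str:
    """Build a hypothetical sbatch line for the archive job.

    Modelling assumption: with no case.run id there is nothing to put
    after afterok:, so the --dependency flag is left out and Slurm may
    start the archive job immediately.
    """
    cmd = ["sbatch", "--account", account]
    if run_job_id is not None:
        cmd.append(f"--dependency=afterok:{run_job_id}")
    cmd += ["case.st_archive", "--resubmit"]
    return " ".join(cmd)


# Normal case: the id is found, so the archive job waits for it.
print(archive_submit_cmd(parse_job_id("Submitted batch job 4242\n"), "abc123"))
# Unexpected output (made up here): no id, so no dependency.
print(archive_submit_cmd(parse_job_id("Job queued\n"), "abc123"))
```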
 

jedwards

CSEG and Liaisons
Staff member
What is the batch system you are using?
Try the command ./preview_run and make sure that the dependency requirement is listed on the case.st_archive submit line.
If you still don't see what the problem is, please post the output of ./preview_run.
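Checking the preview output by eye works, but the same check can be scripted. This is a minimal sketch, assuming the preview_run layout shown in the excerpt posted later in this thread: a "FOR JOB: <name>" header, then a "SUBMIT CMD:" line followed by the command itself. The case.run section in the sample text is made up for illustration.

```python
from typing import Optional

# Sample text; the case.st_archive part follows the excerpt in this thread,
# the case.run part is hypothetical.
PREVIEW = """\
FOR JOB: case.run
SUBMIT CMD:
sbatch --time 12:00:00 -p short --account abc123 case.run --resubmit

FOR JOB: case.st_archive
ENV:
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
sbatch --time 01:30:00 -p short --account <#project> --dependency=afterok:0 case.st_archive --resubmit
"""


def submit_cmd_for(preview_text: str, job: str) -> Optional[str]:
    """Return the command printed after 'SUBMIT CMD:' in the section for `job`."""
    lines = preview_text.splitlines()
    in_job = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("FOR JOB:"):
            # Entering a new job section; track whether it is the one we want.
            in_job = stripped == f"FOR JOB: {job}"
        elif in_job and stripped == "SUBMIT CMD:":
            # The command itself is printed on the following line.
            return lines[i + 1].strip() if i + 1 < len(lines) else None
    return None


cmd = submit_cmd_for(PREVIEW, "case.st_archive")
print("--dependency=" in cmd)  # True
```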
 

sur23beeb

Member
Thanks for your response. Here is the output from ./preview_run; I am showing only the part for the case.st_archive job (emphasis mine):
--------------

FOR JOB: case.st_archive
ENV:
module command is /sw/lmod/lmod/libexec/lmod python purge
module command is /sw/lmod/lmod/libexec/lmod python load intel/2019b CMake/3.15.3 cURL/7.66.0 Python/2.7.16 XML-LibXML/2.0201 impi/2018.5.288 HDF5/1.10.5
Setting Environment OMP_STACKSIZE=256M
Setting Environment I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
Setting Environment I_MPI_DEBUG=+4
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
sbatch --time 01:30:00 -p short --account <#project> --dependency=afterok:0 case.st_archive --resubmit

----------------------

Should the part after the equals sign in the dependency (afterok:0) read differently?
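Slurm's dependency syntax is TYPE:JOBID[:JOBID...], so afterok:0 reads as "start only after job 0 completes successfully". Because ./preview_run does not actually submit anything, no real case.run job id exists at that point, so the 0 is most likely a placeholder rather than an error in itself; the more informative check is the Dependency= field that scontrol show job <jobid> reports for the real archive job once it is queued. The hypothetical helper below, which handles only the single-type form, pulls the type and job ids out of a submit line such as the one above.

```python
import re
from typing import List, Optional, Tuple


def parse_dependency(submit_cmd: str) -> Optional[Tuple[str, List[str]]]:
    """Split '--dependency=TYPE:ID[:ID...]' into (TYPE, [ID, ...]).

    Only the single-type form is handled; Slurm's comma- or
    '?'-separated combinations are out of scope for this sketch.
    """
    match = re.search(r"--dependency=(\w+):([\w:]+)", submit_cmd)
    if match is None:
        return None
    return match.group(1), match.group(2).split(":")


line = ("sbatch --time 01:30:00 -p short --account <#project> "
        "--dependency=afterok:0 case.st_archive --resubmit")
print(parse_dependency(line))  # ('afterok', ['0'])
```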
 