
Failed to run test case (./case.submit error)

Lakhima

Lakhima Chutia
New Member
Hello,

I am a new CESM user. I am trying to run cesm2.2.0 on a new system, “argon”, with compset B1850 and resolution f19_g17. I have modified and validated config_compilers.xml, config_machines.xml, and config_batch.xml (attached below). The model executable (cesm.exe) builds successfully, but I get errors when I run ./case.submit.
Log files, the output of preview_run, and a screenshot of the case.submit error are also attached.

Any help would be greatly appreciated.

Thanks!
Lakhima
 

Attachments

  • config.zip (27.3 KB)
  • log.zip (68.9 KB)
  • Screen Shot-case.submit.png (999.6 KB)
  • Screen Shot-preview_run.png (800.7 KB)

jedwards

CSEG and Liaisons
Staff member
The message quoted below indicates that the model was killed by an external process. Perhaps it ran out of memory?
In config_machines.xml you have <BATCH_SYSTEM>none</BATCH_SYSTEM>, which indicates that the job should not be submitted to a queue and should instead be
run interactively. I don't think that is what you intend. I think it should be pbs, and you should also change this in config_batch.xml.

> mpirun noticed that process rank 1 with PID 0 on node argon-login-1 exited on signal 9 (Killed).
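
For readers unfamiliar with "exited on signal 9": the sketch below shows, under the assumption of a POSIX system, how a process killed with SIGKILL looks to its parent. The helper names describe_returncode and run_child_that_is_killed are made up for this illustration and have nothing to do with CESM or mpirun. It does not show why the model was killed, only what the message means.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    On POSIX, subprocess reports a child terminated by signal N as -N.
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


def run_child_that_is_killed() -> int:
    """Start a child Python process that sends SIGKILL to itself."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    result = subprocess.run([sys.executable, "-c", child_code])
    return result.returncode


if __name__ == "__main__":
    print(describe_returncode(run_child_that_is_killed()))
```

SIGKILL cannot be caught by the process, which is why the model leaves no error message of its own; the kernel's out-of-memory handling and batch schedulers enforcing limits are common senders.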
 

Lakhima

Lakhima Chutia
New Member
Thanks for your quick response.
I have a question about the batch system.
Our HPC cluster uses the Sun Grid Engine (SGE) queue scheduler.
When I set the batch system to SGE, creating the case fails with an error stating that no batch system “sge” was found (screenshot attached below).
Could you please suggest how to proceed?
 

Attachments

  • Screen Shot 2022-11-01 at 10.10.40 PM.png (647.3 KB)

Lakhima

Lakhima Chutia
New Member
Thanks for your suggestion.
I tried that and got the following error message:

ERROR: Did not find sge in valid values for BATCH_SYSTEM: ['nersc_slurm', 'lc_slurm', 'moab', 'pbs', 'lsf', 'slurm', 'cobalt', 'cobalt_theta', 'none'].

Please advise. The config_batch file and a screenshot of the error are attached below.
 

Attachments

  • Archive.zip (620.5 KB)

jedwards

CSEG and Liaisons
Staff member
In the file components/cpl7/driver/cime_config/config_component.xml
and/or components/cmeps/cime_config/config_component.xml,
find the variable BATCH_SYSTEM and add 'sge' to the list of valid values.
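
If hand-editing feels error-prone, the same change can be sketched in Python. This is only an illustration under stated assumptions: add_valid_value is a hypothetical helper, the <root> wrapper and the trimmed entry in SAMPLE are stand-ins rather than the real contents of config_component.xml, and the function works on a string so nothing on disk is touched. Back up the real file before applying anything like this to it.

```python
import xml.etree.ElementTree as ET

# Stand-in fragment: the <root> wrapper is an assumption for this sketch.
SAMPLE = """<root>
  <entry id="BATCH_SYSTEM">
    <type>char</type>
    <valid_values>nersc_slurm,lc_slurm,moab,pbs,lsf,slurm,cobalt,cobalt_theta,none</valid_values>
    <default_value>none</default_value>
  </entry>
</root>"""


def add_valid_value(xml_text: str, entry_id: str, new_value: str) -> str:
    """Return xml_text with new_value appended to the entry's valid_values."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("id") != entry_id:
            continue
        node = entry.find("valid_values")
        if node is None:
            raise ValueError(f"entry {entry_id} has no valid_values element")
        values = [v.strip() for v in (node.text or "").split(",") if v.strip()]
        if new_value not in values:  # keep the edit idempotent
            values.append(new_value)
        node.text = ",".join(values)
        return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no entry with id {entry_id}")


if __name__ == "__main__":
    print(add_valid_value(SAMPLE, "BATCH_SYSTEM", "sge"))
```

Note that ElementTree drops XML comments when parsing with default settings, so for the real file a careful manual edit may still be the safer choice.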
 

Lakhima

Lakhima Chutia
New Member
Thanks a lot! I am still getting the same error after adding "sge" to the list of valid values in the config_component.xml.
Did I miss something?
 

jedwards

CSEG and Liaisons
Staff member
In the new case, run
./xmlquery --full BATCH_SYSTEM

You should see 'sge' in the list of valid_values.
If you don't, then try ./xmlquery COMP_INTERFACE
If the answer is "nuopc", check the file components/cmeps/cime_config/config_component.xml
If the answer is "mct", check the file components/cpl7/driver/cime_config/config_component.xml
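
The lookup above can be written down as a small table. This is a minimal sketch: config_component_for is a hypothetical helper, and it assumes you pass in just the bare value reported for COMP_INTERFACE (not the full xmlquery output line). The two paths are the ones named in this post, relative to the CESM source root.

```python
# Paths as given in this thread, relative to the CESM source root.
COMPONENT_FILES = {
    "nuopc": "components/cmeps/cime_config/config_component.xml",
    "mct": "components/cpl7/driver/cime_config/config_component.xml",
}


def config_component_for(comp_interface: str) -> str:
    """Return the config_component.xml to check for a COMP_INTERFACE value."""
    key = comp_interface.strip().strip('"').lower()
    if key not in COMPONENT_FILES:
        raise ValueError(f"unexpected COMP_INTERFACE value: {comp_interface!r}")
    return COMPONENT_FILES[key]


if __name__ == "__main__":
    for value in ("nuopc", "mct"):
        print(value, "->", config_component_for(value))
```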
 

Lakhima

Lakhima Chutia
New Member
I am getting the error while creating the case.

The command and its error output are below:

./create_newcase --case cases/test --res f19_g17 --compset B1850

-------
Batch_system_type is sge
ERROR: Did not find sge in valid values for BATCH_SYSTEM: ['nersc_slurm', 'lc_slurm', 'moab', 'pbs', 'lsf', 'slurm', 'cobalt', 'cobalt_theta', 'none']
 

Lakhima

Lakhima Chutia
New Member
> In the new case do
> ./xmlquery --full BATCH_SYSTEM
>
> you should see 'sge' in the list of valid_values.
> If you don't then try ./xmlquery COMP_INTERFACE
> if the answer is "nuopc" check file components/cmeps/cime_config/config_component.xml
> if the answer is "mct" check file components/cpl7/driver/cime_config/config_component.xml

Can I check this only after creating the new case?
Sorry about the confusion.

Now the BATCH_SYSTEM valid_values in config_component.xml are

<entry id="BATCH_SYSTEM">
<type>char</type>
<default_value>none</default_value>
<valid_values>nersc_slurm,lc_slurm,moab,pbs,lsf,slurm,cobalt,cobalt_theta,none,sge</valid_values>
<group>config_batch</group>
<file>env_batch.xml</file>
<desc>The batch system type to use for this machine.</desc>
</entry>
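
As a sanity check on an entry like the one above, the following sketch parses it and tests whether a value is in the list. It only mimics the kind of membership check that the error message suggests; it is not CIME's actual code, and the helper names valid_values and check_value are invented for this example.

```python
import xml.etree.ElementTree as ET

# The entry as posted in this thread.
ENTRY = """<entry id="BATCH_SYSTEM">
<type>char</type>
<default_value>none</default_value>
<valid_values>nersc_slurm,lc_slurm,moab,pbs,lsf,slurm,cobalt,cobalt_theta,none,sge</valid_values>
<group>config_batch</group>
<file>env_batch.xml</file>
<desc>The batch system type to use for this machine.</desc>
</entry>"""


def valid_values(entry_xml: str) -> list:
    """Return the comma-separated valid_values of an entry as a list."""
    entry = ET.fromstring(entry_xml)
    text = entry.findtext("valid_values", default="")
    return [v.strip() for v in text.split(",") if v.strip()]


def check_value(entry_xml: str, value: str) -> bool:
    """Raise ValueError if value is not listed in the entry's valid_values."""
    allowed = valid_values(entry_xml)
    if value not in allowed:
        raise ValueError(f"Did not find {value} in valid values: {allowed}")
    return True


if __name__ == "__main__":
    print(check_value(ENTRY, "sge"))
```

If a check like this passes on the edited file but create_newcase still lists the old values, the file being read is probably a different config_component.xml than the one that was edited.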
 

Lakhima

Lakhima Chutia
New Member
Thanks for your help. I added "sge" to the list of valid values for the batch system, following your instructions.
Now I am getting errors from case.submit. Could you please have a look at the preview_run output and the log file and advise?

The output of the preview_run is below

CASE INFO:

nodes: 6
total tasks: 36
tasks per node: 6
thread count: 1

BATCH INFO:

FOR JOB: case.run
ENV:
module command is /usr/share/lmod/lmod/libexec/lmod python load intel/2017.1

Setting Environment KMP_STACKSIZE=256M
Setting Environment KMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q CGRER -l h_rt=48:00:00 -v ARGS_FOR_SCRIPT='--resubmit' .case.run

MPIRUN (job=case.run):
mpirun -np 36 /Dedicated/jwang-data2/Lakhima/CESM-NEW/RUN-SGE-TRIAL/b.e20.B1850.f19_g17.test/bld/cesm.exe >> cesm.log.$LID 2>&1


FOR JOB: case.st_archive

ENV:
module command is /usr/share/lmod/lmod/libexec/lmod python load intel/2017.1

Setting Environment KMP_STACKSIZE=256M
Setting Environment KMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q CGRER -l h_rt=00:20:00 -hold_jid 0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
 

Attachments

  • Archive.zip (29.1 KB)

jedwards

CSEG and Liaisons
Staff member
I am not familiar with the SGE batch system and do not have access to one. You also have a number of errors loading modules.
You should consider consulting a system administrator for your system.
 