Failed to run test case (./case.submit error)

Lakhima

Lakhima Chutia
New Member
Hello,

I am a new CESM user. I am trying to run cesm2.2.0 on a new system, “argon”, with compset B1850 and resolution f19_g17. I have modified and validated config_compilers.xml, config_machines.xml, and config_batch.xml (attached below). The model executable (cesm.exe) builds successfully, but case.submit fails with errors.
The log files, the output of preview_run, and a screenshot of the case.submit error are also attached.

Any help would be greatly appreciated.

Thanks!
Lakhima
 

Attachments

  • config.zip (27.3 KB)
  • log.zip (68.9 KB)
  • Screen Shot-case.submit.png (999.6 KB)
  • Screen Shot-preview_run.png (800.7 KB)

jedwards

CSEG and Liaisons
Staff member
The message quoted below indicates that the model was killed by an external process; perhaps it ran out of memory.
In config_machines.xml you have <BATCH_SYSTEM>none</BATCH_SYSTEM>, which means the job is not submitted to a queue and is instead run interactively. I don't think that is what you intend. I think it should be pbs; you should also change this in config_batch.xml.

> mpirun noticed that process rank 1 with PID 0 on node argon-login-1 exited on signal 9 (Killed).
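
As a rough illustration of how to read that line, the sketch below pulls the rank, node, and signal number out of the message and maps the number to its POSIX name. The regular expression and the function name `parse_mpirun_signal` are assumptions made for this example; they are not part of CESM or CIME. Signal 9 is SIGKILL, which a process cannot catch or ignore, so it is consistent with the model being killed from outside.

```python
import re
import signal

# Pattern for the message quoted above; the wording is assumed to match
# exactly what mpirun printed in this thread.
_PATTERN = re.compile(
    r"process rank (?P<rank>\d+) with PID (?P<pid>\d+) on node (?P<node>\S+) "
    r"exited on signal (?P<signum>\d+)"
)

def parse_mpirun_signal(line):
    """Return rank, node, and signal name from an 'exited on signal' line."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    signum = int(match.group("signum"))
    return {
        "rank": int(match.group("rank")),
        "node": match.group("node"),
        # signal.Signals maps a signal number to its symbolic name.
        "signal": signal.Signals(signum).name,
    }

message = ("mpirun noticed that process rank 1 with PID 0 on node "
           "argon-login-1 exited on signal 9 (Killed).")
info = parse_mpirun_signal(message)
print(info)
```

The node name in the message is a login node, which fits the point above that with BATCH_SYSTEM set to none the job runs interactively rather than through the queue.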
 

Lakhima

Lakhima Chutia
New Member
Thanks for your quick response.
I have a question about the batch system.
Our HPC cluster uses the Sun Grid Engine (SGE) scheduler.
When I set the batch system to sge, case creation fails with an error stating that no batch system “sge” was found (screenshot attached below).
Could you please suggest how to proceed?
 

Attachments

  • Screen Shot 2022-11-01 at 10.10.40 PM.png (647.3 KB)

Lakhima

Lakhima Chutia
New Member
Thanks for your suggestion.
I tried that and got the following error message:

ERROR: Did not find sge in valid values for BATCH_SYSTEM: ['nersc_slurm', 'lc_slurm', 'moab', 'pbs', 'lsf', 'slurm', 'cobalt', 'cobalt_theta', 'none'].

Please advise. The config_batch file and a screenshot of the error are attached below.
 

Attachments

jedwards

CSEG and Liaisons
Staff member
In the file components/cpl7/driver/cime_config/config_component.xml
and/or components/cmeps/cime_config/config_component.xml,
find the variable BATCH_SYSTEM and add 'sge' to its list of valid values.
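
A minimal sketch of that edit, assuming the entry has the usual `<entry id="BATCH_SYSTEM">` shape with a comma-separated `<valid_values>` element. The helper name `add_valid_value` and the `<definitions>` wrapper element in the sample are hypothetical, chosen only so the example is self-contained. Note that `xml.etree.ElementTree` drops XML comments when it parses, so for the real file a hand edit of the one `<valid_values>` line is the safer route; the code only shows what the change amounts to.

```python
import xml.etree.ElementTree as ET

def add_valid_value(xml_text, entry_id, new_value):
    """Return xml_text with new_value appended to the entry's <valid_values>."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("id") != entry_id:
            continue
        node = entry.find("valid_values")
        if node is None:
            raise ValueError(f"entry {entry_id} has no <valid_values> element")
        values = [v.strip() for v in (node.text or "").split(",") if v.strip()]
        if new_value not in values:  # do nothing if the value is already listed
            values.append(new_value)
        node.text = ",".join(values)
        return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no entry with id {entry_id}")

# Sample fragment; the <definitions> root element is a placeholder.
SAMPLE = """<definitions>
  <entry id="BATCH_SYSTEM">
    <type>char</type>
    <valid_values>nersc_slurm,lc_slurm,moab,pbs,lsf,slurm,cobalt,cobalt_theta,none</valid_values>
  </entry>
</definitions>"""

updated = add_valid_value(SAMPLE, "BATCH_SYSTEM", "sge")
print(updated)
```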
 

Lakhima

Lakhima Chutia
New Member
Thanks a lot! I am still getting the same error after adding "sge" to the list of valid values in the config_component.xml.
Did I miss something?
 

jedwards

CSEG and Liaisons
Staff member
In the new case, run
./xmlquery --full BATCH_SYSTEM

You should see 'sge' in the list of valid_values.
If you don't, then try ./xmlquery COMP_INTERFACE
If the answer is "nuopc", check the file components/cmeps/cime_config/config_component.xml
If the answer is "mct", check the file components/cpl7/driver/cime_config/config_component.xml
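
The lookup described above can be sketched as follows: map the COMP_INTERFACE answer to the file to inspect, then test whether a batch system name appears in that file's BATCH_SYSTEM valid values. The function names and the `<definitions>` wrapper in the sample are assumptions for illustration; only the two file paths come from the steps above.

```python
import xml.etree.ElementTree as ET

# Paths as given above, relative to the CESM source root.
DRIVER_CONFIG = {
    "nuopc": "components/cmeps/cime_config/config_component.xml",
    "mct": "components/cpl7/driver/cime_config/config_component.xml",
}

def config_file_for(comp_interface):
    """Return the config_component.xml path for a COMP_INTERFACE answer."""
    return DRIVER_CONFIG[comp_interface.strip().lower()]

def batch_system_allowed(xml_text, name):
    """True if name is listed in the BATCH_SYSTEM entry's valid_values."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("id") == "BATCH_SYSTEM":
            text = entry.findtext("valid_values", default="")
            return name in [v.strip() for v in text.split(",")]
    return False

# Sample fragment; the <definitions> root element is a placeholder.
sample = """<definitions>
  <entry id="BATCH_SYSTEM">
    <valid_values>pbs,lsf,slurm,none</valid_values>
  </entry>
</definitions>"""

print(config_file_for("mct"))
print(batch_system_allowed(sample, "pbs"), batch_system_allowed(sample, "sge"))
```

If the file that matches COMP_INTERFACE does not list 'sge', that is the copy that still needs the edit.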
 

Lakhima

Lakhima Chutia
New Member
I am getting the error while creating the case. The command and the error output are below:

./create_newcase --case cases/test --res f19_g17 --compset B1850

-------
Batch_system_type is sge
ERROR: Did not find sge in valid values for BATCH_SYSTEM: ['nersc_slurm', 'lc_slurm', 'moab', 'pbs', 'lsf', 'slurm', 'cobalt', 'cobalt_theta', 'none']
 

Lakhima

Lakhima Chutia
New Member
> In the new case do
> ./xmlquery --full BATCH_SYSTEM
>
> you should see 'sge' in the list of valid_values.
> If you don't then try ./xmlquery COMP_INTERFACE
> if the answer is "nuopc" check file components/cmeps/cime_config/config_component.xml
> if the answer is "mct" check file components/cpl7/driver/cime_config/config_component.xml

Can I only check this after creating the new case?
Sorry about the confusion.

The BATCH_SYSTEM entry in config_component.xml now reads:

<entry id="BATCH_SYSTEM">
  <type>char</type>
  <default_value>none</default_value>
  <valid_values>nersc_slurm,lc_slurm,moab,pbs,lsf,slurm,cobalt,cobalt_theta,none,sge</valid_values>
  <group>config_batch</group>
  <file>env_batch.xml</file>
  <desc>The batch system type to use for this machine.</desc>
</entry>
 

Lakhima

Lakhima Chutia
New Member
Thanks for your help. I added "sge" to the list of valid values for the batch system following your instructions.
Now I am getting errors from case.submit. Could you please look at the preview_run output and the log file and suggest what might be wrong?

The output of preview_run is below:

CASE INFO:

nodes: 6
total tasks: 36
tasks per node: 6
thread count: 1

BATCH INFO:

FOR JOB: case.run
ENV:
module command is /usr/share/lmod/lmod/libexec/lmod python load intel/2017.1

Setting Environment KMP_STACKSIZE=256M
Setting Environment KMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q CGRER -l h_rt=48:00:00 -v ARGS_FOR_SCRIPT='--resubmit' .case.run

MPIRUN (job=case.run):
mpirun -np 36 /Dedicated/jwang-data2/Lakhima/CESM-NEW/RUN-SGE-TRIAL/b.e20.B1850.f19_g17.test/bld/cesm.exe >> cesm.log.$LID 2>&1


FOR JOB: case.st_archive

ENV:
module command is /usr/share/lmod/lmod/libexec/lmod python load intel/2017.1

Setting Environment KMP_STACKSIZE=256M
Setting Environment KMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1

SUBMIT CMD:
qsub -q CGRER -l h_rt=00:20:00 -hold_jid 0 -v ARGS_FOR_SCRIPT='--resubmit' case.st_archive
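
One quick sanity check on output like this is whether the task layout is internally consistent: nodes times tasks per node should equal total tasks, and that should match the `-np` value passed to mpirun. The sketch below does that arithmetic on a shortened copy of the text above (the executable path is abbreviated to `cesm.exe`). The function names are hypothetical, and the check says nothing about whether the scheduler actually grants that many slots, which is a question for the site administrators.

```python
import re

def parse_case_info(text):
    """Collect the integer fields from the CASE INFO block of preview_run output."""
    fields = {}
    for key in ("nodes", "total tasks", "tasks per node", "thread count"):
        match = re.search(rf"^\s*{re.escape(key)}:\s*(\d+)\s*$", text, re.MULTILINE)
        if match:
            fields[key] = int(match.group(1))
    return fields

def mpirun_np(text):
    """Return the -np value from the mpirun line, or None if it is absent."""
    match = re.search(r"mpirun\s+-np\s+(\d+)", text)
    return int(match.group(1)) if match else None

preview = """CASE INFO:
nodes: 6
total tasks: 36
tasks per node: 6
thread count: 1

MPIRUN (job=case.run):
mpirun -np 36 cesm.exe
"""

info = parse_case_info(preview)
layout_ok = info["nodes"] * info["tasks per node"] == info["total tasks"]
np_ok = mpirun_np(preview) == info["total tasks"]
print(info, layout_ok, np_ok)
```

Here 6 nodes times 6 tasks per node gives 36, which matches both the total task count and the mpirun `-np` value, so the layout itself is consistent.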
 

Attachments

jedwards

CSEG and Liaisons
Staff member
I am not familiar with the SGE batch system and do not have access to one. You also have a number of errors loading modules.
You should consider consulting a system administrator for your system.
 