Exception from case_run

Ruth

wenru xu
New Member
Hello,

I got an error when I submitted a case created with ./create_newcase --case SSP585 --res f09_g16 --compset ISSP585Clm50BgcCrop --run-unsupported --compiler gnu:
Exception from case_run: ERROR: RUN FAIL: Command 'mpirun -np 8 /p/scratch/CESMDATAROOT/CaseOutputs/SSP5851/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /p/scratch/CESMDATAROOT/CaseOutputs/SSP5851/run/cesm.log.3719539.220803-133904
The log file is attached. The relevant messages in it are:
Warning: There was an error initializing an OpenFabrics device.
User-specified PIO rearranger comm max pend req (comp2io), 0 (value will be reset as requested)
Resetting PIO rearranger comm max pend req (comp2io) to 64

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 2735 on node nova004 exited on signal 9 (Killed).


However, when I run other cases, such as ./create_newcase --case I2000CLM509125 --res f09_g16 --compset I2000Clm50SpGs --run-unsupported --compiler gnu, the cesm.log file contains the same warning but the case runs to completion (no "Exception from case_run" error).


Thanks!
 

Attachments

  • cesm.log.3719539.220803-133904.txt
    16.2 KB

jedwards

CSEG and Liaisons
Staff member
The message "exited on signal 9 (Killed)." at the end of the log indicates that the job was killed by a system process or possibly by a user. Check with your system administrator.
 

Ruth

wenru xu
New Member
Thanks for your reply. I figured it out: the job was killed because it ran out of memory.
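
For anyone hitting the same message, here is a minimal Python sketch (not part of CESM or CIME) that scans log text for the mpirun "exited on signal N" line quoted in this thread and translates the number into a signal name. The function name find_signal_exits and the regular expression are my own assumptions, based only on the wording of the log line above. On Linux, signal 9 is SIGKILL, which is what the kernel's out-of-memory killer sends, so it is worth checking memory usage when this shows up.

```python
import re
import signal

# Pattern based on the mpirun summary line quoted in this thread (an assumption:
# other MPI versions may word this line differently).
SIGNAL_LINE = re.compile(
    r"process rank (\d+) with PID (\d+) on node (\S+) exited on signal (\d+)"
)


def find_signal_exits(log_text):
    """Return (rank, pid, node, signal_number, signal_name) tuples found in log_text."""
    results = []
    for match in SIGNAL_LINE.finditer(log_text):
        rank, pid, node, signum = match.groups()
        signum = int(signum)
        try:
            # Translate the number into a name such as SIGKILL or SIGSEGV.
            name = signal.Signals(signum).name
        except ValueError:
            # The number is not a named signal on this platform.
            name = "UNKNOWN"
        results.append((int(rank), int(pid), node, signum, name))
    return results


# The line from the log in this thread, used as sample input.
sample = (
    "mpirun noticed that process rank 0 with PID 2735 on node nova004 "
    "exited on signal 9 (Killed)."
)

for rank, pid, node, signum, name in find_signal_exits(sample):
    print(f"rank {rank} (PID {pid}) on {node}: signal {signum} = {name}")
```

To check a real run, read the cesm.log file into a string and pass it to find_signal_exits instead of the sample line. A SIGKILL result does not prove an out-of-memory condition by itself; the system administrator or the scheduler's job accounting can confirm it.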
 