
Model runs successfully but fails in archiving step - CESM2.1.5 Error

kennytam

Kenny Tam
New Member
Hi,

Recently, my team has been trying to port CESM2.1.5 to Rutgers' supercomputer, Amarel. In a preliminary run to test the model, CESM2.1.5 fails to complete the archiving step. It seems to get stuck during this step, and we eventually terminate it to save resources. I've attached a screenshot of CaseStatus, along with the cesm.log file and config_machines.xml. Our preliminary run uses the basic B1850 compset at resolution f19_g17. No modifications have been made to the source code. The only xmlchange settings I have made in my test runs are STOP_OPTION=nmonths and STOP_N=1. I believe the installed compiler is GCC 14.2.0.

CESM2_CaseStatus_screenshot.png
 

Attachments

  • config_machines.txt

jedwards

CSEG and Liaisons
Staff member
Hi Kenny, do you get any output in the log associated with jobid 40695916? You can try running case.st_archive from the command line to determine whether it's a CESM problem or a problem in your batch system.
 

kennytam

Kenny Tam
New Member
OK, after I submitted case.st_archive on its own, it seems to have run successfully. However, whenever I try to resubmit the case, the rpointer files do not appear to have been copied correctly from $DOUT_S_ROOT to $RUNDIR. I thought a clean run might be best, so I made a new case called testrun, but it had the same issue: the rpointer files are not copied correctly. I then manually copied the correct rpointer files to $RUNDIR, and now I am getting this error: ERROR: GETFIL: FAILED to get testrun.cam.h0.0001-01.nc

I am a bit confused about why it needs this file, since I thought all of the information needed for resubmitting would be in "testrun.cam.r.0001-02-01-00000.nc". Regardless, I don't know how to direct the run to the file "testrun.cam.h0.0001-01.nc". The file does exist; it is in archive/testrun/atm/hist.
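For anyone checking a similar setup, here is a rough Python sketch of how one might verify that the files named in the rpointer files are actually present in $RUNDIR before resubmitting. The function name is hypothetical and not part of CESM or CIME, and it assumes that any rpointer line ending in ".nc" names a file expected in the run directory; other lines are skipped.

```python
from pathlib import Path


def missing_rpointer_targets(run_dir):
    """Return {rpointer file name: [referenced .nc files absent from run_dir]}.

    Hypothetical helper, not part of CESM/CIME.
    """
    run_dir = Path(run_dir)
    missing = {}
    for pointer in sorted(run_dir.glob("rpointer.*")):
        for line in pointer.read_text().splitlines():
            name = line.strip()
            # Assumption: only lines ending in ".nc" name a restart file.
            if not name.endswith(".nc"):
                continue
            # Keep only the file name in case the pointer stores "./file.nc".
            target = run_dir / Path(name).name
            if not target.exists():
                missing.setdefault(pointer.name, []).append(target.name)
    return missing
```

Calling `missing_rpointer_targets("/path/to/run")` (placeholder path) returns an empty dict when every referenced file is present. It only covers files named in the rpointer files themselves, so a history file that the model requests at restart would not show up here.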
 

jedwards

CSEG and Liaisons
Staff member
The history file needs to be copied back to the run directory. It should not have been moved away from there in the first place. The model needs the history file because you are writing a restart before the number of time levels requested for the history file has been written.
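As a minimal sketch of the copy-back step, assuming the archived history file sits under archive/testrun/atm/hist as described above, something like the following Python helper could be used. The function name and the paths in the usage line are placeholders for illustration, not part of CESM or CIME. It copies rather than moves, so the archived copy is left in place.

```python
import shutil
from pathlib import Path


def copy_history_back(archive_hist_dir, run_dir, pattern):
    """Copy archived history files matching `pattern` into the run directory.

    Hypothetical helper, not part of CESM/CIME. Files already present in
    run_dir are left untouched. Returns the names of the files copied.
    """
    archive_hist_dir = Path(archive_hist_dir)
    run_dir = Path(run_dir)
    copied = []
    for src in sorted(archive_hist_dir.glob(pattern)):
        dest = run_dir / src.name
        if dest.exists():
            continue  # do not overwrite a file the model already has
        shutil.copy2(src, dest)  # copy, so the archive keeps its copy
        copied.append(src.name)
    return copied
```

For the case in this thread the call might look like `copy_history_back("/path/to/archive/testrun/atm/hist", "/path/to/run", "testrun.cam.h0.0001-01.nc")`, with both directories replaced by the actual $DOUT_S_ROOT and $RUNDIR locations.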
 