resubmit issue

minimax

New Member
Dear colleagues,

I am facing the following problem with CESM2.2.2 (compset F2000climo with a modified sea-ice file):

./xmlchange SSTICE_DATA_FILENAME="$CASEDIR/sst_HadOIBl_bc_1.9x2.5_2000climo_noice.nc"
./xmlchange STOP_N=10,STOP_OPTION=nyears
./xmlchange RESUBMIT=2

When I run the model without the modified sea-ice file, it runs to completion (both resubmissions) and finishes correctly.
When I run it with the modified sea-ice file, it completes only 10 years and finishes without creating the archive folder that stores the generated files.
I didn't find any errors in the logs, and all the restart files are created. After that I can restart the run without errors. Strangely, when I run the model for 5 days, it also works correctly and creates all files (./xmlchange STOP_N=1,STOP_OPTION=days; ./xmlchange RESUBMIT=4).

How can I solve this issue?

Thank you!
 

minimax

New Member
CaseStatus file:
2025-03-15 18:48:20: xmlchange success <command> ./xmlchange SSTICE_DATA_FILENAME=/home/luna/cesm/cases/no_ice/sst_HadOIBl_bc_1.9x2.5_2000climo_noice.nc </command>
---------------------------------------------------
2025-03-15 18:48:20: xmlchange success <command> ./xmlchange NTASKS=378 </command>
---------------------------------------------------
2025-03-15 18:48:21: xmlchange success <command> ./xmlchange PIO_TYPENAME=netcdf </command>
---------------------------------------------------
2025-03-15 18:48:21: xmlchange success <command> ./xmlchange DOUT_S=TRUE </command>
---------------------------------------------------
2025-03-15 18:48:21: xmlchange success <command> ./xmlchange STOP_N=10,STOP_OPTION=nyears </command>
---------------------------------------------------
2025-03-15 18:48:21: xmlchange success <command> ./xmlchange RESUBMIT=2 </command>
---------------------------------------------------
2025-03-15 18:48:21: case.setup starting
---------------------------------------------------
2025-03-15 18:48:25: case.setup success
---------------------------------------------------
2025-03-15 18:48:28: case.build starting
---------------------------------------------------
2025-03-15 18:53:01: case.build success
---------------------------------------------------
2025-03-15 18:53:06: case.submit starting
---------------------------------------------------
2025-03-15 18:53:18: case.submit success case.run:12873, case.st_archive:12874
---------------------------------------------------
2025-03-15 18:53:20: case.run starting
---------------------------------------------------
2025-03-15 18:53:35: model execution starting
---------------------------------------------------
2025-03-16 14:06:38: model execution success
---------------------------------------------------
2025-03-16 14:06:38: case.run success
---------------------------------------------------
2025-03-16 21:45:56: case.submit starting
---------------------------------------------------
2025-03-16 21:46:08: case.submit success case.run:12894, case.st_archive:12895
---------------------------------------------------
2025-03-16 21:46:09: case.run starting
---------------------------------------------------
2025-03-16 21:46:20: model execution starting
---------------------------------------------------
 

katec

CSEG and Liaisons
Staff member
Hi there,

Usually when the resubmit fails to happen, it's because the model didn't actually finish successfully. Looking at the CaseStatus file here, you never got a "model execution success" message for the last run, which supports this theory. A model can fail without leaving errors in the logs, and the most common reason is that the run took more time than was requested and the machine killed the process. In your case directory, there should be a "run.[casename].o[projectnumber]" file with the output from the supercomputer. Check there to see whether the simulation finished successfully or was stopped because it ran out of wall-clock time.
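
The CaseStatus check described here can also be scripted. The following is only a rough sketch, assuming the entries have the "timestamp: message" form shown in the CaseStatus file posted in this thread; the function name `model_runs` and the `SAMPLE` excerpt are made up for illustration and are not part of CESM or CIME. It pairs each "model execution starting" entry with the next "model execution success" entry, so a run segment that never reported success stands out, and it gives the elapsed time of the segments that did finish.

```python
import re
from datetime import datetime

# Matches CaseStatus entries such as "2025-03-16 14:06:38: model execution success".
# The entry layout is assumed from the CaseStatus file posted in this thread.
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.+)$")


def model_runs(casestatus_text):
    """Pair each 'model execution starting' entry with the following
    'model execution success' entry, or with None if no success was logged."""
    runs = []
    start = None
    for line in casestatus_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip separator lines and blank lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        event = match.group(2)
        if event.startswith("model execution starting"):
            if start is not None:
                runs.append((start, None))  # previous start never succeeded
            start = stamp
        elif event.startswith("model execution success") and start is not None:
            runs.append((start, stamp))
            start = None
    if start is not None:
        runs.append((start, None))  # last start has no success entry
    return runs


# Excerpt of the CaseStatus entries posted above.
SAMPLE = """\
2025-03-15 18:53:35: model execution starting
---------------------------------------------------
2025-03-16 14:06:38: model execution success
---------------------------------------------------
2025-03-16 21:46:20: model execution starting
"""

for begin, end in model_runs(SAMPLE):
    if end is None:
        print(f"{begin}: no 'model execution success' recorded")
    else:
        print(f"{begin}: finished after {end - begin}")
```

For the log in this thread, the first 10-year segment ran from 18:53:35 to 14:06:38 the next day, a little over 19 hours, while the second segment has a start entry but no success entry. If the elapsed time of the finished segment is close to the requested limit (./xmlquery JOB_WALLCLOCK_TIME should show it), that would be consistent with the second segment being killed for running out of wall-clock time, but the scheduler output file remains the place to confirm it.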
 