srogstad@geo_umass_edu
Member
One of my simulations seems to have stopped partway through last night, but I don't see any particular errors in the logs that would suggest a crash. It runs one year at a time. In the run directory I have history files for October (2122-10) and restarts for November (2122-11), but the run never got past that point or completed short-term archiving, so the output from the mostly finished year is just sitting in /run. These are the ends of all the log files:
CESM (this is the only one with a possible error):
1: Opened file TuneRCP85CoupledExt.cam.r.2122-01-01-00000.nc to write 1769472
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: NetCDF: Invalid dimension ID or name
1: Opened file TuneRCP85CoupledExt.cam.rs.2122-01-01-00000.nc to write 131072
CPL:
Write restart file at 21220101 0
(seq_rest_write) write rpointer file rpointer.drv
(seq_io_wopen) create file TuneRCP85CoupledExt.cpl.r.2122-01-01-00000.nc
tStamp_write: model date = 21220101 0 wall clock = 2022-05-01 19:39:46 avg dt = 16.50 dt = 115.27
memory_write: model date = 21220101 0 memory = 526.72 MB (highwater) -0.00 MB (usage) (pe= 0 comps= cpl ATM LND OCN ICE GLC ROF WAV)
(seq_mct_drv): =============== SUCCESSFUL TERMINATION OF CPL7-CCSM ===============
(seq_mct_drv): =============== at YMD,TOD = 21220101 0 ===============
(seq_mct_drv): =============== # simulated days (this run) = 365.000 ===============
(seq_mct_drv): =============== compute time (hrs) = 1.673 ===============
(seq_mct_drv): =============== # simulated years / cmp-day = 14.345 ===============
(seq_mct_drv): =============== pes min memory highwater (MB) 42.479 ===============
(seq_mct_drv): =============== pes max memory highwater (MB) 756.027 ===============
(seq_mct_drv): =============== pes min memory last usage (MB) -0.001 ===============
(seq_mct_drv): =============== pes max memory last usage (MB) -0.001 ===============
POP:
------------------------------------------------------------------------
===================
completed POP_Final
===================
ATM:
Number of completed timesteps:385440
Time step 385441 partially done to provide convectively adjusted and time filtered values for history tape.
------------------------------------------------------------
Total run time (sec) : 6073.10027311801
Time Step Loop run time(sec) : 6006.92394515709
SYPD : 14.3727176610261
******* END OF MODEL RUN *******
LND:
./TuneRCP85CoupledExt.clm2.r.2122-01-01-00000.nc
------------------------------------------------------------
(OPNFIL): Successfully opened file ./rpointer.lnd on unit= 85
Successfully wrote local restart pointer file
Successfully wrote out restart data at nstep = 385440
------------------------------------------------------------
ICE:
(ice_pio_wopen) create file TuneRCP85CoupledExt.cice.r.2122-01-01-00000.nc
Writing TuneRCP85CoupledExt.cice.r.2122-01-01-00000.nc
Restart written 17520 66919392000.0000 3942000000.00000
ROF:
(OPNFIL): Successfully opened file ./rpointer.rof on unit= 65
Successfully wrote local restart pointer file
Successfully wrote out restart data at nstep = 64240
------------------------------------------------------------
Would it be okay to just set STOP_OPTION to months in env_run.xml and run for one month to see if that gets it to the end of the year, or should I delete the rpointer and 2122 files in /run and replace them with the restarts from the end of the previous completed year? Thanks!
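Before picking either option, it may help to check whether the rpointer files in /run all point at restarts from the same model date, since a run that stopped mid-write could leave them out of step. Below is a minimal sketch of that check in Python. It assumes each rpointer.* file names its restart file with a date stamp in the form seen in the logs above (for example 2122-01-01-00000); the function names are hypothetical and not part of CESM.

```python
import re
from pathlib import Path

# Date stamp as it appears in restart file names in the logs above,
# e.g. TuneRCP85CoupledExt.cam.r.2122-01-01-00000.nc
STAMP = re.compile(r"\.(\d{4}-\d{2}-\d{2}-\d{5})\.nc")


def restart_stamps(text):
    """Return the set of date stamps named in one rpointer file's text."""
    return set(STAMP.findall(text))


def check_run_dir(run_dir):
    """Map each rpointer.* file in run_dir to the date stamps it names."""
    pointers = sorted(Path(run_dir).glob("rpointer.*"))
    return {p.name: restart_stamps(p.read_text()) for p in pointers}


def is_consistent(stamps_by_file):
    """True when every pointer file names a stamp and all stamps agree."""
    all_stamps = set().union(*stamps_by_file.values())
    return len(all_stamps) == 1 and all(stamps_by_file.values())
```

Calling `is_consistent(check_run_dir("/path/to/run"))` (path is a placeholder) returns False if any pointer file names a different date or no restart at all, which would argue for restoring the full restart set from the previous completed year rather than continuing. If the pointers do agree, note that in the CESM versions I know of, the one-month option is spelled `nmonths` (with STOP_N set to 1) rather than `months`; treat that as something to confirm against your own env_run.xml.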