I have a simple test setup on Cheyenne that's supposed to run for 5 days, then resubmit once to run for 5 more. This works fine when called with
./case.submit
… the first time I try it. If I then do
./xmlchange RESUBMIT=1,CONTINUE_RUN=FALSE
and submit again, the first segment runs fine, but the second segment crashes. The error is in components/cmeps/cesm/driver/esmApp.F90 at line 148 (the last line of the excerpt below):
Code:
if (ESMF_LogFoundError(rcToCheck=urc, msg=ESMF_LOGERR_PASSTHRU, &
line=__LINE__, &
file=__FILE__)) &
call ESMF_Finalize(endflag=ESMF_END_ABORT)
Debug mode shows the following:
Code:
PIO rearranger options:
comm type = p2p
comm fcd = 2denable
max pend req (comp2io) = 64
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
MPT ERROR: Rank 0(g:0) is aborting with error code 1.
Process ID: 19655, Host: r6i6n33, Program: /glade/scratch/samrabin/chain_20220722_01/bld/cesm.exe
MPT Version: HPE MPT 2.22 03/31/20 15:59:10
MPT: --------stack traceback-------
MPT: Attaching to program: /proc/19655/exe, process 19655
[...]
MPT: #9 0x00002b0ca8ff5b80 in esmf_initmod::esmf_finalize (
MPT: keywordenforcer=<error reading variable: Cannot access memory at address 0x0>, endflag=...,
MPT: rc=<error reading variable: Cannot access memory at address 0x0>)
MPT: at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/ESMFMod/src/ESMF_Init.F90:1226
MPT: #10 0x0000000000432c1d in esmapp ()
MPT: at /glade/u/home/samrabin/ctsm/components/cmeps/cime_config/../cesm/driver/esmApp.F90:148
MPT: #11 0x00000000004142a2 in main ()
MPT: #12 0x00002b0caea15a35 in __libc_start_main ()
MPT: from /glade/u/apps/ch/os/lib64/libc.so.6
MPT: #13 0x00000000004141a9 in _start () at ../sysdeps/x86_64/start.S:118
Is there some extra step I need to perform, aside from
./xmlchange RESUBMIT=1,CONTINUE_RUN=FALSE
, in order for this to work? I've tried deleting the rpointer files in the run directory, but that didn't help.
Details:
- ctsm5.1.dev092
- Case directory:
/glade/u/home/samrabin/cases_ctsm/chain_20220722_01
- Full log from which I took above excerpt:
/glade/scratch/samrabin/chain_20220726.03.01/run/cesm.log.5172199.chadmin1.ib0.cheyenne.ucar.edu.220726-121510
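For anyone looking through the full log, a small script can list the frames in the MPT traceback together with their source locations. This is only a minimal sketch: the function name `parse_mpt_traceback` and the regular expressions are hypothetical, written against the line format visible in the excerpt above, and are not part of CIME, ESMF, or MPT. Other MPT versions may format the traceback differently.

```python
import re

# Frame header, e.g. "MPT: #10 0x0000000000432c1d in esmapp ()"
# (pattern is an assumption based on the excerpt in this post)
FRAME_RE = re.compile(r"^MPT:\s+#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")
# Location on its own line, e.g. "MPT: at /path/to/file.F90:148"
LOCATION_RE = re.compile(r"^MPT:\s+at\s+(\S+):(\d+)\s*$")
# Location on the same line as the header, e.g. "... _start () at start.S:118"
INLINE_RE = re.compile(r"\)\s+at\s+(\S+):(\d+)\s*$")


def parse_mpt_traceback(lines):
    """Return one dict per frame with keys: frame, function, file, line.

    'file' and 'line' stay None when the traceback gives no source location.
    """
    frames = []
    for raw in lines:
        text = raw.strip()
        header = FRAME_RE.match(text)
        if header:
            frames.append({
                "frame": int(header.group(1)),
                "function": header.group(2),
                "file": None,
                "line": None,
            })
            inline = INLINE_RE.search(text)
            if inline:
                frames[-1]["file"] = inline.group(1)
                frames[-1]["line"] = int(inline.group(2))
            continue
        location = LOCATION_RE.match(text)
        if location and frames:
            # Attach the location to the most recent frame header.
            frames[-1]["file"] = location.group(1)
            frames[-1]["line"] = int(location.group(2))
    return frames


if __name__ == "__main__":
    import sys

    # Usage: python parse_traceback.py <path to cesm.log file>
    with open(sys.argv[1], errors="replace") as log:
        for frame in parse_mpt_traceback(log):
            print(frame["frame"], frame["function"], frame["file"], frame["line"])
```

Running it on the log named above would only restate what the traceback already contains; it does not explain why `ESMF_Finalize` was called.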