I'm doing a 2-degree spinup of ctsm5.1.dev092. The 200-year CLM_ACCELERATED_SPINUP run went fine, as did the first 400 years of the second phase. Now I'm trying to do another 400 years, and I'm running into this error:
```
1229:MPT ERROR: Assertion failed at ibdev_multirail.c:4297: "0 <= chan->queued"
1229:MPT ERROR: Rank 1229(g:1229) is aborting with error code 0.
1229: Process ID: 51000, Host: r1i5n32, Program: /glade/scratch/samrabin/spinup_ctsm5.1.dev092_I1850Clm50BgcCrop_f19-g17_pt2/bld/cesm.exe
1229: MPT Version: HPE MPT 2.22 03/31/20 15:59:10
```

The first time this happened, I thought it might be a fluke and resubmitted. The resubmitted run successfully continued past the timestep where it crashed before, but about 100 years later it crashed again with the same failed assertion.
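To confirm that two crashes really hit the same assertion, and to see which MPI rank failed each time, the assertion lines can be pulled out of each large `cesm.log` with a short script. This is a minimal sketch that assumes the rank-prefixed `RANK:MPT ERROR: Assertion failed at FILE:LINE: "CONDITION"` format shown in the excerpt. The function name `find_mpt_assertions` is hypothetical and is not part of CESM or CIME tooling.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List

# Assumed log format (rank prefix, then the MPT assertion message), e.g.
# 1229:MPT ERROR: Assertion failed at ibdev_multirail.c:4297: "0 <= chan->queued"
ASSERT_RE = re.compile(
    r'^\s*(?P<rank>\d+):MPT ERROR: Assertion failed at '
    r'(?P<file>[^:]+):(?P<line>\d+): "(?P<cond>[^"]*)"'
)


@dataclass(frozen=True)
class MptAssertion:
    rank: int
    source_file: str
    source_line: int
    condition: str


def find_mpt_assertions(lines: Iterable[str]) -> List[MptAssertion]:
    """Collect every MPT assertion failure found in the given log lines."""
    hits = []
    for text in lines:
        m = ASSERT_RE.match(text)
        if m:
            hits.append(MptAssertion(
                rank=int(m.group("rank")),
                source_file=m.group("file"),
                source_line=int(m.group("line")),
                condition=m.group("cond"),
            ))
    return hits


if __name__ == "__main__":
    # Pass one or more cesm.log files on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for a in find_mpt_assertions(f):
                print(f"{path}: rank {a.rank} at "
                      f"{a.source_file}:{a.source_line} ({a.condition})")
```

Running it on the logs from both crashed submissions gives one line per assertion, which makes it easy to check whether the same source location and the same rank show up each time.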
I'm wondering what my next troubleshooting steps should be. I could resubmit with DEBUG=TRUE, but I'm hesitant because the error isn't reliably reproducible. There is also a fair amount of traceback already. I'd paste it here, but it's too long for a post. It's in this log file:

/glade/u/home/samrabin/cases_ctsm/spinup_ctsm5.1.dev092_I1850Clm50BgcCrop_f19-g17_pt2/logs/run_logs/cesm.log.4956786.chadmin1.ib0.cheyenne.ucar.edu.220711-233504

Any suggestions much appreciated.