Previously successful run crashes after 18 years

James King

Member
Hi all,

I'm running a coupled atmosphere-ocean compset in CESM2.2.0 on Cheyenne. The first 10-year run of this compset completed successfully, but the next 10-year run crashed at year 18. The error in cesm.log is:

WHL, oc_tavg_helper is already associated; reset the tavg fields
0: sysmem size=80780.1 MB rss=906.4 MB share=77.2 MB text=79.1 MB datastack=0.0 MB
631:MPT ERROR: Assertion failed at ibdev_multirail.c:4331: "0 <= chan->queued"
631:MPT ERROR: Rank 631(g:631) is aborting with error code 0.
631: Process ID: 3080, Host: r1i5n32, Program: /glade/scratch/jamesking/maxforest_transient_test_02/bld/cesm.exe

There are no error messages in any of the component logs and a search for this error on these forums drew a blank. Any idea what might have gone wrong?

Thanks,

James
 

samrabin

Sam Rabin
Member
Huh. Yes, both my resubmits worked, at least initially. The first resubmit ended up crashing with the same ibdev_multirail.c error. The second resubmit is running now, and it's not on r1i5n32, so I'm hopeful.
 

James King

Member
Both my resubmits are now running too, so hopefully this has now been resolved.
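
In case it helps anyone who lands here with the same ibdev_multirail.c assertion: since the crashes discussed above were tied to a particular host, one quick check is to tally which hosts the MPT ERROR lines come from across your cesm.log files. The following is only a rough sketch. It assumes the log lines look like the excerpt in the first post (a rank prefix, then "MPT ERROR:" or "Process ID: ..., Host: ..."), and the function name hosts_with_mpt_errors is made up for illustration. A host showing up repeatedly is a hint worth passing on to the system administrators, not proof of a bad node.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this thread:
#   "631:MPT ERROR: Assertion failed at ibdev_multirail.c:4331: ..."
MPT_ERROR = re.compile(r"^(?P<rank>\d+):MPT ERROR: ")
#   "631: Process ID: 3080, Host: r1i5n32, Program: ..."
HOST_LINE = re.compile(
    r"^(?P<rank>\d+):\s*Process ID: \d+, Host: (?P<host>[^,\s]+)"
)


def hosts_with_mpt_errors(log_lines):
    """Count MPT ERROR lines per host (hypothetical helper).

    Ranks are mapped to hosts via the "Process ID ... Host ..." lines;
    ranks with no such line are counted under "unknown".
    """
    errors_per_rank = Counter()
    host_of_rank = {}
    for raw in log_lines:
        line = raw.strip()
        m = MPT_ERROR.match(line)
        if m:
            errors_per_rank[int(m.group("rank"))] += 1
            continue
        m = HOST_LINE.match(line)
        if m:
            host_of_rank[int(m.group("rank"))] = m.group("host")

    per_host = Counter()
    for rank, count in errors_per_rank.items():
        per_host[host_of_rank.get(rank, "unknown")] += count
    return per_host


# Lines copied from the first post in this thread.
sample = [
    "WHL, oc_tavg_helper is already associated; reset the tavg fields",
    "0: sysmem size=80780.1 MB rss=906.4 MB share=77.2 MB text=79.1 MB datastack=0.0 MB",
    '631:MPT ERROR: Assertion failed at ibdev_multirail.c:4331: "0 <= chan->queued"',
    "631:MPT ERROR: Rank 631(g:631) is aborting with error code 0.",
    "631: Process ID: 3080, Host: r1i5n32, Program: /glade/scratch/jamesking/maxforest_transient_test_02/bld/cesm.exe",
]

result = hosts_with_mpt_errors(sample)
print(dict(result))
```

To use it on real output, pass in the lines of each cesm.log from the crashed runs (for example by iterating over an open file) and compare the per-host counts between runs.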
 