
Previously successful run crashes after 18 years

James King
Hi all,

I'm running a coupled atmosphere-ocean compset in CESM2.2.0 on Cheyenne. The first 10-year run of this compset completed successfully, but the next 10-year run crashed at year 18. The error in cesm.log is:

WHL, oc_tavg_helper is already associated; reset the tavg fields
0: sysmem size=80780.1 MB rss=906.4 MB share=77.2 MB text=79.1 MB datastack=0.0 MB
631:MPT ERROR: Assertion failed at ibdev_multirail.c:4331: "0 <= chan->queued"
631:MPT ERROR: Rank 631(g:631) is aborting with error code 0.
631: Process ID: 3080, Host: r1i5n32, Program: /glade/scratch/jamesking/maxforest_transient_test_02/bld/cesm.exe

There are no error messages in any of the component logs and a search for this error on these forums drew a blank. Any idea what might have gone wrong?

Thanks,

James
 

Sam Rabin
Huh. Yes, both my resubmits worked, at least initially. The first resubmit ended up crashing with the same ibdev_multirail.c error. The second resubmit is running now, and it's not on r1i5n32, so I'm hopeful.
 

James King
Both my resubmits are now running too, so hopefully this has now been resolved.
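
For anyone who hits the same thing: since the thread points to a possibly bad node (r1i5n32), one way to check is to pull the host of each aborting rank out of cesm.log for every failed run and see whether the same node keeps turning up. The following is only a rough sketch, assuming the log lines look like the excerpt in the first post; the function name and regular expressions are made up for illustration and are not part of CESM or MPT.

```python
import re
from collections import defaultdict

# Patterns are assumptions based on the log excerpt in this thread,
# e.g. '631:MPT ERROR: ...' and '631: Process ID: 3080, Host: r1i5n32, ...'.
MPT_ERROR = re.compile(r"^\s*(\d+):\s*MPT ERROR:\s*(.*)$")
HOST_LINE = re.compile(r"^\s*(\d+):\s*Process ID:\s*\d+,\s*Host:\s*([^,\s]+)")


def failed_rank_hosts(log_text):
    """Map each rank that reported an MPT ERROR to its host (None if not logged)."""
    errors = defaultdict(list)  # rank -> list of MPT error messages
    hosts = {}                  # rank -> host name
    for line in log_text.splitlines():
        m = MPT_ERROR.match(line)
        if m:
            errors[int(m.group(1))].append(m.group(2))
            continue
        m = HOST_LINE.match(line)
        if m:
            hosts[int(m.group(1))] = m.group(2)
    return {rank: hosts.get(rank) for rank in errors}


# The excerpt from the first post, used here as sample input.
sample = """\
WHL, oc_tavg_helper is already associated; reset the tavg fields
0: sysmem size=80780.1 MB rss=906.4 MB share=77.2 MB text=79.1 MB datastack=0.0 MB
631:MPT ERROR: Assertion failed at ibdev_multirail.c:4331: "0 <= chan->queued"
631:MPT ERROR: Rank 631(g:631) is aborting with error code 0.
631: Process ID: 3080, Host: r1i5n32, Program: /glade/scratch/jamesking/maxforest_transient_test_02/bld/cesm.exe
"""

print(failed_rank_hosts(sample))  # {631: 'r1i5n32'}
```

To use it on a real run, read the cesm.log file into a string and pass it in. If the same host shows up across several crashed runs, that would be consistent with a node problem rather than a model problem, though it would not prove it.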
 