Ice terminating before coupler?

Hi all,
I ran into a problem running a CCSM3 case at T42_gx1v3 resolution with component set B.
I had previously completed a 100-model-year control run at T31_gx3v5 on the same machine without any problems.
But after I changed the resolution to T42_gx1v3, the run failed after 12 years and 5 months of model time. I then restarted the run, but the restart run also failed after another 12 years and 5 months.

When I check the job.o file, it shows:
rank 4 in job 1 compute-0-17_53982 caused collective abort of all ranks
exit status of rank 4: killed by signal 9
rank 0 in job 1 compute-0-17_53982 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
Thu Sep 9 18:22:39 CST 2010 -- CSM EXECUTION HAS FINISHED

There are no error messages in any of the log files except ice.log, which reports '(ice) terminating before coupler'.
The input data I used are the files distributed with CCSM3.
1. Could anyone give me a clue as to why the run always fails after 12 years and 5 months?
2. Another question: I restarted the former run from 0012-01-01. After the restart run failed, I compared the data file 'cam2.h0.0012-03-01.nc' with the file for the same period from the former run, but they are not identical. For example, the V wind ranges from -19.2612 to 19.0705 m/s in the former run, but from -23.415 to 26.3333 m/s in the restart run. As I understand it, a restart run should continue the original run bit for bit, as if it had never stopped. So why are the results not identical?
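
For anyone who wants to reproduce the comparison in question 2, here is a minimal sketch of a bit-for-bit check between two fields. It assumes the V values have already been read from the two history files into flat lists of floats; reading the NetCDF files is not shown. The function names and the sample numbers are hypothetical placeholders, not real model output.

```python
import struct


def same_bits(x: float, y: float) -> bool:
    """True only if the two doubles have identical 64-bit representations."""
    return struct.pack("<d", x) == struct.pack("<d", y)


def compare_fields(a, b):
    """Return (number of values that differ bitwise, largest absolute difference)."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    n_diff = sum(not same_bits(x, y) for x, y in zip(a, b))
    max_abs = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return n_diff, max_abs


# Hypothetical stand-ins for the same field from the two runs.
original_run = [1.0, 2.0, 3.0]
restart_run = [1.0, 2.0000000000000004, 3.5]

n_diff, max_abs = compare_fields(original_run, restart_run)
print(f"{n_diff} of {len(original_run)} values differ; max |diff| = {max_abs}")
```

A count of zero would mean the restart reproduced the original run exactly for that field. Any non-zero count means the two runs have diverged, even if the largest difference is tiny.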

Thanks in advance!