Fatal error in MPI_Allreduce

hannay

Cecile Hannay
AMWG Liaison
Staff member
I have a user who is trying to run the compset F_2000_CAM5_PM on yellowstone. She is getting the error:
 255:Abort(1) on node 255 (rank 255 in comm 1140850688): Fatal error in MPI_Allreduce: Message truncated, error stack:
 255:MPIDI_Buffer_copy(73): Message truncated; 48 bytes received but buffer size is 40
 255:INFO: 0031-306  pm_atexit: pm_exit_value is 1.
INFO: 0031-251  task 255 exited: rc=1
Any idea? Thanks.
 

jedwards

CSEG and Liaisons
Staff member
I see that most components are set to 256 tasks, but ROF and WAV are set to 240 and the error is on task 240. Also, 256 is a bad choice of task count if ptile=15, because 256 is not a multiple of 15 while 240 is (16 x 15). Try setting all components to 240.
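
To see why 256 fits poorly with ptile=15, here is a rough arithmetic sketch in Python. It is not part of CESM or its scripts. The helper name node_layout is made up for illustration, and it assumes the scheduler fills each node with up to ptile tasks before starting the next one.

```python
import math

def node_layout(ntasks, ptile):
    """Hypothetical helper: return (nodes needed, tasks on last node, idle slots).

    Assumes tasks are packed ptile per node, filling one node before the next.
    """
    nodes = math.ceil(ntasks / ptile)
    tasks_on_last_node = ntasks - (nodes - 1) * ptile
    idle_slots = nodes * ptile - ntasks
    return nodes, tasks_on_last_node, idle_slots

if __name__ == "__main__":
    for ntasks in (256, 240):
        nodes, last, idle = node_layout(ntasks, 15)
        print(f"{ntasks} tasks at ptile=15: {nodes} nodes, "
              f"{last} task(s) on the last node, {idle} idle slot(s)")
```

Under that assumption, 256 tasks need an 18th node that hosts a single task, while 240 tasks fill 16 nodes exactly.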
 