Hi all,
I am trying to run an experiment in which I impose an atmospheric heating anomaly in the radiative heating subroutine of CAM5, and I am running into problems. I have successfully run hundreds of similar experiments in the past, but I am now running additional experiments for the first time in 6 months and am getting error messages. I am using the FC5 compset of CESM1.2.2 on the Australian supercomputer. I can run a 9-month control experiment without the imposed anomaly, but when I include the anomaly, the job fails almost immediately. I have tried resubmitting the job, but the error message changes each time.
On the first and second attempts, I get the following error:
CalcWorkPerBlock: Total blocks: 1152 Ice blocks: 1152 IceFree blocks: 0 Land blocks: 0
malloc(): invalid size (unsorted)
On the third attempt, I get this error:
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
(seq_frac_check) [ice set] ERROR aborting
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
(seq_frac_check) [ice set] ERROR aborting
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
MPI_ABORT was invoked on rank 137 in communicator MPI_COMM_WORLD
with errorcode 1001.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
And on the fourth attempt, I get this error:
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
MCT::m_Rearranger::Rearrange_: TargetAV size is not appropriate for this Rearranger
MCT::m_Rearranger::Rearrange_: error, InRearranger%RecvRouter%lAvsize=121, AttrVect_lsize(TargetAV)=0.
022.MCT(MPEU)::die.: from MCT::m_Rearranger::Rearrange_()
MPI_ABORT was invoked on rank 34 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
[gadi-cpu-clx-1039:1706032] *** An error occurred in MPI_Isend
[gadi-cpu-clx-1039:1706032] *** reported by process [2884501505,28]
[gadi-cpu-clx-1039:1706032] *** on communicator MPI COMMUNICATOR 33 DUP FROM 0
[gadi-cpu-clx-1039:1706032] *** MPI_ERR_OTHER: known error not in list
[gadi-cpu-clx-1039:1706032] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gadi-cpu-clx-1039:1706032] *** and potentially your MPI job)
I have attached the CESM log files, Macros, env_mach_specific and env_mach_pes.xml.
I had never received a single error when running these experiments before, and I have not made any changes to the experimental setup, source code, etc., so I am not sure what the problem is. Do you have any suggestions for me to try?
Thanks very much for your help,