I'm getting errors in CLM-FATES cases on Derecho that ran successfully on Feb 23. The land and datm models have initialized successfully. The cesm log files are hard (impossible for me) to interpret:
dec1786.hsn.de.hpc.ucar.edu 17: MPICH ERROR [Rank 17] [job id d19fa325-7722-4290-a09a-5182cd0be3ca] [Thu Feb 29 16:48:12 2024] [dec1786] - Abort(1) (rank 17 in comm 496): application called MPI_Abort(comm=0x84000001, 1) - process 17
dec1786.hsn.de.hpc.ucar.edu 17:
dec1786.hsn.de.hpc.ucar.edu 16: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec1786.hsn.de.hpc.ucar.edu 16: Image PC Routine Line Source
dec1786.hsn.de.hpc.ucar.edu 16: libpthread-2.31.s 000014C115F3B8C0 Unknown Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libmpi_intel.so.1 000014C113EFAE7E Unknown Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libmpi_intel.so.1 000014C113D0922F Unknown Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libmpi_intel.so.1 000014C1123366A8 MPI_Abort Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libesmf.so 000014C11DF1F1D7 _ZN5ESMCI3VMK5abo Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libesmf.so 000014C11DF1D9F4 _ZN5ESMCI2VM5abor Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libesmf.so 000014C11DF32E45 c_esmc_vmabort_ Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libesmf.so 000014C11E720868 esmf_vmmod_mp_esm Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libesmf.so 000014C11E5A751A esmf_initmod_mp_e Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: cesm.exe 00000000004329FB MAIN__ 145 esmApp.F90
dec1786.hsn.de.hpc.ucar.edu 16: cesm.exe 000000000042217D Unknown Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: libc-2.31.so 000014C11182D29D __libc_start_main Unknown Unknown
dec1786.hsn.de.hpc.ucar.edu 16: cesm.exe 00000000004220AA Unknown Unknown Unknown
One case is here:
/glade/u/home/pbuotte/Earthshot/fates_cases/derecho/DryBrazil_7BET_2BDT_coffee
with log files here:
/glade/derecho/scratch/pbuotte/glade/derecho/scratch/pbuotte/run
I appreciate any insights on what these errors mean.