
SCAM run failed

pappu

Pappu Paul
New Member
What version of the code are you using?
Processing externals description file : Externals.cfg
Processing externals description file : Externals_CISM.cfg
Processing externals description file : Externals_CLM.cfg
Processing externals description file : Externals_FMS.cfg
Processing externals description file : Externals_CAM.cfg
Checking status of externals: cice, cime, cism, source_cism, clm, fates, ptclm, mosart, rtm, fms, fms, cam, chem_proc, carma, cosp2, clubb, silhs, pumas, atmos_phys, atmos_cubed_sphere, mpas,
./chem_proc
clean sandbox, on chem_proc5_0_04
M ./cime
modified sandbox, on cime5.8.44
./components/cice
clean sandbox, on cice5_20200430
./components/cism
clean sandbox, on cism2_1_69
./components/cism/source_cism
clean sandbox, on f1a88d6bbe3bb5e2e8817f91aed6de87227f4bb7
M ./components/clm
modified sandbox, on ctsm5.1.dev019
./components/clm/src/fates
clean sandbox, on sci.1.30.0_api.8.0.0
./components/clm/tools/PTCLM
clean sandbox, on PTCLM2_20200121
./components/mosart
clean sandbox, on branch_tag/pio2.n01_mosart1_0_38
./components/rtm
clean sandbox, on branch_tag/nuopc_cap.n01_rtm1_0_73
./libraries/FMS
clean sandbox, on fi_20200609
./libraries/FMS/src
clean sandbox, on xanadu_esm4_20190304
./src/atmos_phys
clean sandbox, on atmos_phys0_00_011
./src/dynamics/fv3/atmos_cubed_sphere
clean sandbox, on fv3_cesm.04
./src/dynamics/mpas/dycore
clean sandbox, on 8fb891892fc877aa06f9e316ceeba9fff3de66b2
./src/physics/carma/base
clean sandbox, on carma3_49
./src/physics/clubb
clean sandbox, on clubb_release_b76a124_20200220_c20200320
./src/physics/cosp2/src
clean sandbox, on v2.1.4cesm
./src/physics/pumas
clean sandbox, on pumas_cam-release_v1.17
./src/physics/silhs
clean sandbox, on silhs_clubb_release_b76a124_20200220_c20200320




Hi, I am a graduate student at the University of Illinois. Our local computing cluster was recently upgraded from CentOS 7.9 to AlmaLinux 9.5 (which ships with GCC 11.5.0).
On the upgraded system, a SCAM run at TWP fails with the error below. The exact same case ran without problems on the old system.
gasaerexch addfld soa_a2_sfgaex2 kg/m2/s
gasaerexch addfld soa_c2_sfgaex2 kg/m2/s

subr. modal_aero_coag_init
pair 1 mode 2 ---> mode 1 eff 1
spec 29=so4_a2 ---> spec 28=so4_a1
spec 33=soa_a2 ---> spec 32=soa_a1
spec 19=ncl_a2 ---> spec 18=ncl_a1
spec 14=dst_a2 ---> spec 13=dst_a1
pair 2 mode 4 ---> mode 1 eff 1
spec 26=pom_a4 ---> spec 25=pom_a1
malloc_consolidate(): unaligned fastbin chunk detected

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0 0x14c188aaa8a0 in ???
#1 0x14c188aa9a45 in ???
Is this due to improper flags in the config_compilers.xml file? The current config_compilers.xml entry is:
<compiler MACH="keeling9_scam" COMPILER="gnu">
  <CFLAGS>
    <base> -std=gnu99 </base>
    <append compile_threaded="TRUE"> -fopenmp </append>
    <append DEBUG="TRUE"> -g -Wall -Og -fbacktrace -ffpe-trap=invalid,zero,overflow -fcheck=bounds </append>
    <append DEBUG="FALSE"> -O2 </append>
  </CFLAGS>
  <CONFIG_ARGS>
    <base> --host=Linux </base>
  </CONFIG_ARGS>
  <CPPDEFS>
    <!-- Top (The GNU Fortran Compiler) -->
    <append> -DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU</append>
  </CPPDEFS>
  <CXX_LIBS>
    <base>-lstdc++ -lmpi_cxx</base>
  </CXX_LIBS>
  <CXX_LINKER>FORTRAN</CXX_LINKER>
  <FC_AUTO_R8>
    <base> -fdefault-real-8 </base>
  </FC_AUTO_R8>
  <FFLAGS>
    <base> -fallow-argument-mismatch -fallow-invalid-boz -fconvert=big-endian -ffree-line-length-none -ffixed-line-length-none </base>
    <append compile_threaded="TRUE"> -fopenmp </append>
    <append DEBUG="TRUE"> -g -Wall -Og -fbacktrace -ffpe-trap=zero,overflow -fcheck=bounds </append>
    <append DEBUG="FALSE"> -O2 </append>
    <append>"-I/usr/lib64/gfortran/modules"</append>
  </FFLAGS>
  <FFLAGS_NOOPT>
    <base> -O0 </base>
  </FFLAGS_NOOPT>
  <FIXEDFLAGS>
    <base> -ffixed-form </base>
  </FIXEDFLAGS>
  <FREEFLAGS>
    <base> -ffree-form </base>
  </FREEFLAGS>
  <HAS_F2008_CONTIGUOUS>FALSE</HAS_F2008_CONTIGUOUS>
  <LDFLAGS>
    <append compile_threaded="TRUE"> -fopenmp </append>
  </LDFLAGS>
  <MPICC> mpicc </MPICC>
  <MPICXX> mpicxx </MPICXX>
  <MPIFC> mpif90 </MPIFC>
  <!-- WARNING! mct build permits whitespace BEFORE the following variable but NOT AFTER! -->
  <NETCDF_PATH> /usr</NETCDF_PATH>
  <SCC> gcc </SCC>
  <SCXX> g++ </SCXX>
  <SFC> gfortran </SFC>
  <SLIBS>
    <append> -L/data/keeling/a/pappup2/OpenBLAS/openblas -lopenblas -lnetcdff -lnetcdf -lhdf5</append>
  </SLIBS>
  <SUPPORTS_CXX>TRUE</SUPPORTS_CXX>
</compiler>


The config_compilers.xml entry from the old system was:
<compiler MACH="keeling_scam" COMPILER="gnu">
  <CFLAGS>
    <append DEBUG="FALSE"> -O2 </append>
  </CFLAGS>
  <CONFIG_ARGS>
    <base> --host=Linux </base>
  </CONFIG_ARGS>
  <CXX_LIBS>
    <base>-lstdc++ -lmpi_cxx</base>
  </CXX_LIBS>
  <FFLAGS>
    <append DEBUG="FALSE"> -O2 </append>
  </FFLAGS>
  <NETCDF_PATH> /sw/netcdf4-4.7.4-gnu-9.3.0</NETCDF_PATH>
  <SLIBS>
    <append> $SHELL{${NETCDF_PATH}/bin/nf-config --flibs} -lnetcdf -lnetcdff -lblas -llapack</append>
  </SLIBS>
</compiler>
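
To see at a glance what changed between the two machine entries, a small script can compare them tag by tag. The following is only a sketch: the function names `flag_table` and `compare` are made up for illustration, and the two XML strings are trimmed excerpts of the entries above (only CXX_LIBS, NETCDF_PATH and SLIBS), not the full blocks. It does not say which difference, if any, causes the crash.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpts of the two <compiler> entries quoted in this post.
NEW_BLOCK = """<compiler MACH="keeling9_scam" COMPILER="gnu">
  <CXX_LIBS>
    <base>-lstdc++ -lmpi_cxx</base>
  </CXX_LIBS>
  <NETCDF_PATH> /usr</NETCDF_PATH>
  <SLIBS>
    <append> -L/data/keeling/a/pappup2/OpenBLAS/openblas -lopenblas -lnetcdff -lnetcdf -lhdf5</append>
  </SLIBS>
</compiler>"""

OLD_BLOCK = """<compiler MACH="keeling_scam" COMPILER="gnu">
  <CXX_LIBS>
    <base>-lstdc++ -lmpi_cxx</base>
  </CXX_LIBS>
  <NETCDF_PATH> /sw/netcdf4-4.7.4-gnu-9.3.0</NETCDF_PATH>
  <SLIBS>
    <append> $SHELL{${NETCDF_PATH}/bin/nf-config --flibs} -lnetcdf -lnetcdff -lblas -llapack</append>
  </SLIBS>
</compiler>"""


def flag_table(compiler_xml):
    """Map each child tag of a <compiler> block to a list of normalized entries."""
    root = ET.fromstring(compiler_xml)  # XML comments are dropped by the default parser
    table = {}
    for child in root:
        if len(child):
            # Tags such as <SLIBS> hold <base>/<append> children, possibly with attributes.
            entries = []
            for sub in child:
                attrs = " ".join(f'{k}="{v}"' for k, v in sorted(sub.attrib.items()))
                text = " ".join((sub.text or "").split())
                entries.append(f"{sub.tag}[{attrs}]: {text}")
            table[child.tag] = entries
        else:
            # Tags such as <NETCDF_PATH> hold plain text.
            table[child.tag] = [" ".join((child.text or "").split())]
    return table


def compare(new_xml, old_xml):
    """Return only the tags whose entries differ between the two blocks."""
    new, old = flag_table(new_xml), flag_table(old_xml)
    report = {}
    for tag in sorted(set(new) | set(old)):
        if new.get(tag) != old.get(tag):
            report[tag] = {"new": new.get(tag), "old": old.get(tag)}
    return report


for tag, sides in compare(NEW_BLOCK, OLD_BLOCK).items():
    print(tag)
    print("  new:", sides["new"])
    print("  old:", sides["old"])
```

On these excerpts the report lists NETCDF_PATH and SLIBS: the old entry took its netCDF Fortran link flags from nf-config and linked -lblas -llapack, while the new entry points at /usr and links a locally built OpenBLAS.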
 

Attachments

  • cesm.log.257739.260219-204052.txt
    16.2 KB · Views: 3
  • config_machines.xml.txt
    144 KB · Views: 0
  • config_compilers.xml.txt
    56.8 KB · Views: 0

pappu

Pappu Paul
New Member
Error with DEBUG=TRUE:
subr. modal_aero_coag_init
pair 1 mode 2 ---> mode 1 eff 1
spec 29=so4_a2 ---> spec 28=so4_a1
spec 33=soa_a2 ---> spec 32=soa_a1
spec 19=ncl_a2 ---> spec 18=ncl_a1
spec 14=dst_a2 ---> spec 13=dst_a1
pair 2 mode 4 ---> mode 1 eff 1
spec 26=pom_a4 ---> spec 25=pom_a1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
Completion(send) value=-1075419546 tag=1
Completion(send) value=-1075419546 tag=1
Completion(send) value=1065353216 tag=1
Completion(send) value=1065353216 tag=1
Completion(send) value=0 tag=1
Completion(send) value=1037521022 tag=1
Completion(send) value=0 tag=1
Completion(send) value=0 tag=1
corrupted size vs. prev_size

Program received signal SIGABRT: Process abort signal.
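
Both "malloc_consolidate(): unaligned fastbin chunk detected" and "corrupted size vs. prev_size" are messages printed by glibc's memory allocator when it finds damaged heap bookkeeping, so the abort marks where the corruption was detected, not necessarily where it happened. A minimal sketch for pulling these messages and the lines just before them out of a cesm.log file is shown below. The helper name `find_heap_errors` and the sample text are illustrative assumptions, and the pattern list covers only the two messages seen in this thread.

```python
import re

# The two glibc heap-check messages that appear in the logs in this thread.
HEAP_PATTERNS = [
    r"malloc_consolidate\(\): unaligned fastbin chunk detected",
    r"corrupted size vs\. prev_size",
]


def find_heap_errors(log_text, context=2):
    """Return each matching line with its 1-based line number and preceding context."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if any(re.search(pattern, line) for pattern in HEAP_PATTERNS):
            start = max(0, i - context)
            hits.append(
                {"line": i + 1, "message": line.strip(), "before": lines[start:i]}
            )
    return hits


# Short sample taken from the DEBUG=TRUE output above.
SAMPLE = """ pair 2 mode 4 ---> mode 1 eff 1
 spec 26=pom_a4 ---> spec 25=pom_a1
Completion(send) value=0 tag=1
corrupted size vs. prev_size

Program received signal SIGABRT: Process abort signal.
"""

for hit in find_heap_errors(SAMPLE):
    print(f"line {hit['line']}: {hit['message']}")
    for previous in hit["before"]:
        print("    before:", previous)
```

To scan a real log, read the file into a string first (for example with `open(path).read()`) and pass it to `find_heap_errors`.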
 

pappu

Pappu Paul
New Member
Hi Pappu,

Can you attach the full file that contains the error above?

Courtney
Hi Courtney,
Please see the attached file for the error message.


In the SCAM run, the model automatically switches to mpi-serial, which results in the error.
However, when I run the model with MPILIB=openmpi, it completes successfully at any location.
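
For anyone wanting to reproduce the workaround, the sketch below lists the case-directory commands involved in switching the MPI library and rebuilding. It assumes the standard CIME case scripts (`xmlchange`, `case.setup`, `case.build`) are present in the case directory; the helper names `rebuild_commands` and `run_in_case` are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess


def rebuild_commands(mpilib="openmpi"):
    """Commands to switch MPILIB and rebuild, in the order they would be run."""
    return [
        ["./xmlchange", f"MPILIB={mpilib}"],
        ["./case.setup", "--reset"],
        ["./case.build", "--clean-all"],
        ["./case.build"],
    ]


def run_in_case(case_dir, commands, dry_run=True):
    """Print each command; execute it inside case_dir only when dry_run is False."""
    shown = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, cwd=case_dir, check=True)
    return shown


# Dry run: prints the commands without touching any case directory.
run_in_case(".", rebuild_commands("openmpi"), dry_run=True)
```

Pass the real case directory and `dry_run=False` to execute the commands. Submitting the job afterwards is left out on purpose.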
 

Attachments

  • cesm.log.259159.260220-153735.txt
    16.2 KB · Views: 1