
B1850 compset, cesm2.2 release: blow-up error at rrtmg_lw_rtrnmc_m

milicak

Mehmet Ilicak
New Member
Hi all,

We are trying to run the B1850 compset on our supercomputers here in Turkey.
I downloaded cesm2.2-release (cesm2.2.0-0-g332937b) and added our compile settings for the Intel compiler.
I successfully compiled the B1850 compset; however, when I try to run it, I get the following blow-up error:

(Task 33, block 1) Message from (lon, lat) ( 86.563, -1.736), which is global (i,j) (113, 181). Level: 13
(Task 33, block 1) MARBL WARNING (marbl_interior_tendency_mod:compute_large_detritus_prod): dz*DOP_loss_P_bal= 0.149E-011 exceeds Jint_Ptot_thres=
0.271E-013
max rss=2106982400.0 MB
memory_write: model date = 00010615 0 memory = -0.00 MB (highwater) 2009.38 MB (usage) (pe= 160 comps= OCN)
WHL, oc_tavg_helper is already associated; reset the tavg fields
max rss=581894144.0 MB
max rss=380895232.0 MB
memory_write: model date = 00010615 0 memory = -0.00 MB (highwater) 363.25 MB (usage) (pe= 80 comps= cpl ATM ICE)
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 00000000033CBD3D Unknown Unknown Unknown
libpthread-2.17.s 00002B35315056D0 Unknown Unknown Unknown
cesm.exe 0000000000B6BA3B rrtmg_lw_rtrnmc_m 375 rrtmg_lw_rtrnmc.f90
cesm.exe 0000000000B672FE rrtmg_lw_rad_mp_r 386 rrtmg_lw_rad.f90
cesm.exe 0000000000782228 radlw_mp_rad_rrtm 191 radlw.F90
cesm.exe 0000000000768230 radiation_mp_radi 1242 radiation.F90
cesm.exe 000000000071DBDE physpkg_mp_tphysb 2614 physpkg.F90


This cesm2.2 release blows up in year 1; however, when I downloaded the cesm2.1.1 release and ran the same compset, the model blew up with the same error, but in year 7.

I have also included our compiler options below. Only one option is specific to CAM (-init=zero,arrays).
I am not sure whether the problem comes from that option or from optimization, although -O2 is not very aggressive.

<compiler MACH="sariyer" COMPILER="intel">
<CPPDEFS>
<append> -D$(OS) </append>
</CPPDEFS>
<FFLAGS>
<append> -xCORE-AVX2 -no-fma </append>
</FFLAGS>
<FFLAGS>
<append DEBUG="FALSE"> -O2 </append>
<append MODEL="cam"> -init=zero,arrays </append>
</FFLAGS>
<NETCDF_PATH>$ENV{NETCDF}</NETCDF_PATH>
<PNETCDF_PATH>$ENV{PNETCDF}</PNETCDF_PATH>
<MPI_PATH>$(MPI_ROOT)</MPI_PATH>
<MPI_LIB_NAME>mpi</MPI_LIB_NAME>
<MPICC> mpiicc </MPICC>
<MPICXX> mpiicc </MPICXX>
<MPIFC> mpiifort </MPIFC>
<LDFLAGS>
<append> -Wl,-rpath,$ENV{MPI_ROOT}/lib</append>
</LDFLAGS>
<FFLAGS>
<append>-I$(NETCDF)/include </append>
</FFLAGS>
<SLIBS>
<append>-L$(NETCDF)/lib -lnetcdff -L/lib -L$(NETCDF)/lib -lnetcdf -Wl,-rpath -Wl,$(NETCDF)/lib -Wl,-rpath -Wl,$(NETCDF)/lib</append>
<append>-Wl,-rpath,$(PNETCDF)/lib </append>
</SLIBS>
<HAS_F2008_CONTIGUOUS>TRUE</HAS_F2008_CONTIGUOUS>
</compiler>

I am wondering whether anybody else has run into this issue.

Thanks in advance,
Best,
Mehmet
 

milicak

Mehmet Ilicak
New Member
Hi, sorry for my late response. We have been trying different compiler options and FFLAGS, but no luck so far.
The machine has 40 cores per node with 192 GB of memory; I am not sure why the max rss is that high!
It might be a memory leak, because the run blows up at a different time when I run the code than when my PhD student runs it.
However, I am not sure how to track it down.

Thanks in advance,
Mehmet
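
One way to test the memory-leak idea is to follow the memory_write lines that the run log already prints and see whether usage on a given pe grows with model date. Below is a minimal Python sketch, assuming the log lines look like the two quoted in the first post. The function names (parse_memory_usage, growth) are hypothetical helpers and not part of CESM, and the regular expression may need adjusting if the log layout differs on your machine.

```python
import re
from collections import defaultdict

# Pattern for lines such as:
# memory_write: model date = 00010615 0 memory = -0.00 MB (highwater) 2009.38 MB (usage) (pe= 160 comps= OCN)
# Assumption: this layout is taken from the log excerpt in this thread.
MEMORY_LINE = re.compile(
    r"memory_write: model date =\s*(\d+)\s+(\d+)\s+memory =\s*(-?[\d.]+)\s+MB\s+\(highwater\)"
    r"\s+(-?[\d.]+)\s+MB\s+\(usage\)\s+\(pe=\s*(\d+)\s+comps=\s*(.*?)\s*\)"
)


def parse_memory_usage(lines):
    """Collect (model_date, usage_MB) samples per (pe, components) key."""
    usage = defaultdict(list)
    for line in lines:
        match = MEMORY_LINE.search(line)
        if match:
            date, _seconds, _highwater, used, pe, comps = match.groups()
            usage[(int(pe), comps)].append((date, float(used)))
    return dict(usage)


def growth(series):
    """Difference in MB between the last and first usage samples."""
    return series[-1][1] - series[0][1]
```

To use it, pass the lines of the run log (for example, `parse_memory_usage(open(path))` with `path` pointing at your own log file) and print `growth(series)` for each key. Steadily increasing usage on the same pe across model dates would be consistent with a leak; flat usage would point back toward the compiler or an out-of-bounds access.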
 

ganbaranaito

takufuu
Member
Hello, I ran into the same problem but don't know how to solve it. Did you find a solution? Thanks in advance!
 

milicak

Mehmet Ilicak
New Member
Hi,

Not sure if I answered this or not, but we moved to the Intel 2021 compiler, and somehow the latest version of the compiler worked fine.
 

ganbaranaito

takufuu
Member
Thanks for your reply. We also suspect the problem is related to the compiler version; we are using Intel 2018.
 