
Porting CESM, run error: forrtl: severe (174): SIGSEGV, segmentation fault occurred

Jan Klenner

New Member
We have been trying to run the Community Earth System Model version 2.2 (cesm2.2.0-0-g332937b) on our local HPC infrastructure.
Unfortunately, we have run into some issues and are now stuck.

The run, submitted through Slurm, fails with the error message
“forrtl: severe (174): SIGSEGV, segmentation fault occurred”.
We are hopeful that a solution can be found.

From online forums, we suspect that there might be a problem with the RAM allocation.
We also tried to run a relatively small case and encountered the same problem.
A shortened version of the .log file from the model run is attached.
The run crashes after approx. 2 min. I cannot locate the Slurm output (if it exists), but I have attached the run environment.
CaseStatus.txt is attached as well.
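
One thing that may be worth ruling out alongside RAM is the stack size limit. Intel-compiled Fortran programs often place automatic arrays and temporaries on the stack, and an exhausted stack commonly surfaces as this same severe (174) message. The following is only a minimal sketch for reporting the limit. The helper names describe_limit and stack_limit_report are made up for illustration, and the check would need to run inside the batch job, because limits on login nodes can differ from those on compute nodes.

```python
import resource


def describe_limit(value):
    """Format a byte limit returned by getrlimit as readable text."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / (1024 * 1024):.1f} MiB"


def stack_limit_report():
    """Report the soft and hard stack limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return (
        f"stack soft limit: {describe_limit(soft)}, "
        f"hard limit: {describe_limit(hard)}"
    )


if __name__ == "__main__":
    print(stack_limit_report())
```

If the soft limit turns out to be small, raising it in the job environment is a cheap experiment. It is not confirmed anywhere in this thread as the cause.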


We would be grateful for any help. Best regards,

Jan Klenner
 

Attachments

  • CaseStatus.txt

jedwards

CSEG and Liaisons
Staff member
Still no files.
See log file for details: /cluster/work/users/jankle/cesm/defaulttest/run/cesm.log.4432839.220713-104020
 

jedwards

CSEG and Liaisons
Staff member
From the lnd log:
water_tracers settings
&WATER_TRACERS_INPARM
ENABLE_WATER_TRACER_CONSISTENCY_CHECKS = F,
ENABLE_WATER_ISOTOPES = F
/

And the cesm log:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000009FAEABD Unknown Unknown Unknown
libpthread-2.17.s 00002B125ED3C630 Unknown Unknown Unknown
libiomp5.so 00002B1259E09FE5 Unknown Unknown Unknown
cesm.exe 0000000009FF0544 Unknown Unknown Unknown
cesm.exe 0000000007BD09EE watertracerutils_ 44 WaterTracerUtils.F90
cesm.exe 0000000007B3CCFF waterdiagnosticty 105 WaterDiagnosticType.F90
cesm.exe 0000000007B3CA6B waterdiagnosticty 81 WaterDiagnosticType.F90
cesm.exe 0000000007B0D097 waterdiagnosticbu 141 WaterDiagnosticBulkType.F90
cesm.exe 0000000007BDE646 watertype_mp_doin 338 WaterType.F90
cesm.exe 0000000007BDB190 watertype_mp_init 237 WaterType.F90
cesm.exe 00000000058A52ED clm_instmod_mp_cl 294 clm_instMod.F90
cesm.exe 000000000589C8C5 clm_initializemod 449 clm_initializeMod.F90
cesm.exe 00000000058265A0 lnd_comp_mct_mp_l 238 lnd_comp_mct.F90
cesm.exe 0000000000461330 component_mod_mp_ 257 component_mod.F90
cesm.exe 0000000000427489 cime_comp_mod_mp_ 1353 cime_comp_mod.F90
cesm.exe 00000000004580AD MAIN__ 122 cime_driver.F90
cesm.exe 0000000000414BAE Unknown Unknown Unknown
libc-2.17.so 00002B125F26D555 __libc_start_main Unknown Unknown
cesm.exe 0000000000414AA9 Unknown Unknown Unknown
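
For reading a traceback like the one above: frames are listed innermost first, so the first row that carries a line number and source file is the closest known location to the crash. Here that is line 44 of WaterTracerUtils.F90, reached during CLM initialization. The sketch below extracts those rows from a few lines copied from the log. The helper name resolved_frames is hypothetical, and the parsing assumes the five whitespace-separated columns shown in the header (Image, PC, Routine, Line, Source).

```python
# Excerpt copied from the cesm log above (routine names are truncated by the runtime).
TRACEBACK = """\
Image PC Routine Line Source
cesm.exe 0000000009FF0544 Unknown Unknown Unknown
cesm.exe 0000000007BD09EE watertracerutils_ 44 WaterTracerUtils.F90
cesm.exe 0000000007B3CCFF waterdiagnosticty 105 WaterDiagnosticType.F90
cesm.exe 00000000058A52ED clm_instmod_mp_cl 294 clm_instMod.F90
libc-2.17.so 00002B125F26D555 __libc_start_main Unknown Unknown
"""


def resolved_frames(text):
    """Return (routine, line, source) for rows that include a numeric line."""
    frames = []
    for row in text.splitlines():
        parts = row.split()
        # Skip the header and any frame whose line number is "Unknown".
        if len(parts) != 5 or not parts[3].isdigit():
            continue
        frames.append((parts[2], int(parts[3]), parts[4]))
    return frames


if __name__ == "__main__":
    for routine, line, source in resolved_frames(TRACEBACK):
        print(f"{source}:{line} ({routine})")
```

Rows marked Unknown have no debug information. Rebuilding the case with debugging enabled may fill in more of them.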
 

Jan Klenner

New Member
Hi jedwards,
please excuse my ignorance, but I am not sure how to interpret your reply.
 
Hi,

After running CESM 2.1 for many years on our cluster without any problems, I have just started to port and run CESM 2.2.1, and I see the exact same problem: a SIGSEGV in CLM at WaterTracerUtils.F90. I am at the very beginning of debugging this problem. Just to be sure: has anyone found a solution?

To summarize: the problem is triggered by WaterTracerUtils.F90 (which was not in the CESM 2.1 CLM code), and it happens regardless of whether enable_water_isotopes is set to false (the default) or true. A basic question: why does CLM reach WaterTracerUtils.F90 even when water isotopes are disabled?

We use Fortran and MPI from Intel Parallel Studio 2018 for CESM 2.1 and for the first tests with CESM 2.2.1.

Cheers, Urs
 