
Porting CESM, run error: forrtl: severe (174): SIGSEGV, segmentation fault occurred

Jan Klenner

Jan Klenner
New Member
We have been trying to run the Community Earth System Model version 2.2 (cesm2.2.0-0-g332937b) on our local HPC infrastructure.
Unfortunately, we have run into some issues and are now stuck.

The run, submitted through Slurm, fails with the error message
“forrtl: severe (174): SIGSEGV, segmentation fault occurred”.
Given the nature of this error message, we are hopeful that a solution can be found.

Based on posts in online forums, we suspect that there might be a problem with the RAM allocation.
We also tried running a relatively small case and encountered the same problem.
A shortened version of the .log file from the model run is attached.
The run crashes after approx. 2 minutes. I cannot locate the Slurm output (if it exists), but I have attached the run environment.
CaseStatus.txt is also attached.


We would be grateful for any help. Best regards,

Jan Klenner
 

Attachments

  • CaseStatus.txt
    10.4 KB · Views: 18

jedwards

CSEG and Liaisons
Staff member
Still no files.
See log file for details: /cluster/work/users/jankle/cesm/defaulttest/run/cesm.log.4432839.220713-104020
 

jedwards

CSEG and Liaisons
Staff member
From the lnd log:
water_tracers settings
&WATER_TRACERS_INPARM
ENABLE_WATER_TRACER_CONSISTENCY_CHECKS = F,
ENABLE_WATER_ISOTOPES = F
/

And the cesm log:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000009FAEABD Unknown Unknown Unknown
libpthread-2.17.s 00002B125ED3C630 Unknown Unknown Unknown
libiomp5.so 00002B1259E09FE5 Unknown Unknown Unknown
cesm.exe 0000000009FF0544 Unknown Unknown Unknown
cesm.exe 0000000007BD09EE watertracerutils_ 44 WaterTracerUtils.F90
cesm.exe 0000000007B3CCFF waterdiagnosticty 105 WaterDiagnosticType.F90
cesm.exe 0000000007B3CA6B waterdiagnosticty 81 WaterDiagnosticType.F90
cesm.exe 0000000007B0D097 waterdiagnosticbu 141 WaterDiagnosticBulkType.F90
cesm.exe 0000000007BDE646 watertype_mp_doin 338 WaterType.F90
cesm.exe 0000000007BDB190 watertype_mp_init 237 WaterType.F90
cesm.exe 00000000058A52ED clm_instmod_mp_cl 294 clm_instMod.F90
cesm.exe 000000000589C8C5 clm_initializemod 449 clm_initializeMod.F90
cesm.exe 00000000058265A0 lnd_comp_mct_mp_l 238 lnd_comp_mct.F90
cesm.exe 0000000000461330 component_mod_mp_ 257 component_mod.F90
cesm.exe 0000000000427489 cime_comp_mod_mp_ 1353 cime_comp_mod.F90
cesm.exe 00000000004580AD MAIN__ 122 cime_driver.F90
cesm.exe 0000000000414BAE Unknown Unknown Unknown
libc-2.17.so 00002B125F26D555 __libc_start_main Unknown Unknown
cesm.exe 0000000000414AA9 Unknown Unknown Unknown
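
For anyone reading a traceback like the one above: the frames marked "Unknown" carry no source information, so the first useful frame is the topmost one with a real line number (here WaterTracerUtils.F90, line 44). A minimal Python sketch for pulling that frame out of a log follows. The names Frame, parse_frames and first_source_frame are hypothetical helpers made up for this illustration, not part of CESM or CIME, and the sketch assumes each traceback row has exactly five whitespace-separated columns as shown above. Real logs with output from several MPI ranks interleaved may need more careful handling.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One traceback row that carries source information (hypothetical helper)."""
    image: str
    routine: str
    line: int
    source: str


def parse_frames(log_text: str) -> List[Frame]:
    """Collect traceback rows that have a numeric line number."""
    frames = []
    for raw in log_text.splitlines():
        parts = raw.split()
        # Assumed layout: Image, PC, Routine, Line, Source (five columns).
        if len(parts) != 5:
            continue
        image, _pc, routine, line, source = parts
        if not line.isdigit():
            # Skips the header row and frames reported as "Unknown".
            continue
        frames.append(Frame(image, routine, int(line), source))
    return frames


def first_source_frame(log_text: str) -> Optional[Frame]:
    """Return the topmost frame with source information, if any."""
    frames = parse_frames(log_text)
    return frames[0] if frames else None


# A few rows copied from the traceback in this thread.
SAMPLE = """\
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000009FAEABD Unknown Unknown Unknown
libiomp5.so 00002B1259E09FE5 Unknown Unknown Unknown
cesm.exe 0000000007BD09EE watertracerutils_ 44 WaterTracerUtils.F90
cesm.exe 0000000007B3CCFF waterdiagnosticty 105 WaterDiagnosticType.F90
"""

if __name__ == "__main__":
    print(first_source_frame(SAMPLE))
```

To use it on a real run, read the cesm.log file into a string and pass it to first_source_frame in place of SAMPLE.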
 

Jan Klenner

Jan Klenner
New Member
Hi jedwards,
Please excuse my ignorance, but I am not sure how to interpret your reply.
 
Hi,

After running CESM 2.1 on our cluster for many years without any problems, I have just started to port and run CESM 2.2.1, and I see the exact same problem: a SIGSEGV in CLM at WaterTracerUtils.F90. I am at the very beginning of debugging this. Just to be sure: has anyone found a solution?

To summarize: the problem is triggered in WaterTracerUtils.F90 (which was not in the CESM 2.1 CLM code), and it happens regardless of whether enable_water_isotopes is set to false (the default) or true. Stupid question: why does CLM reach WaterTracerUtils.F90 even when water isotopes are disabled?

We use Fortran and MPI from Intel Parallel Studio 2018 for CESM 2.1 and for the first tests with CESM 2.2.1.

Cheers, Urs
 

heplaas

Haley Plaas
New Member
Hi, it's been a while since this thread was active, but I am wondering if anyone found a solution?

I am new to CESM but am currently porting a modified version of CAM (the Mechanism of Intermediate complexity for Modelling Iron, MIMI: 10.5194/gmd-12-3835-2019) from Cheyenne to Derecho. Several of our collaborators have had success in doing so, but I am still at a loss and keep running into 'forrtl: severe (174): SIGSEGV, segmentation fault occurred' in the cesm.log.

Based on advice from the thread "forrtl: severe (174): SIGSEGV, segmentation fault occurred", I have tried changing the ulimit setting in the base model test.sh files from ulimit -c to ulimit -s unlimited. I have also tried running the model at a different resolution (f19_f19_mg17 rather than f09_f09_mg17). Neither helped.
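
For reference, ulimit -c sets the core file size limit, while ulimit -s sets the stack size limit, so only the latter affects stack overflows. One way to confirm which stack limit a process actually sees is a small check like the sketch below, using Python's standard resource module (Unix only). The helper names describe_limit and stack_limits are made up for this illustration. The result is only meaningful if the script runs in the same environment as the model, for example inside the batch job on a compute node, since limits set in a login shell are not necessarily inherited there.

```python
import resource


def describe_limit(value: int) -> str:
    """Format a limit in KiB, the unit that `ulimit -s` reports."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports RLIMIT_STACK in bytes.
    return f"{value // 1024} KiB"


def stack_limits():
    """Return the (soft, hard) stack size limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack soft limit: {soft}")
    print(f"stack hard limit: {hard}")
```

If the soft limit printed inside the job is not "unlimited", the ulimit -s unlimited setting did not reach the model process.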

I have attached my current cesm error log, but can provide more files as needed.

Thanks for any insight!
 

Attachments

  • cesm.log.4084642.desched1.zip
    145.6 KB · Views: 7

skyler1

Skyler Yang
New Member
Hi Haley,

Just wondering if you solved this issue, as I have encountered the same error. I would appreciate any help!

Best,
Skyler
 

jedwards

CSEG and Liaisons
Staff member
Looking at Haley's cesm.log, I see:
aero_model_mp_mod 2847 aero_model.F90
Is this also where you are crashing, Skyler? Please give details on how to reproduce the error.
 

skyler1

Skyler Yang
New Member
The log file is attached. I'm new to CESM and was just trying to see if I could run the model.

I used resolution f19_g17 with compset B1850 on Niagara. I also tried setting ulimit -s unlimited, as some people suggested, but it didn't help. Thank you for any suggestions!

Best,
Skyler
 

Attachments

  • cesm.log.13061525.240704-005518.zip
    2 KB · Views: 4

jedwards

CSEG and Liaisons
Staff member
It looks like you are failing in the mpi_init call. Have you run some simple MPI tests to make sure that MPI works?
 