
CAM6 crashed in FHIST_BGC (compset) with f09_f09_mg17 grid

BoHuang

New Member
Dear all,

Recently, I have been working with a development version of CAM6 with the SE dycore. I have also worked with the released CESM2.1, where my settings work fine. The settings also work with the FHS94 compset on the f09_f09_mg17 grid. I want to use FHIST_BGC with f09_f09_mg17 (./create_newcase --case FHIST_BGC_cam6_2_017_f09_f09_mg17 --compset FHIST_BGC --machine fram --res f09_f09_mg17 --run-unsupported). The new case builds, but it always crashes when I submit the job. The error appears in cesm.log (line 1025) as follows:

.......
beg gridcell= 20850 end gridcell= 21013
total gridcells per clump= 164
proc= 127 clump no = 1 clump id= 128
beg landunit= 61548 end landunit= 62034
total landunits per clump = 487
proc= 127 clump no = 1 clump id= 128
beg column = 501960 end column = 505906
total columns per clump = 3947
proc= 127 clump no = 1 clump id= 128
beg patch = 793846 end patch = 800088
total patches per clump = 6243
proc= 127 clump no = 1 clump id= 128
beg cohort = 20850 end cohort = 21013
total cohorts per clump = 164
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
cesm.exe 0000000002CECD2D Unknown Unknown Unknown
libpthread-2.17.s 00002AF75BC765D0 Unknown Unknown Unknown
libiomp5.so 00002AF756D43FE5 Unknown Unknown Unknown
cesm.exe 0000000002D2E8F4 Unknown Unknown Unknown
cesm.exe 00000000021462BA watertracerutils_ 44 WaterTracerUtils.F90
cesm.exe 000000000211A86E waterdiagnosticty 105 WaterDiagnosticType.F90
cesm.exe 000000000211AB45 waterdiagnosticty 81 WaterDiagnosticType.F90
cesm.exe 0000000002117A7D waterdiagnosticbu 141 WaterDiagnosticBulkType.F90
cesm.exe 000000000214B2E1 watertype_mp_doin 338 WaterType.F90
cesm.exe 0000000001B715E3 clm_instmod_mp_cl 294 clm_instMod.F90
cesm.exe 0000000001B6A7AA clm_initializemod 440 clm_initializeMod.F90
cesm.exe 0000000001B549BF lnd_comp_mct_mp_l 234 lnd_comp_mct.F90
cesm.exe 0000000000436A66 component_mod_mp_ 257 component_mod.F90
cesm.exe 0000000000425F4C cime_comp_mod_mp_ 1346 cime_comp_mod.F90
cesm.exe 0000000000433B1E MAIN__ 122 cime_driver.F90
cesm.exe 0000000000414D6E Unknown Unknown Unknown
libc-2.17.so 00002AF75C1A7495 __libc_start_main Unknown Unknown
cesm.exe 0000000000414C69 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred

.....

I have attached the log files. Adam Herrington has already tried this combination of compset and grid, and it works on the NCAR HPC. I am not sure how to fix this error. Does anyone have suggestions or tips? Thank you very much.

Regards,
Bo
 

Attachments

  • atm.log.505977.200411-073735.txt
    362.4 KB · Views: 3
  • cesm.log.505977.200411-073735.txt
    251.5 KB · Views: 8
  • cpl.log.505977.200411-073735.txt
    43.8 KB · Views: 1
  • lnd.log.505977.200411-073735.txt
    36.4 KB · Views: 3

nusbaume

Jesse Nusbaumer
CSEG and Liaisons
Staff member
Hi Bo,

It looks like the model is dying due to an error in CLM, so I am moving this post over to the CLM forum, in case someone with more expertise in that component would like to respond.

That being said, we don't technically support developmental versions on these forums, so I can't promise that anyone will be able to solve your issue. If you feel comfortable, I would recommend just following the error trace yourself through the source code (e.g. look at line 44 in WaterTracerUtils.F90) to see if you can find what the specific error is, and then try fixing it yourself. If you aren't using water tracers in CLM, then this could simply mean commenting out the offending sections in the model source code.
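
As a starting point for following the trace, here is a minimal sketch that pulls out only the traceback frames that carry source information. It assumes the five-column "Image PC Routine Line Source" layout shown in the first post; the function name `resolved_frames` is made up for this example.

```shell
# Print "source:line (routine)" for each traceback frame that has a known
# source file and line number, innermost frame first.
# Assumes the five-column Intel traceback layout quoted in this thread.
# Identical frames repeated by other MPI ranks are printed only once.
resolved_frames() {
  awk 'NF == 5 && $2 ~ /^[0-9A-Fa-f]+$/ && $4 ~ /^[0-9]+$/ {
         key = $5 ":" $4 " (" $3 ")"
         if (!seen[key]++) print key
       }' "$1"
}
```

Calling `resolved_frames cesm.log` on the log from this thread would list WaterTracerUtils.F90 line 44 first, which is the frame to open in the source tree.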

Finally, you can also try setting "DEBUG" to "TRUE" in "env_build.xml" to see if the compiler you are using can provide a more specific (and easier to fix) error.
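
For reference, a rough sketch of switching an existing case to a debug build with the standard case scripts. The function name and the case path are placeholders, and I am assuming a full clean rebuild is acceptable:

```shell
# Sketch: rebuild an existing case with DEBUG=TRUE.
# Usage: enable_debug_build /path/to/case   (the path is hypothetical)
enable_debug_build() {
  casedir="${1:-.}"
  if [ ! -x "$casedir/xmlchange" ]; then
    echo "not a case directory: $casedir" >&2
    return 1
  fi
  (
    cd "$casedir" || exit 1
    # DEBUG lives in env_build.xml; xmlchange edits it in place.
    ./xmlchange DEBUG=TRUE || exit 1
    # Discard objects compiled without debug flags, then rebuild.
    ./case.build --clean-all || exit 1
    ./case.build
  )
}
```

After the rebuild, resubmit the job as usual and look at cesm.log again; a debug build runs much slower, so a short run length is enough.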

Good luck, and have a great day!

Jesse
 

BoHuang

New Member
Hi Jesse,

Thank you very much for the information. I tried to track down the error. It seems that WaterTracerUtils.F90 is new in the developmental version of CLM5; it is not included in the released CLM5 in CESM2. I will contact the CLM community. I hope they have suggestions for fixing this error.

Have a nice weekend!

Greetings,
Bo
 

sacks

Bill Sacks
CSEG and Liaisons
Staff member
What compiler and compiler version are you using? Someone ran into a very similar issue last August with intel 18.0.3. They upgraded to intel 19 and the problem was resolved. If you are using intel 18, is it feasible for you to use a newer version of the intel compiler?

Another possibility is that it's running out of memory. How many processors are you using? Have you tried increasing the land processor count and seeing if the problem either disappears or changes?
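
If it helps, this is roughly how the land task count could be changed in an existing case. It is only a sketch: the function name, the case path, and the task count in the usage line are placeholders, and the right value depends on your machine and PE layout.

```shell
# Sketch: change the number of MPI tasks assigned to the land component.
# Usage: set_land_tasks /path/to/case 256   (both arguments are hypothetical)
set_land_tasks() {
  casedir="$1"
  ntasks="$2"
  if [ ! -x "$casedir/xmlchange" ]; then
    echo "not a case directory: $casedir" >&2
    return 1
  fi
  (
    cd "$casedir" || exit 1
    # Show the current value before changing it.
    ./xmlquery NTASKS_LND
    ./xmlchange NTASKS_LND="$ntasks" || exit 1
    # A changed PE layout requires re-running setup and rebuilding.
    ./case.setup --reset || exit 1
    ./case.build
  )
}
```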
 

erik

Erik Kluzek
CSEG and Liaisons
Staff member
Hi Bo

A seg-fault is a general error that can happen for many different reasons, so it takes more information to diagnose what's happening. Bill's and Jesse's suggestions are both things that you should do. You could also try running just an I compset at that resolution to see if the error appears without CAM's involvement.
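
A land-only case could be set up along these lines. This is a sketch, not a tested recipe: the function name and both paths are placeholders, the grid and machine are taken from the first post, and the compset alias I2000Clm50Sp is an assumption that may not exist in your checkout (`./query_config --compsets clm` in cime/scripts lists what is available).

```shell
# Sketch: create a land-only (I compset) case on the same grid and machine.
# Usage: create_land_only_case /path/to/cime/scripts /path/to/new/case
# COMPSET can be overridden in the environment; the default alias is an
# assumption and should be checked against your source tree.
create_land_only_case() {
  scripts="$1"
  casedir="$2"
  if [ ! -x "$scripts/create_newcase" ]; then
    echo "create_newcase not found in: $scripts" >&2
    return 1
  fi
  "$scripts/create_newcase" --case "$casedir" \
    --compset "${COMPSET:-I2000Clm50Sp}" \
    --res f09_f09_mg17 --machine fram --run-unsupported
}
```

If the same seg-fault shows up in this case, CAM can be ruled out and the problem is in the land model build or its libraries.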
 

BoHuang

New Member
Hi Bill and Erik,

Thank you very much for your suggestions. The HPC IT support helped me build new netcdf-fortran and PnetCDF libraries with the intel/2019 compiler. Now the segmentation fault in the water tracer code is gone. The test of CAM6 with the SE dycore also seems fine, so I can continue with further tests.

Regards,
Bo
 