
Segmentation fault tp_core.F90 running WACCM/CESM-DART 1.2.0

Dear all,

For quite some time we have been struggling to successfully run WACCM/CESM-DART 1.2.0. We keep getting the following error:

Segmentation fault in .__tp_core_NMOD_xtpv at line 469 in file "/work/bb0519/CESM/cesm1_2_0/models/atm/cam/src/dynamics/fv/tp_core.F90"
  469               fxv(i,j) = mfxv(i,j)*qtmpv(iu,j)

Our setup is as follows:
- IBM Power6 (blizzard.dkrz.de) with IBM/xlf13.1.0.8, IBM/xlc11.1.0.8, IBM/xlC11.1.0.8 (switching to the recommended but older compiler versions xlf 12.1 and xlC 10.1 didn't help)
- 16 or 32 ensemble members for CAM (modifying the number of ensemble members did not help either)
- We currently think the error is not related to data assimilation, as the problem occurs even when no data is actually assimilated (i.e. the DART part is removed)
- A standard CESM setup with different compsets works fine.

Any hints or ideas about what may be the source of the error are appreciated, as we are running out of ideas...
Sebastian  
 

santos

Member
Sorry for the slow response; I was on vacation. I do not recognize this error; the FV dycore has not changed much in recent years, so it's difficult to think of any change that could have caused a regression. Here are my thoughts:
  • CESM 1.2.0 has a known bug that causes RRTMG to have biased results on big-endian architectures (this is fixed in 1.2.1). However, I do not believe that this is responsible for your error; it is not even relevant unless you've explicitly turned on RRTMG (or are running WACCM5). Even if you did have the RRTMG issue, you would be much more likely to have biased results than a crash.
  • We know that WACCM in CESM 1.1.X and earlier is on the edge of numerical stability. However, this rarely caused a problem and is (as far as we know) fixed in CESM 1.2.
  • Since I really have no great ideas, I just want to clarify: when you say that "a standard CESM setup" works fine, what do you mean specifically? Do you mean that a vanilla WACCM run works, but the runs you've set up for assimilation do not work (even if DART isn't actually being used)? Or that CAM4 works, but WACCM4 does not? My experience is mostly with WACCM per se; I'm afraid I don't have much insight into the current data assimilation efforts.
 
Dear Santos,

Thanks for the reply. A standard CESM setup means that we just use one ensemble (coupled or uncoupled) by following the standard setup procedure, e.g.

./create_newcase -case my_cesm_case -res f19_g16 -compset F55WCN -mach blizzard (blizzard is our machine)
cd my_cesm_case
./cesm_setup
./my_cesm_case.build
and then run the model for several months without any problems.
I also contacted the DART developers at NCAR to see if they may have a solution. I will post here as soon as I have a reply. Thanks for your support anyway.

Sebastian (scientific programmer @ GEOMAR Kiel)
 
Dear all

Just to let you know that this problem is solved. It was related to the data assimilation via DART, which caused the model to become unstable. If someone needs to know more about this, please contact me at swahl at geomar dot de
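
For readers wondering how a numerical instability can surface as a segmentation fault on a plain indexing statement such as line 469 quoted in the first post, here is a minimal Python sketch of one plausible mechanism. It rests on assumptions that were not checked against tp_core.F90: that the upwind index `iu` is obtained by shifting `i` by the integer part of a Courant number, and that the array being read is padded with a fixed number of ghost cells. The names `upwind_index`, `courant`, `im` and `n_ghost` are hypothetical and chosen only for illustration.

```python
import math


def upwind_index(i, courant, im, n_ghost):
    """Return the upwind index for cell i, or raise if it leaves the padded array.

    Hypothetical helper (not taken from tp_core.F90): i is 1-based in 1..im,
    and the padded array is assumed to cover indices 1-n_ghost .. im+n_ghost.
    """
    if not math.isfinite(courant):
        # An unstable run can produce NaN or Inf winds, hence a non-finite Courant number.
        raise ValueError(f"non-finite Courant number at i={i}")

    # Shift upwind by the number of whole cells crossed (int() truncates toward zero).
    iu = i - int(courant)

    lo, hi = 1 - n_ghost, im + n_ghost
    if not lo <= iu <= hi:
        # Without this check, a compiled code would read outside the array here.
        raise IndexError(f"iu={iu} outside [{lo}, {hi}] for Courant number {courant}")
    return iu
```

With a moderate Courant number the index stays inside the padded range, while an unrealistically large value pushes it far outside. A Fortran executable built without array bounds checking performs no such test, so the out-of-range read may end in a segmentation fault at the indexing line rather than in a clear error message.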
 