ag4680@nyu_edu
New Member
Dear Scientists,

I am using version E155010 of the CAM Trunk on Cheyenne. I am experiencing unexpected segmentation faults with both the CAM-Spectral Element (CAM-SE) and CAM-Finite Volume (CAM-FV) dynamical cores on the cluster. The problem is much more frequent in CAM-FV and quite infrequent in CAM-SE. Surprisingly, when I restart the run from the most recently saved restart file, it runs fine most of the time, which suggests that it might not be a physics error. Usually, if there is a physics error, it can be easily deduced from the error message.

To provide context, I am integrating the CAM-FV dycore on the f09_f09 grid with 80 vertical levels on 24 Cheyenne nodes. On average it crashes once every 0.4-0.6 model years. It never runs for more than 1 model year without a seg fault. I have also noticed that the crashes are quite infrequent when I run the cores on only 1 or 2 Cheyenne nodes, and become more frequent as more nodes are used.

I have tried compiling the code with different compilers (ref: Dr. Isla Simpson, NCAR). I get the same seg fault with both the intel and gnu compilers. Moreover, the pgi compiler specification conflicts with the '--mpilib openmpi' specification and does not allow the case to be created in the first place.

Can you provide a clue on how to fix these mysterious seg faults? Any help in this direction will be greatly appreciated. I have attached the CESM log files, one each for CAM-FV and CAM-SE.

Regards,
Aman Gupta