
Strange problem in running CAM 5.1.1

hefei@umich_edu

New Member
Hi, I am trying to run standalone CAM 5.1.1 with openmpi/intel-14.0.0 and intel-15.0.0, but I am running into a strange problem. The simulation completes with 10 or fewer cores. With more than 10 cores, it fails with a segmentation fault. Do you know why this happens, or can you recommend an Intel compiler version for running CAM 5.0? Thank you very much!

The error message is as follows:

*****************error message*************************************
INITIALIZE_RADBUFFER: ntoplw =           1  pressure:   364.346569404006
Creating new decomp:           260203610720
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
cam                0000000001851009  Unknown               Unknown  Unknown
cam                000000000184F8DE  Unknown               Unknown  Unknown
******************************************************************

Best regards,
Fei
 

eaton

CSEG and Liaisons
CAM will produce solutions that are bit-for-bit identical independent of the number of tasks. So the first step is to verify that the solutions produced using 10 or fewer cores are identical. (I'm assuming you are assigning 1 task per core. If that's not the case, you need to be specific about how you are assigning tasks.) If the solutions are identical, then it would appear that your openmpi is working correctly and you are encountering some sort of system issue. How tasks are assigned to nodes is system dependent. Try using more nodes to see whether the problem is a memory issue. Also experiment with assigning the 10-task case to different numbers of nodes.
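
As a rough illustration of what "bit-for-bit identical" means in the check above, here is a minimal Python sketch that compares two sequences of floating-point values by their raw IEEE 754 bit patterns rather than by numerical closeness. The function name first_bit_difference and the sample values are hypothetical and not part of CAM. Reading the actual fields out of the history files from two runs (for example, a 4-task and an 8-task run) is assumed to happen elsewhere and is left out.

```python
import struct


def first_bit_difference(a, b):
    """Return the index of the first value whose 64-bit pattern differs.

    Returns None when every pair of values is bit-for-bit identical.
    The inputs stand in for the same field taken from two runs that used
    different task counts (an assumption for illustration only).
    """
    if len(a) != len(b):
        raise ValueError("fields have different lengths")
    for i, (x, y) in enumerate(zip(a, b)):
        # Compare the packed little-endian doubles, not the numeric values,
        # so differences in the last bit (or 0.0 vs -0.0) are detected.
        if struct.pack("<d", x) != struct.pack("<d", y):
            return i
    return None


# Hypothetical sample values standing in for one field from two runs.
run_a = [364.346569404006, 1.0, 0.3]
run_b = [364.346569404006, 1.0, 0.1 + 0.2]  # differs from 0.3 in the last bits

print(first_bit_difference(run_a, run_a))
print(first_bit_difference(run_a, run_b))
```

A tolerance-based comparison would hide exactly the kind of difference this check is meant to expose, which is why the sketch compares packed bytes instead of using something like an absolute-error threshold.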
 