
Run time issue with multiple couplers

Dear CCSM Software Development Group,
I have set up the following test using create_test:
testcase: ER.01a
grid: T31_gx3v5
compset: B
machine : generic_linux

The configuration I am using is:
-------------------------- Configuration ------------------------------
Dual Intel® Xeon® quad core E5365 (3 GHz, 1333 MHz FSB) cpu
Lustre based cluster file system
20Gbps 4x DDR Infiniband Interconnect
Linux (kernel version : 2.6.9-55.9hp.4sp.XCsmp)
mpich-1.2.7p1
Intel compilers (10.0.026)
LSF
-------------------------------------------------------------------------------------

The default processor distribution is used:
cpl - 2
csim - 8
clm - 8
pop - 24
cam - 16
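(Assuming the components run concurrently on disjoint processor sets, this adds up to 2 + 8 + 8 + 24 + 16 = 58 MPI tasks, i.e. at least eight of the 8-core nodes described above.)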

When I use a single coupler process, the model works fine. When I try to use two coupler processes, the model gets stuck during the map_Xr2o initialization. The end portion of the cpl log is attached below for your reference.

---------------cpl.log --------------------------------
(data_mapInit) initializing mapping data...
(cpl_map_init) initialize map: map_Sa2o
(cpl_map_init) scatter matrix by row...
(cpl_map_init) SM_ScatterByRow, rCode= 0
(cpl_map_init) scatter sMat return code = 0
(cpl_map_init) scattered sMat rows x cols = 11600 x 4608
(cpl_map_init) scattered sMat lSize 13860
(cpl_map_init) scattered sMat gNumEl 31612
(cpl_map_init) new gsMap gSize = 4608
(cpl_map_init) new gsMap lSize = 1633
(cpl_map_init) done initializing map: map_Sa2o
(cpl_map_init) initialize map: map_Fa2o
(cpl_map_init) scatter matrix by row...
(cpl_map_init) SM_ScatterByRow, rCode= 0
(cpl_map_init) scatter sMat return code = 0
(cpl_map_init) scattered sMat rows x cols = 11600 x 4608
(cpl_map_init) scattered sMat lSize 10466
(cpl_map_init) scattered sMat gNumEl 22526
(cpl_map_init) new gsMap gSize = 4608
(cpl_map_init) new gsMap lSize = 1579
(cpl_map_init) done initializing map: map_Fa2o
(cpl_map_init) initialize map: map_Fo2a
(cpl_map_init) scatter matrix by column...
(cpl_map_init) scatter sMat return code = 0
(cpl_map_init) scattered sMat rows x cols = 4608 x 11600
(cpl_map_init) scattered sMat lSize 10466
(cpl_map_init) scattered sMat GnumEl 22526
(cpl_map_init) new gsMap gSize = 4608
(cpl_map_init) new gsMap lSize = 1579
(cpl_map_init) done initializing map: map_Fo2a
(cpl_map_init) initialize map: map_So2a
(cpl_map_init) scatter matrix by column...
(cpl_map_init) scatter sMat return code = 0
(cpl_map_init) scattered sMat rows x cols = 4608 x 11600
(cpl_map_init) scattered sMat lSize 10466
(cpl_map_init) scattered sMat GnumEl 22526
(cpl_map_init) new gsMap gSize = 4608
(cpl_map_init) new gsMap lSize = 1579
(cpl_map_init) done initializing map: map_So2a
(cpl_map_init) initialize map: map_Xr2o
(cpl_map_init) scatter matrix by column...

--------------------------------------------------------------------------------------
Can anyone please give me some suggestions on how to proceed to resolve this issue?


Thanks & Regards,
Sandip.
 
Hi Sandip and List

Three guesses, with no warranty:

1) For what it is worth, the file "scripts/ccsm_utils/Machines/run.linux.generic_linux"
assumes one cpu per node, whereas you have eight cores per node.
Is this perhaps the cause of the problem?
(You may have changed this anyway.)
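Just to illustrate what I mean (a sketch only, not taken from the CCSM scripts, and the node names are made up): with MPICH-1 (ch_p4) the machines file controls how many MPI tasks land on each node, e.g.

-------------------------------------------------------------------------------
# MPICH-1 machines file: "hostname:count" runs that many MPI tasks on
# that node; alternatively, list each hostname eight times
node001:8
node002:8
node003:8
-------------------------------------------------------------------------------

A machines file that lists each node only once is effectively the "one CPU per node" assumption.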

2) Also, you don't say how much memory/RAM per node you have.
If you have less than 1GB per core you may be running into
memory paging and context switching, which is not good for MPI.
In my experience, roughly 1GB per core is the RAM/core number you need for T42, and it varies depending on how
many tasks per node you have.
However, since you are running T31, you may fare well with less memory.
You can use "top" to check memory use.
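For example (plain Linux commands, nothing CCSM-specific), while the model is running on a node:

-------------------------------------------------------------------------------
# totals for memory and swap on the node, in MB
free -m
# one batch snapshot of top; check the swap line and the per-process RES column
top -b -n 1 | head -20
# page-in/page-out and swap activity, sampled every 5 seconds
vmstat 5
-------------------------------------------------------------------------------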

3) Intel Xeon processors are said to have problems with memory bandwidth,
particularly when you go beyond four active cores on dual-socket quad-core machines.
Some people say they use only four of the eight available cores to avoid memory contention.
Hence, one thing to try would be to run CCSM3 using four or fewer "CPUs" (i.e. cores) per node,
by modifying the file "run.linux.generic_linux" referred to in item (1) above,
or your job submission script (see the sketch below).
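
Since you are running under LSF, here is a rough sketch of what that restriction might look like in a job submission script; the task count and ptile value are just examples, not values taken from the CCSM scripts:

-------------------------------------------------------------------------------
#BSUB -n 58                  # total MPI tasks (2+8+8+24+16 from your layout)
#BSUB -R "span[ptile=4]"     # place at most 4 tasks on each node
# the MPICH machines file (item 1) would then also list 4 slots per node
-------------------------------------------------------------------------------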

I hope this helps,
Gus Correa
 