
Performance issue when running a job in CAM 5.3

Hello, we have the PGI CDK 13.9 compiler installed on a Linux (CentOS 6.2) cluster with the Sun Grid Engine (SGE) job scheduler. The cluster has 1 master node and 9 compute nodes, with 12 processors in each compute node. I am building CAM in the following way:


$/home/2012asz8344/cam/cam5/cesm1_0/models/atm/cam/bld/configure -dyn fv -hgrid 1.9x2.5 -ntasks 32 -nosmp -fc pgf90 -cc pgcc -test >& config.log &
$gmake -j8
$/home/2012asz8344/cam/cam5/cesm1_0/models/atm/cam/bld/build-namelist -test -config /home/2012asz8344/cam5/x1/bld/config_cache.xml >& bld.log &

----------------- Job scheduler SGE script for CAM -----------------
#!/bin/sh
#$ -pe mpi 32
#$ -cwd
#$ -j y
#$ -S /bin/bash
#export PGI=/opt/pgi
cd /home/2012asz8344/cam5/x1/run
/opt/pgi/linux86-64/2013/mpi/mpich/bin/mpirun -np 32 -machinefile /home/2012asz8344/list_node /home/2012asz8344/cam5/x1/bld/cam
---------------------------------------------------------------------

I am seeing a performance issue on my cluster. When the job runs in the queue, it uses three compute nodes with 32 cores, and the qstat command shows the job running on three nodes with 32 cores. However, the qhost command shows that the job is also using all of the memory on other compute nodes. Some CAM processes are also running on those other compute nodes, which causes the performance issue on the machine. The job takes a long time to execute, and it also degrades the performance of the whole cluster. I don't know why this is happening. Please suggest what I should do. If you need more details, please let me know.

Thanks and regards,
Ankush
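
One thing that may be worth checking here: the job script passes a fixed machinefile (list_node) to mpirun, while SGE normally records the hosts it actually granted to a job in the file named by the PE_HOSTFILE environment variable. The sketch below is an illustration only, not part of the original script. It assumes the usual PE_HOSTFILE layout of one "host slots queue range" line per host, and the helper name expand_pe_hostfile is hypothetical. It expands that file into one hostname per slot so the granted hosts can be compared with the contents of list_node.

```shell
#!/bin/sh
# Hypothetical helper: expand an SGE PE hostfile (assumed format:
# "host slots queue range" per line) into one hostname per slot.
expand_pe_hostfile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Inside a running SGE job, PE_HOSTFILE should name the allocation file.
# Outside a job it is unset, and this section is skipped.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    machines="${TMPDIR:-/tmp}/machines.${JOB_ID:-0}"
    expand_pe_hostfile "$PE_HOSTFILE" > "$machines"
    echo "Hosts granted by SGE:"
    sort -u "$machines"
fi
```

If the hosts printed this way differ from those in list_node, that could explain CAM processes appearing on nodes outside the allocation. Whether the expanded file is suitable as the mpirun -machinefile argument depends on how the mpi parallel environment is configured on the cluster.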
 

eaton

CSEG and Liaisons
A cam5 run w/ FV, 1.9x2.5, and 32 tasks reported memory usage of 528 MB max and 344 MB min over the 32 tasks. Assuming the average is under 500 MB per task, the total memory use on 1 node running 12 tasks should be less than 6 GB. As a rough performance guide, on Intel Sandy Bridge processors with hyperthreading in use, that model configuration runs at about 4 model years/day. You'll need to work with your system administrators to understand how to run efficiently on your cluster.
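
The estimate above can be worked through as a small calculation. This is only a sketch: the 500 MB per-task figure is the upper-bound assumption from the reply, the 12 cores per node comes from the original question, and the variable names are illustrative.

```shell
#!/bin/sh
# Figures taken from the thread; 500 MB per task is an assumed upper bound.
tasks=32
cores_per_node=12
mb_per_task=500

# Nodes needed when each node runs one task per core (ceiling division).
nodes=$(( (tasks + cores_per_node - 1) / cores_per_node ))

# Memory used on a node that runs a full set of 12 tasks.
mb_per_node=$(( cores_per_node * mb_per_task ))

echo "nodes needed: $nodes"
echo "memory per full node: ${mb_per_node} MB"
```

With these inputs the job fits on three nodes and each fully loaded node stays at or below roughly 6 GB, so memory pressure on additional nodes would not be expected from the model's own footprint.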
 