rambhari0123@gmail_com
Member
Hello,

We have performed parallel test runs of CESM 1.2.0 (CAM 5.3) as standalone CAM on a Linux machine, using the PGI CDK 13.7 compiler (with its built-in MPI-2 library) and parallel-built NetCDF 4.2 libraries. We performed 1-day, 30-day, and 365-day test runs at the 1.9x2.5 resolution, varying the number of processors over 1, 12, 24, 30, 36, 48, and 60.

When we run the model on 1, 12, 24, or 30 cores, performance is as expected: compute time decreases as cores are added. These runs use the default decomposition. However, on 36, 48, and 60 cores the compute time increases and the model takes longer to complete. We tested different npr_yz combinations for 36, 48, and 60 cores but did not find much improvement.

I am attaching the performance table for the various 1-day, 30-day, and 365-day test runs. We are unable to figure out why the compute time increases as we increase the number of cores, while with fewer cores we get the expected compute times. I also checked with the machine administrator, but we did not find anything. I would be grateful if you could comment on the performance of our test runs and suggest what is needed to improve it.

The details of the machine I am using to run the model are as follows:
- Nodes: 1 master node and 9 compute nodes, 12 processors per node (hexa-core CPUs), 24 GB RAM per node
- Master node: Fujitsu Primergy RX300 S7, Intel Xeon E5-2620 @ 2 GHz, 24 GB RAM, 8 TB HDD
- Compute nodes (0-8): Fujitsu Primergy RX200 S7, Intel Xeon E5-2620 @ 2 GHz, 24 GB RAM, 500 GB HDD
- Cluster: Rocks Cluster 6 with the Torque/PBS job scheduler
- Compiler: PGI CDK 13.7
- Operating system: Linux (CentOS 6.2)

Thank you in anticipation.

Regards,
Ram, IIT Delhi
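For context, the npr_yz combinations mentioned above control the FV dycore's 2-D domain decomposition and can be set in user_nl_cam. The sketch below is illustrative only, not a recommended setting; the specific values are an assumption for a 36-task run:

```
! In user_nl_cam (illustrative values only, assuming 36 MPI tasks):
! npr_yz = npr_y, npr_z, nprxy_x, nprxy_y
! Constraints: npr_y*npr_z = nprxy_x*nprxy_y = number of MPI tasks
 npr_yz = 6,6,6,6
```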
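To quantify the scaling behavior described above, one can compute speedup and parallel efficiency from the wall-clock times in the performance table. The timings below are hypothetical placeholders (the real values are in the attached table); the shape of the numbers mimics the reported behavior, where efficiency drops beyond 30 cores:

```python
# Speedup and parallel efficiency from wall-clock timings.
# The times below are HYPOTHETICAL placeholders; substitute the
# values from the attached performance table.
timings = {1: 3600.0, 12: 340.0, 24: 190.0, 30: 160.0,
           36: 200.0, 48: 260.0, 60: 330.0}  # cores -> seconds

base = timings[1]  # single-core reference time
for cores in sorted(timings):
    speedup = base / timings[cores]       # how many times faster than 1 core
    efficiency = speedup / cores          # fraction of ideal linear scaling
    print(f"{cores:3d} cores: speedup {speedup:6.2f}, efficiency {efficiency:6.1%}")
```

If efficiency falls sharply once the job spans more than three nodes (36+ cores on 12-core nodes), that points at inter-node communication rather than the decomposition itself.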