An interesting problem about batch job settings

subh08@lzu_edu_cn · Dec 22, 2015

Dear all,

I have an interesting problem about batch job settings. I ran three scenarios with a B1850CN case at resolution T31xg3v7.

First, I set all of the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 336 in the file env_mach_pes.xml. Correspondingly, I set my job “queues” and “nodes” to “normal” and “336” in the batch file $case.$machine.run. This scenario gave a Model Throughput of 102.67 simulated_years/day.

Second, I set all of the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 600 in env_mach_pes.xml. Correspondingly, I set my job “queues” and “nodes” to “super” and “600” in the batch file $case.$machine.run. This scenario gave a Model Throughput of 88 simulated_years/day.

Third, I set all of the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 336 in env_mach_pes.xml, as in scenario one. However, I set my job “queues” and “nodes” to “super” and “600” in the batch file $case.$machine.run. Interestingly, this scenario gave a Model Throughput of 125 simulated_years/day.

Our server has many nodes reserved for super batch jobs, and they are almost always idle. This way my job starts quickly in the super queue, at the cost of wasting some computing resources. As you can see, the Model Throughput in the third scenario is roughly 22% higher than in the first.

My question is: if I use the third scenario, will I still get normal model results? In other words, does it affect the model when I request 600 CPUs from the batch system but cesm.exe actually uses only 336? Will I get a pile of garbage output?

Does anyone know about this issue? Thank you!
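
To put the three scenarios side by side, here is a minimal Python sketch of the arithmetic. It uses only the throughput figures and CPU counts quoted above. The dictionary layout and the helper names `relative_change` and `cpu_days_per_sim_year` are made up for illustration, and the cost figure assumes the batch system charges for every requested CPU, which may not match how a given machine accounts for usage.

```python
# Figures quoted in the post: (requested CPUs, model tasks, simulated years/day).
scenarios = {
    "1: normal queue, 336 requested, 336 tasks": (336, 336, 102.67),
    "2: super queue, 600 requested, 600 tasks": (600, 600, 88.0),
    "3: super queue, 600 requested, 336 tasks": (600, 336, 125.0),
}


def relative_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def cpu_days_per_sim_year(requested_cpus, throughput):
    """Requested CPU-days spent per simulated year (assumes all requested CPUs are charged)."""
    return requested_cpus / throughput


base_throughput = scenarios["1: normal queue, 336 requested, 336 tasks"][2]

for name, (requested, tasks, throughput) in scenarios.items():
    change = relative_change(throughput, base_throughput)
    occupancy = tasks / requested  # fraction of requested CPUs running a model task
    cost = cpu_days_per_sim_year(requested, throughput)
    print(f"{name}: {change:+.1f}% vs scenario 1, "
          f"occupancy {occupancy:.2f}, {cost:.2f} CPU-days per simulated year")
```

The third scenario is faster in wall-clock terms than the first, but under the charging assumption above each simulated year costs more requested CPU time, which is the "waste" trade-off described in the post.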

jedwards · Dec 22, 2015

This is a difficult question to answer because it depends heavily on the particular machine, and I suspect you are using a machine I am not familiar with. However, it sounds like what you are doing is underpopulating each machine node with tasks. This reduces memory and network contention and may increase throughput. There are several tests included with CESM to check that results are consistent; see the user's manual for a full description. Do short runs with each configuration and compare the results. I would expect the two cases using 336 tasks to be exactly the same.
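
As a rough illustration of the "compare results" step, the sketch below checks two sets of output fields for exact equality. It is not one of the tests shipped with CESM. The function name `compare_fields`, the variable names, and the sample values are all hypothetical placeholders, and reading the fields out of the model's output files is deliberately left out.

```python
import math


def compare_fields(run_a, run_b):
    """Return {variable: max absolute difference} for every variable that differs.

    `run_a` and `run_b` map variable names to flat sequences of floats.
    A variable missing from one run, or with a different length, is reported as inf.
    An empty result means the two runs match exactly.
    """
    diffs = {}
    for name in sorted(set(run_a) | set(run_b)):
        if name not in run_a or name not in run_b:
            diffs[name] = math.inf
            continue
        a, b = run_a[name], run_b[name]
        if len(a) != len(b):
            diffs[name] = math.inf
            continue
        worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if worst != 0.0:
            diffs[name] = worst
    return diffs


# Placeholder data standing in for fields read from two short runs.
run_normal_queue = {"TS": [287.1, 288.4, 290.0], "PS": [101325.0, 100980.0]}
run_super_queue = {"TS": [287.1, 288.4, 290.0], "PS": [101325.0, 100980.0]}

differences = compare_fields(run_normal_queue, run_super_queue)
if differences:
    print("Runs differ:", differences)
else:
    print("Runs match exactly for all compared variables.")
```

An exact comparison is used here because the reply expects the two 336-task cases to be exactly the same; any nonzero difference between them would be worth investigating.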
