An interesting problem about batch job settings

Dear all,

I have an interesting problem with my batch job settings. I ran three scenarios with a B1850CN case (res: T31xg3v7).

First, I set all the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 336 in the file env_mach_pes.xml. Correspondingly, I set the job “queues” and “nodes” to “normal” and “336” in the batch file $case.$machine.run. This scenario gave a Model Throughput of 102.67 simulated_years/day.

Second, I set all the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 600 in the file env_mach_pes.xml. Correspondingly, I set the job “queues” and “nodes” to “super” and “600” in the batch file $case.$machine.run. This scenario gave a Model Throughput of 88 simulated_years/day.

Third, I set all the NTASKS_(ATM, LND, ICE, CPL, OCN) values to 336 in the file env_mach_pes.xml, as in scenario one. However, I set the job “queues” and “nodes” to “super” and “600” in the batch file $case.$machine.run. Interestingly, this scenario gave a Model Throughput of 125 simulated_years/day.

Our server has many nodes reserved for super batch jobs, and they are usually idle. This way I can get started in the super queue quickly, at the cost of wasting some computing resources. As you can see, the Model Throughput in the third scenario is about 22% higher than in the first (125 vs. 102.67 simulated_years/day).

Finally, my question is: if I use the third scenario, will I get normal model results? That is, will this layout affect whether the model works correctly, given that I request 600 CPUs from the batch system but cesm.exe actually uses only 336? Or will I get a pile of garbage output?

Does anyone know about this issue? Thank you!
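
For reference, the three scenarios can be put side by side with a few lines of Python. This is only a sketch of the arithmetic: the numbers are the ones quoted in this post, while the names `SCENARIOS`, `percent_change` and `idle_fraction` are made up for the illustration and are not part of CESM. It assumes that every CPU requested beyond the NTASKS value simply sits idle.

```python
# Numbers quoted in the post. "requested" is the value given to the batch
# system; "tasks" is the NTASKS_* value set in env_mach_pes.xml.
SCENARIOS = {
    1: {"queue": "normal", "requested": 336, "tasks": 336, "sypd": 102.67},
    2: {"queue": "super", "requested": 600, "tasks": 600, "sypd": 88.0},
    3: {"queue": "super", "requested": 600, "tasks": 336, "sypd": 125.0},
}


def percent_change(new, base):
    """Relative change of `new` with respect to `base`, in percent."""
    return 100.0 * (new - base) / base


def idle_fraction(requested, tasks):
    """Fraction of requested CPUs that run no model task (assumed idle)."""
    return 1.0 - tasks / requested


baseline = SCENARIOS[1]["sypd"]
for number, s in SCENARIOS.items():
    change = percent_change(s["sypd"], baseline)
    idle = idle_fraction(s["requested"], s["tasks"])
    print(
        f"scenario {number} ({s['queue']}): {s['sypd']:.2f} years/day, "
        f"{change:+.1f}% vs scenario 1, {idle:.0%} of requested CPUs idle"
    )
```

With these figures, scenario three comes out roughly 22% faster than scenario one while leaving 44% of the requested CPUs unused, and scenario two is roughly 14% slower than scenario one.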
 

jedwards

CSEG and Liaisons
Staff member
This is a difficult question to answer because it is very dependent on the particular machine, and I suspect you are using a machine I am not familiar with. However, it sounds like what you are doing is underpopulating each machine node with tasks. This will reduce memory and network contention and may increase throughput. There are several tests included with CESM to check that the results are consistent; see the user's manual for a full description. Do short runs with each configuration and compare the results. I would expect the two cases using 336 tasks to be exactly the same.
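
As a rough illustration of what "exactly the same" means for two short runs, here is a minimal Python sketch of a bit-for-bit comparison. It is not one of the CESM tests mentioned above, and `compare_fields` is a hypothetical helper. It assumes the values of one output field have already been read from each run's output file into plain sequences of floats; reading the files is not shown.

```python
import struct


def float_bits(x):
    """Raw 8-byte IEEE 754 representation of a Python float."""
    return struct.pack("<d", x)


def compare_fields(a, b):
    """Compare two equal-length sequences of floats.

    Returns (identical, max_abs_diff). `identical` is True only if every
    pair of values has the same bit pattern. `max_abs_diff` is meaningful
    for finite values only.
    """
    if len(a) != len(b):
        raise ValueError("fields have different lengths")
    identical = all(float_bits(x) == float_bits(y) for x, y in zip(a, b))
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return identical, max_abs_diff


# Hypothetical values standing in for one field from two 336-task runs.
run_normal_queue = [288.15, 287.9, 290.02]
run_super_queue = [288.15, 287.9, 290.02]
print(compare_fields(run_normal_queue, run_super_queue))
```

If the two 336-task runs really behave the same, the first element of the result is `True` for every field compared. A `False` with a tiny difference would suggest the layout changed the order of floating-point operations rather than that the output is garbage.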
 