
CESM Optimization in the distribution of computational resources

Kihang Youn

New Member
I am comparing two processor layouts that allocate resources to the components differently.

The two layouts show a large difference in CPL COMM time. Could you advise me on what I am missing?

#A
1670826181922.png

#B
1670826205359.png

The difference between #A and #B is that B has fewer resources in OCN and more resources in ATM and ICE.

RESULT
1670826342741.png

With layout #B, OCN was slightly slower, ATM and ICE were faster, and LND performed the same, which is consistent with the resources each component was given.
However, CPL COMM was abnormally slow, and I am curious what caused this.

I would appreciate some guidance on how to allocate resources so that I can try it.

Best Regards,
Kihang
 

Attachments

  • 1670826235967.png (11.3 KB)

sacks

Bill Sacks
CSEG and Liaisons
Staff member
There is some guidance on load balancing in the CIME master documentation, in the section "5. Controlling processors and threads".

If I remember correctly, CPL COMM time often indicates time that one set of processors is waiting for another set of processors to finish running their component. When choosing a processor layout, it's important to know that, even though you can lay out any components to run on different processors, that doesn't mean that they can necessarily run concurrently due to constraints in the driver's component sequencing. So in case B, it might be that you have components running on different processors but that (because of driver sequencing constraints) need to wait for each other to finish. I think LND and ATM might have this issue.
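The waiting effect described above can be illustrated with a toy timing model. This is only a sketch, not the CESM driver's actual sequencing logic: the function `wall_and_idle`, the per-step run times, and the assumption that the two components cannot overlap are all hypothetical, chosen to show how idle time appears when components on separate processors must still run one after the other.

```python
def wall_and_idle(t_first, t_second, concurrent):
    """Return (wall time, idle time on first set, idle time on second set).

    Two components run on two disjoint processor sets. If they may run
    concurrently, the step takes as long as the slower one. If the driver
    forces them to run in sequence, the step takes the sum of both, and
    each processor set sits idle while the other component runs.
    """
    if concurrent:
        wall = max(t_first, t_second)
    else:
        wall = t_first + t_second
    return wall, wall - t_first, wall - t_second


# Hypothetical per-step run times in seconds (not measured values).
t_lnd, t_atm = 2.0, 10.0

print(wall_and_idle(t_lnd, t_atm, concurrent=True))   # (10.0, 8.0, 0.0)
print(wall_and_idle(t_lnd, t_atm, concurrent=False))  # (12.0, 10.0, 2.0)
```

In the sequential case, putting the two components on separate processors buys nothing: both sets spend part of every step waiting, and that waiting is the kind of time that may be reported under CPL COMM.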

I would suggest looking at the processor layouts on one of our supported machines, such as cheyenne (see CESM2 Timing, Performance & Load Balancing Data) to get a sense of what's typically done in terms of which components share processors vs. which are on separate processors in fully-coupled configurations. Then you can do some experimental timings on your own system to determine optimal processor counts for each component, following the guidance in the above documentation.
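As a rough sketch of how an experimental layout might be applied, the snippet below builds `xmlchange` commands for the `NTASKS_*` and `ROOTPE_*` settings of a case. The helper `xmlchange_commands` and the task counts are hypothetical and not a recommended layout; the example simply assumes LND and ICE share ATM's processors while OCN sits on its own set.

```python
def xmlchange_commands(layout):
    """Build one xmlchange command per component.

    layout maps a component name (e.g. "ATM") to a tuple of
    (number of MPI tasks, root processor).
    """
    commands = []
    for comp, (ntasks, rootpe) in layout.items():
        commands.append(
            f"./xmlchange NTASKS_{comp}={ntasks},ROOTPE_{comp}={rootpe}"
        )
    return commands


# Hypothetical layout: LND and ICE split ATM's processors (0-127),
# OCN runs on a separate set starting at processor 128.
layout = {
    "ATM": (128, 0),
    "LND": (64, 0),
    "ICE": (64, 64),
    "OCN": (32, 128),
}

for command in xmlchange_commands(layout):
    print(command)
```

The printed commands are meant to be run from the case directory. Timing several such layouts on your own system is what tells you which processor counts actually balance the components.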
 