Hello,
I'm in the process of porting CESM2 to the new Dutch supercomputer Snellius, which is built from AMD EPYC 7H12 processors. Each node has 2 CPUs with 64 cores each, i.e. 128 cores per node in total. Nodes are connected with InfiniBand HDR100 (100 Gb/s) in a fat-tree topology.
So far, the performance I'm seeing is disappointing: throughput is worse than on the previous Intel-based machine from 2015 with the same number of cores. I've been experimenting with compiler flags and the like, but these don't seem to make much difference.
Is there anyone with experience with this kind of AMD system? Given the large number of cores per node, should I aim for hybrid parallelization (OpenMP + MPI) rather than the pure MPI I use now? If so, any advice on how to configure that would be helpful; so far I haven't been able to get it running with good results.
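For reference, this is the kind of hybrid layout I've been experimenting with. The EPYC 7H12 groups cores into 4-core CCXs that share an L3 cache, so one MPI rank per CCX with 4 OpenMP threads seems like a natural starting point. This is just a sketch of my Slurm setup, not a validated configuration; node counts and the executable path are placeholders:

```shell
#!/bin/bash
#SBATCH --nodes=2                 # placeholder: adjust to the case's PE layout
#SBATCH --ntasks-per-node=32      # one MPI rank per 4-core CCX (shared L3)
#SBATCH --cpus-per-task=4         # 4 OpenMP threads per rank -> 128 cores/node

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores           # pin each thread to a physical core
export OMP_PROC_BIND=close        # keep a rank's threads together on its CCX

# --cpu-bind=cores keeps each rank and its threads inside one L3 domain
srun --cpu-bind=cores ./cesm.exe
```

On the CESM side I matched this by setting the thread counts in the case, e.g. `./xmlchange NTHRDS=4` (or per component, `NTHRDS_ATM=4` etc.) before rebuilding, but I'm not sure this is the right balance of ranks versus threads for this chip.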
Leo