Direct Numerical Simulation of Autoignition in a Jet in a Cross-Flow


Chapter 2: Parallel Scaling and Parallel Efficiency Metrics

2.1 Measurement of Parallel Efficiency

Parallel code scaling takes one of two forms: strong scaling or weak scaling. The goal of strong scaling is to reduce execution time for a fixed total problem size by adding processors (and hence reducing the problem size per worker/MPI rank). Ideal weak scaling behavior, on the other hand, keeps the execution time constant by adding processors in proportion to an increasingly larger problem size (and hence keeping the problem size per worker fixed).

Parallel efficiency, η_p, for a problem run on N_2 processes/MPI ranks is defined relative to a reference run on N_1 ranks as

    η_p = (N_1 t(N_1)) / (N_2 t(N_2)),

where t(N) is the execution time on N ranks.

In order to assess core utilization we also measure the MPI threading efficiency, which we define as

    η_t = (N_1 t(N_1)) / (N_2 t(N_2)),

where N_1 is the number of ranks in the one MPI rank (thread) per core configuration, and N_2 = 2N_1 in the two threads/core configuration and N_2 = 4N_1 in the four threads/core configuration. Note that a net speedup is obtained for a threading efficiency of more than 50% in the two threads/core configuration and more than 25% in the four threads/core configuration.
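As a concrete illustration of these definitions, the minimal C sketch below evaluates the threading efficiency for a pair of runs and checks the net-speedup threshold; the rank counts and timings are made-up placeholders, not measurements from this study.

    #include <stdio.h>

    /* Efficiency of a run on n2 ranks taking t2 seconds, measured
     * against a reference run on n1 ranks taking t1 seconds:
     *     eta = (n1 * t1) / (n2 * t2)
     * The same expression gives the MPI threading efficiency when
     * n2 = 2*n1 (two ranks/core) or n2 = 4*n1 (four ranks/core). */
    static double efficiency(long n1, double t1, long n2, double t2)
    {
        return ((double)n1 * t1) / ((double)n2 * t2);
    }

    int main(void)
    {
        /* Hypothetical timings: 8192 ranks (one rank/core) versus
         * 16384 ranks (two ranks/core) on the same set of cores. */
        long n1 = 8192;      double t1 = 120.0;   /* seconds (made up) */
        long n2 = 2 * n1;    double t2 = 75.0;    /* seconds (made up) */

        double eta_t = efficiency(n1, t1, n2, t2);
        printf("threading efficiency = %.1f%%\n", 100.0 * eta_t);

        /* Two ranks/core yield a net speedup only if eta_t > 50%. */
        printf("net speedup: %s\n", eta_t > 0.5 ? "yes" : "no");
        return 0;
    }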

2.1.1 Core-level MPI Threading Efficiency

Mira allows two or four MPI ranks to be instantiated per core, so full utilization of each core's compute power can be assured by running more than one rank per core. However, this may lead to performance degradation for very large problem sizes because of the resulting on-chip resource contention. We have conducted core-level parallel efficiency measurements on Mira for the target simulations using different numbers of cores and either two or four ranks per core.

[Figure 2.1: MPI threading efficiency for the autoignition simulation using NEK5000. Note that the abscissa represents the grid size per MPI rank.]

Figure 2.1 compares the MPI threading efficiency for two and four ranks per core. It is clear that using two ranks per core is more efficient for the same problem size (i.e., grid points) per rank and that the optimum number of grid points per rank is 7,000-10,000. This size ensures maximum utilization of the core's compute power while minimizing on-chip resource contention among MPI threads.

2.1.2 Strong Scaling Experiment

A strong scaling experiment is performed under the two MPI-ranks-per-core configuration for an autoignition simulation with a total of ≈345 million grid points. Figure 2.2 shows that an ideal efficiency of 100% is maintained up to approximately 130,000 MPI ranks (65,000 cores) and that up to 60% efficiency is sustainable using nearly 500,000 ranks.

2.2 Profiling and Hardware Performance Monitoring

Performance monitoring is enabled through a mechanism for obtaining information about the use of MPI routines (profiling) or about wall-clock time and hardware counters.
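As an illustration of the wall-clock component of such monitoring, the minimal C sketch below times a placeholder solver step with MPI_Wtime and reports the maximum over ranks; the routine advance_one_step is hypothetical and stands in for the application's time-stepping loop.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical stand-in for one solver time step. */
    static void advance_one_step(void)
    {
        /* ... application work ... */
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Wall-clock timing of the region of interest. */
        double t0 = MPI_Wtime();
        advance_one_step();
        double dt = MPI_Wtime() - t0;

        /* The slowest rank determines the observed step time. */
        double dt_max = 0.0;
        MPI_Reduce(&dt, &dt_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("step wall-clock time (max over ranks): %g s\n", dt_max);

        MPI_Finalize();
        return 0;
    }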

