10.07.2015

Hybrid MPI and OpenMP programming tutorial - Prace Training Portal



Memory consumption

• Shared nothing
  – Heroic theory
  – In practice: Some data is duplicated
• MPI & OpenMP: with n threads per MPI process
  – Duplicated data may be reduced by a factor of n

Case study: MPI+OpenMP memory usage of NPB

Using more OpenMP threads could reduce the memory usage substantially, up to five times on Hopper Cray XT5 (eight-core nodes). (Always the same number of cores.)

Source: Hongzhang Shan, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J. Wright: Analyzing the Effect of Different Programming Models Upon Performance and Memory Usage on Cray XT5 Platforms. Proceedings, CUG 2010, Edinburgh, GB, May 24-27, 2010.

Slide courtesy of Alice Koniges, NERSC, LBNL.

Memory consumption (continued)

• Future: With 100+ cores per chip the memory per core is limited.
  – Data reduction through usage of shared memory may be a key issue
  – Domain decomposition on each hardware level
    • Maximizes
      – Data locality
      – Cache reuse
    • Minimizes
      – ccNUMA accesses
      – Message passing
  – No halos between domains inside of SMP node
    • Minimizes
      – Memory consumption

How many threads per MPI process?

• SMP node with m sockets and n cores/socket
• How many threads (i.e., cores) per MPI process?
  – Too many threads per MPI process → overlapping of MPI and computation may be necessary, some NICs unused?
  – Too few threads → too much memory consumption (see previous slides)
• Optimum
  – somewhere between 1 and m x n threads per MPI process
  – Typically:
    • Optimum = n, i.e., 1 MPI process per socket
    • Sometimes = n/2, i.e., 2 MPI processes per socket
    • Seldom = 2n, i.e., each MPI process on 2 sockets

Hybrid Parallel Programming, Slides 113-116 / 154, Rabenseifner, Hager, Jost
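
The trade-off between MPI processes and threads per process can be made concrete with a small toy model. The sketch below is not from the slides: the function names, the node shape (m = 2 sockets, n = 4 cores per socket), and the memory figures (4.0 GB distributed data per node, 0.5 GB replicated per MPI process) are all hypothetical. It only assumes that replicated data is stored once per MPI process, so that n threads per process cut the duplicated part by a factor of n.

```python
"""Toy model of MPI processes x OpenMP threads on one SMP node (illustrative only)."""


def layouts(sockets, cores_per_socket):
    """Yield (mpi_processes, threads_per_process) pairs that use every core once."""
    total = sockets * cores_per_socket
    for threads in range(1, total + 1):
        if total % threads == 0:
            yield total // threads, threads


def node_memory_gb(processes, distributed_gb, replicated_gb):
    """Assumed model: distributed data is held once per node,
    replicated data once per MPI process."""
    return distributed_gb + processes * replicated_gb


def placement_hint(threads, cores_per_socket):
    """Map a thread count to the rules of thumb quoted on the slide."""
    if threads == cores_per_socket:
        return "1 MPI process per socket (typical optimum)"
    if 2 * threads == cores_per_socket:
        return "2 MPI processes per socket (sometimes)"
    if threads == 2 * cores_per_socket:
        return "each MPI process on 2 sockets (seldom)"
    return ""


if __name__ == "__main__":
    m, n = 2, 4  # hypothetical node: 2 sockets, 4 cores per socket
    for procs, threads in layouts(m, n):
        mem = node_memory_gb(procs, distributed_gb=4.0, replicated_gb=0.5)
        hint = placement_hint(threads, n)
        print(f"{procs:2d} procs x {threads:2d} threads: {mem:4.1f} GB  {hint}")
```

Under these made-up numbers, the node footprint falls from 8.0 GB with pure MPI (8 processes, 1 thread each) to 4.5 GB with a single 8-thread process. The model covers only the memory side; it says nothing about halo exchange, ccNUMA placement, or unused NICs, which is why the slide puts the practical optimum near one MPI process per socket and not at the maximum thread count.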
