Hybrid MPI and OpenMP programming tutorial - Prace Training Portal

From "Hybrid Parallel Programming" by Rabenseifner, Hager, Jost (slides 97-100 of 154).

Experiment: Matrix-vector-multiply (MVM), masteronly vs. funneled & reserved (slide 97)

• Jacobi-Davidson solver on IBM SP Power3 nodes with 16 CPUs per node.
• Figure: performance ratio (r) between the two styles, with regions in which funneled & reserved is faster and in which masteronly is faster.
• funneled & reserved is always faster in these experiments.
• Reason: the memory bandwidth is already saturated by 15 CPUs, see inset.
• Inset: speedup on 1 SMP node using different numbers of threads.

A minimal code sketch of the funneled & reserved style follows at the end of this section.

Source: R. Rabenseifner, G. Wellein: Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures. International Journal of High Performance Computing Applications, Vol. 17, No. 1, 2003, Sage Science Press.

Overlapping: Using OpenMP tasks (slide 98)

The new OpenMP tasking model gives a new way to achieve more parallelism from hybrid computation.

Alice Koniges et al.: Application Acceleration on Current and Future Cray Platforms. Proceedings, CUG 2010, Edinburgh, GB, May 24-27, 2010. Slides 98-100 courtesy of Alice Koniges, NERSC, LBNL.

— skipped —

Case study: Communication and Computation in Gyrokinetic Tokamak Simulation (GTS) shift routine

— skipped —

Overlapping can be achieved with OpenMP tasks (1st part) (slide 99)

Work on the particle array (packing for sending, reordering, adding after sending) can be overlapped with data-independent MPI communication using OpenMP tasks; see the second sketch below.

GTS shift routine: Overlapping MPI_Allreduce with particle work (slide 100)

• Overlap: the master thread encounters the tasking statements (!$omp master) and creates work for the thread team for deferred execution; the MPI_Allreduce call is executed immediately.
• The MPI implementation has to support at least MPI_THREAD_FUNNELED.
• Subdividing the tasks into smaller chunks allows better load balancing and scalability among the threads.

The third sketch below illustrates this pattern.
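The funneled & reserved style compared on slide 97 can be expressed in a few lines of C. The following is a minimal sketch, not code from the slides: the names (mvm_rows, halo, nbr) are hypothetical, and it assumes at least two threads and an MPI library initialized with at least MPI_THREAD_FUNNELED. The master thread is reserved for the halo exchange while the remaining threads compute the halo-independent rows.

    /* Sketch: "funneled & reserved" MVM. All MPI calls are funneled through
     * the master thread, which is reserved for communication while the
     * other threads compute. Assumes at least two threads. */
    #include <mpi.h>
    #include <omp.h>

    /* y[lo..hi) = A * x for rows that need no halo data */
    static void mvm_rows(double *y, const double *A, const double *x,
                         int lo, int hi, int n)
    {
        for (int i = lo; i < hi; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += A[(long)i * n + j] * x[j];
            y[i] = s;
        }
    }

    void mvm_funneled_reserved(double *y, const double *A, const double *x,
                               double *halo, int n, int rows, int nbr,
                               MPI_Comm comm)
    {
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int nth = omp_get_num_threads();
            if (tid == 0) {
                /* Reserved (master) thread: exchange halo data with the
                 * neighbor rank while the other threads compute. */
                MPI_Request req[2];
                MPI_Irecv(halo, n, MPI_DOUBLE, nbr, 0, comm, &req[0]);
                MPI_Isend(x, n, MPI_DOUBLE, nbr, 0, comm, &req[1]);
                MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
            } else {
                /* Computing threads: static split of the halo-independent
                 * rows among the nth-1 workers. */
                int nw = nth - 1;
                int chunk = (rows + nw - 1) / nw;
                int lo = (tid - 1) * chunk;
                int hi = lo + chunk < rows ? lo + chunk : rows;
                mvm_rows(y, A, x, lo, hi, n);
            }
            /* Implicit barrier: afterwards the halo-dependent rows can be
             * computed by all threads (omitted in this sketch). */
        }
    }

The experiment's result follows from this structure: once the memory bandwidth is saturated by 15 CPUs, giving up one CPU to communication costs almost no compute performance, so hiding the communication wins.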
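Slide 99's idea can be sketched in C as follows (the real GTS shift routine is Fortran; reorder_particles and all buffer and parameter names here are hypothetical stand-ins). The task works only on data that is independent of the messages in flight, so the team can execute it while the master thread communicates.

    /* Sketch: overlap data-independent MPI communication with particle
     * work via OpenMP tasks. Compile e.g. with: mpicc -fopenmp */
    #include <mpi.h>

    /* Hypothetical stand-in for reordering/compacting the local particle
     * array after the shipped particles have been removed. */
    static void reorder_particles(double *particles, int n)
    {
        for (int i = 0; i < n; i++)
            particles[i] += 0.0;   /* real reordering work elided */
    }

    void shift_overlap(double *send_buf, int nsend,
                       double *recv_buf, int nrecv,
                       double *particles, int nkeep,
                       int right, int left, MPI_Comm comm)
    {
        #pragma omp parallel
        #pragma omp master
        {
            /* Deferred task: reorder the particles that stay on this rank;
             * this data is independent of send_buf/recv_buf. */
            #pragma omp task
            reorder_particles(particles, nkeep);

            /* The master thread communicates immediately; only it calls
             * MPI, so MPI_THREAD_FUNNELED is sufficient. */
            MPI_Sendrecv(send_buf, nsend, MPI_DOUBLE, right, 0,
                         recv_buf, nrecv, MPI_DOUBLE, left, 0,
                         comm, MPI_STATUS_IGNORE);

            #pragma omp taskwait  /* reordering done before unpacking */
            /* Adding the received particles to the array happens after
             * this point (omitted). */
        }
    }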
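Finally, a compilable sketch of slide 100's pattern, again with hypothetical names (CHUNK, particles) and a stand-in loop for the particle work: the master thread creates chunked tasks for deferred execution by the team and then immediately executes MPI_Allreduce, and the startup code checks that the MPI library provides at least MPI_THREAD_FUNNELED.

    /* Sketch: overlap MPI_Allreduce with chunked OpenMP tasks.
     * Compile e.g. with: mpicc -fopenmp */
    #include <mpi.h>
    #include <stdio.h>

    #define CHUNK 1024   /* hypothetical chunk size for load balancing */

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        if (provided < MPI_THREAD_FUNNELED) {
            fprintf(stderr, "MPI_THREAD_FUNNELED not supported\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        enum { N = 100000 };
        static double particles[N];
        long local_count = N, global_count = 0;

        #pragma omp parallel
        #pragma omp master
        {
            /* Deferred tasks in small chunks: the team picks them up
             * while the master is inside MPI_Allreduce. */
            for (int lo = 0; lo < N; lo += CHUNK) {
                int hi = lo + CHUNK < N ? lo + CHUNK : N;
                #pragma omp task firstprivate(lo, hi)
                for (int i = lo; i < hi; i++)
                    particles[i] *= 0.5;   /* stand-in particle work */
            }

            /* Executed immediately by the master thread while the
             * deferred tasks run on the other threads. */
            MPI_Allreduce(&local_count, &global_count, 1, MPI_LONG,
                          MPI_SUM, MPI_COMM_WORLD);

            #pragma omp taskwait
        }

        MPI_Finalize();
        return 0;
    }

The chunk size trades task-creation overhead against load balance: more, smaller tasks give the team more opportunities to balance the work that runs concurrently with the MPI_Allreduce, which is the point of the slide's last bullet.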
