Hybrid MPI and OpenMP programming tutorial - PRACE Training Portal
Pure MPI

One MPI process on each core (discussed in detail later, in the section "Mismatch Problems").

Advantages:
– No modifications to existing MPI codes
– The MPI library need not support multiple threads

Major problems:
– Does the MPI library internally use different protocols?
  • Shared memory inside the SMP nodes
  • Network communication between the nodes
– Does the application topology fit the hardware topology?
– Unnecessary MPI communication inside the SMP nodes!

Hybrid: Masteronly

MPI only outside of parallel regions:

    for (iteration ....) {
        #pragma omp parallel
        {
            /* numerical code */
        } /* end omp parallel */

        /* on master thread only */
        MPI_Send(/* original data to halo areas in other SMP nodes */);
        MPI_Recv(/* halo data from the neighbors */);
    } /* end for loop */

Advantages:
– No message passing inside the SMP nodes
– No topology problem

Major problems:
– All other threads are sleeping while the master thread communicates!
– Which inter-node bandwidth is achieved?
– The MPI library must support at least MPI_THREAD_FUNNELED
  (see the section "Thread-safety quality of MPI libraries")

[Hybrid Parallel Programming, slides 9-10/154, Rabenseifner, Hager, Jost]

Overlapping Communication and Computation

Overlap MPI communication by one or a few threads while the other threads are computing:

    if (my_thread_rank < ...) {
        MPI_Send/Recv ....   /* i.e., communicate all halo data */
    } else {
        /* Execute those parts of the application
           that do not need halo data
           (on the non-communicating threads) */
    }
    /* Execute those parts of the application
       that need halo data
       (on all threads) */

Pure OpenMP (on the cluster)

OpenMP only, on distributed virtual shared memory:
– A distributed virtual shared memory system is needed
– It must support clusters of SMP nodes
– e.g., Intel® Cluster OpenMP:
  • Shared-memory parallelism inside the SMP nodes
  • Communication of the modified parts of pages at each OpenMP flush
    (a flush is part of every OpenMP barrier)
– i.e., the OpenMP memory and parallelization model is prepared for clusters!
  (Experience: see the "Mismatch Problems" section)

[Hybrid Parallel Programming, slides 11-12/154, Rabenseifner, Hager, Jost]