
Hybrid MPI and OpenMP programming tutorial - Prace Training Portal

Numerical optimization inside of an SMP node

– 2nd level of domain decomposition: OpenMP
– 3rd level: 2nd-level cache
– 4th level: 1st-level cache
– Goal: optimizing the numerical performance

The mapping problem with the mixed model (pure MPI & hybrid MPI+OpenMP)

[Figure: two possible placements on a two-socket, quad-core SMP node. Either each multi-threaded MPI process runs with its four threads on one quad-core socket, or each process is spread with its threads across both sockets. Do we have this ... or that?]

Several multi-threaded MPI processes per SMP node:

Problem:
– Where are your processes and threads really located?

Solutions:
– Depends on your platform, e.g., pinning with numactl (a small probe program at the end of this section shows one way to check).
– Case study on the Sun Constellation Cluster Ranger with BT-MZ and SP-MZ.

Further questions:
– Where is the NIC (Network Interface Card) located?
– Which cores share caches?

Unnecessary intra-node communication (pure MPI)

Problem:
– Several MPI processes on each SMP node lead to unnecessary intra-node communication.

Solution:
– Only one MPI process per SMP node.

Remarks (quality aspects of the MPI library):
– The MPI library must use an appropriate fabric/protocol for intra-node communication.
– Intra-node bandwidth is higher than inter-node bandwidth, so the problem may be small.
– The MPI implementation may cause unnecessary data copying, i.e., a waste of memory bandwidth.

Sleeping threads and network saturation with Masteronly (mixed model: several multi-threaded MPI processes per SMP node)

Masteronly: MPI only outside of parallel regions.

    for (iteration = ...) {
        #pragma omp parallel
        {
            /* numerical code */
        }
        /* on master thread only: */
        MPI_Send(/* original data to halo areas in other SMP nodes */);
        MPI_Recv(/* halo data from the neighbors */);
    } /* end for loop */

[Figure: two SMP nodes connected by the node interconnect; on each node only the master thread on Socket 1 communicates, while the threads on Socket 2 are sleeping.]

Problem 1:
– Can the master thread saturate the network?
Solution:
– If not, use the mixed model, i.e., several MPI processes per SMP node.

Problem 2:
– Sleeping threads are wasting CPU time.
Solution:
– Overlap computation and communication.

Problems 1 & 2 together:
– The lousy bandwidth of the master thread produces even more idle time.
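
To make the Masteronly pattern above concrete, here is a minimal, self-contained sketch. The 1-D ring of neighbor ranks, the array size, and the Jacobi-style update are illustrative assumptions; only the structure follows the slide: initialize MPI with at least MPI_THREAD_FUNNELED, let all threads compute inside the parallel region, and exchange halos on the master thread only, outside of it.

    /* Masteronly sketch (illustrative): MPI is called only outside the
     * OpenMP parallel region, on the master thread. The 1-D ring of
     * neighbors and the array size are invented for this example. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000
    static double a[N + 2], b[N + 2];   /* cells 1..N plus two halo cells */

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Masteronly needs at least MPI_THREAD_FUNNELED thread support. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;
        double *u = a, *unew = b;

        for (int iteration = 0; iteration < 100; iteration++) {
            /* numerical code: all threads compute */
            #pragma omp parallel for
            for (int i = 1; i <= N; i++)
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
            double *tmp = u; u = unew; unew = tmp;

            /* on master thread only: exchange halos with both neighbors */
            MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                         &u[0], 1, MPI_DOUBLE, left,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                         &u[N + 1], 1, MPI_DOUBLE, right, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        if (rank == 0)
            printf("done: u[1] = %f\n", u[1]);
        MPI_Finalize();
        return 0;
    }

Compile with, e.g., mpicc -fopenmp. Note that with MPI_THREAD_FUNNELED the MPI calls must stay on the thread that called MPI_Init_thread, which is exactly what Masteronly guarantees.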
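For the mapping problem, one quick way to see where processes and threads really end up is a probe that prints each thread's current core. sched_getcpu() is Linux/glibc-specific, and the program below is an illustrative sketch, not part of the tutorial:

    /* Placement probe (illustrative, Linux/glibc): prints which core each
     * OpenMP thread of each MPI rank is currently running on. */
    #define _GNU_SOURCE
    #include <sched.h>          /* sched_getcpu() */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, hostlen;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &hostlen);

        #pragma omp parallel
        printf("host %s  rank %d  thread %d -> core %d\n",
               host, rank, omp_get_thread_num(), sched_getcpu());

        MPI_Finalize();
        return 0;
    }

Running it with and without pinning (e.g., under numactl, or with the batch system's binding options) shows whether the threads of one rank stay on one socket or wander across both.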
