Hybrid MPI and OpenMP programming tutorial - Prace Training Portal

Content

• Thread-safety quality of MPI libraries
  – MPI rules with OpenMP
  – Thread support of MPI libraries
  – Thread support within OpenMPI
• Tools for debugging and profiling MPI+OpenMP
  – Intel Thread Checker
  – Performance tools support for hybrid code
• Other options on clusters of SMP nodes
  – Pure MPI – multi-core aware
  – Hierarchical domain decomposition
  – Scalability of MPI to hundreds of thousands
  – Remarks on cache optimization
  – Remarks on cost-benefit calculation
  – Remarks on MPI and PGAS (UPC & CAF)
• Summary
  – Acknowledgements
  – Summaries
  – Conclusions
• Appendix
  – Abstract
  – Authors
  – References (with direct relation to the content of this tutorial)
  – Further references
• Content

Practicalities

About compilation

Remember to enable OpenMP when compiling your OpenMP and hybrid programs! With PGI this is done with the -mp flag, with GNU with the -fopenmp flag, and so on.

Pure OpenMP jobs

You can run OpenMP-enabled codes on a multicore (Linux) laptop simply by setting the environment variable OMP_NUM_THREADS to the number of threads you wish to execute, e.g. "export OMP_NUM_THREADS=8", and running the program as usual.

Louhi is not meant as a production platform for flat-OpenMP codes, but for training and testing purposes you can utilize Louhi as well.
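To check that OpenMP was actually enabled at compile time and that OMP_NUM_THREADS is honoured, a small probe such as the following can help. This is a minimal C sketch; the function name thread_count is ours, not part of the tutorial's exercise codes:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return the number of threads a parallel region actually runs with.
 * When the code is compiled without OpenMP support, the pragmas are
 * ignored and the function reports 1. */
int thread_count(void)
{
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Compiled with e.g. "pgcc -mp" or "gcc -fopenmp" and "export OMP_NUM_THREADS=8" set, the probe should report 8; compiled without the OpenMP flag, it reports 1.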
There, an additional -d option to the aprun launcher must be set, and the number of MPI processes must be set to one:

  aprun -n 1 -d 8 ./my_omp.exe

Hybrid MPI+OpenMP codes

Hybrid programs combine the execution of an MPI job and an OpenMP one. On the Cray systems, the execution command

  aprun -n 4 -d 8 ./my_hyb.exe

would launch an interactive job on 4x8 = 32 cores (4 MPI tasks, each having 8 threads).

Hybrid Parallel Programming (Rabenseifner, Hager, Jost), Slide 173 / 154

When executing through the batch job system, additional lines (as compared to a flat MPI program) are needed in the batch script; cf. this sample job script:

  #!/bin/bash
  #PBS -l walltime=00:15:00
  #PBS -l mppwidth=16
  #PBS -l mppdepth=8
  #PBS -l mppnppn=1
  cd $PBS_O_WORKDIR
  export OMP_NUM_THREADS=8
  aprun -n 16 -N 1 -d 8 ./my_hyb.exe

This would allocate and execute a 16x8 = 128 core job.

Hybrid programming exercises

Jacobi iteration

The Jacobi iteration is a way of solving the Poisson equation ∇²V = ρ by iteratively updating the value of a 2D array V as

  Vnew(i,j) = [ V(i-1,j) + V(i+1,j) + V(i,j-1) + V(i,j+1) - ρ(i,j) ] / 4

until convergence has been reached (i.e. Vnew and Vold are sufficiently close to each other).

Jacobi solver hybridized

See the Jacobi solvers parallelized with MPI based on a 1D decomposition of the array (jacobi-mpi.f90 or .c). Starting from that program, implement an MPI+OpenMP hybrid Jacobi solver with three different realizations:

1. A fine-grained version, where the halo exchange is performed outside the parallel region. This basically means just wrapping the update sweep in a parallel do construct.
2. A version where the master thread carries out the halo exchange, and all threads are alive throughout the program execution.
3. A version employing multiple-thread communication.

Parallel Jacobi solver

The update of each element requires only information from the nearest neighbors; therefore the whole domain can be decomposed into parallel tasks (in either one or two dimensions). Only the boundaries need to be communicated, and one (MPI) task needs to communicate with only two (1D decomposition) or four (2D decomposition) other tasks. The Jacobi solver is therefore

  do while (!converged)
    communicate boundaries ("exchange")
    make the update above ("sweep")
    check convergence
  end do

Get acquainted with the Jacobi solvers parallelized with MPI based on a 1D decomposition of the array (jacobi-mpi.f90 or .c). The domain is decomposed into row-wise (C) or column-wise (Fortran) blocks, with the index limits (which depend on the number of MPI tasks and the size of the domain) computed in the procedure "decomp". Note that two iterations are performed in one cycle of the update loop (to enable reuse of the arrays and more convenient checking for convergence).

Note that the hybridization here essentially corresponds to a 2D decomposition.

Sample answers are provided in the files jacobi-hyb-a…c.f90 or .c.

Cray-specific remark

When using the Cray systems, you may need to adjust the thread safety of the MPI library with the MPICH_MAX_THREAD_SAFETY variable, e.g.

  setenv MPICH_MAX_THREAD_SAFETY serialized

or by inserting the line

  export MPICH_MAX_THREAD_SAFETY=serialized

into the batch job script. The minimum thread safety level is "single" for Exercise 1, "funneled" for Exercise 2 and "multiple" for Exercise 3.

In addition, you will need to link with -lmpich_threadm for multiple thread safety support.
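For reference, the update sweep at the heart of all three realizations can be sketched in C. This is a serial-grid sketch only: the halo exchange, the source term ρ, and the convergence check are omitted, and the function name jacobi_sweep is illustrative rather than taken from the exercise files. Realization 1 amounts to wrapping exactly this loop nest in a worksharing construct, as the pragma below shows:

```c
#include <stddef.h>

/* One Jacobi sweep over the interior of an n-by-n grid stored in
 * row-major order, with the source term taken as zero: each interior
 * point becomes the average of its four nearest neighbours.  The
 * pragma is simply ignored when OpenMP is not enabled at compile time. */
void jacobi_sweep(size_t n, const double *v, double *vnew)
{
    #pragma omp parallel for
    for (size_t i = 1; i + 1 < n; i++)
        for (size_t j = 1; j + 1 < n; j++)
            vnew[i * n + j] = 0.25 * (v[(i - 1) * n + j] + v[(i + 1) * n + j]
                                    + v[i * n + (j - 1)] + v[i * n + (j + 1)]);
}
```

In the hybrid versions, each MPI task sweeps only its own block, and the boundary rows (C) or columns (Fortran) are exchanged with the neighbouring tasks before each sweep.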
