Automatic parallelization and optimization for irregular ... - IEEE Xplore


consuming loops in their code, and split them up between threads.

OpenMP is a model for programming any parallel architecture that provides the abstraction of a shared address space to the programmer. The purpose of OpenMP is to ease the development of portable parallel code by enabling incremental construction of parallel programs and hiding the details of the underlying hardware/software interface from the programmer. A parallel OpenMP program can be obtained directly from its sequential counterpart by adding parallelization directives.

If the loop shown in Fig. 1 is split across OpenMP threads then, although threads will always have distinct values of j, the values of prev_elem and next_elem may simultaneously be the same on different threads. As a result, there is a potential problem with updating the value of new_cell. There are some simple solutions to this problem, which include making all the updates atomic, or having each thread compute temporary results which are then combined across threads. However, for the extremely common situation of sparse array access, neither of these approaches is particularly efficient.

5.1 Directive Extension for Irregular Loops

This section presents our new extension to OpenMP, the irregular directive.
The irregular directive can be applied to the parallel do directive in one of the following situations:

• When the parallel region is recognized as an irregular loop: in this case the compiler invokes a runtime library that partitions the irregular loop according to a special compute rule.

• When an ordered clause is recognized in a parallel region whose loop is irregular: in this case the compiler treats the loop as partially ordered; that is, some iterations are executed sequentially while others may be executed in parallel.

• When a reduction clause is recognized in a parallel region whose loop is irregular: in this case the compiler invokes inspector/executor routines to perform the irregular reduction in parallel.

The irregular directive in the extended OpenMP version may have the following patterns:

!$omp parallel do schedule(irregular, [irarray1, ..., irarrayN])

This pattern designates that the compiler will encounter an irregular loop, where irarray1, ..., irarrayN are possible indirection arrays, or

!$omp parallel do reduction|ordered irregular([expr1, ..., exprN])

This pattern designates that the compiler will encounter a special irregular reduction or irregular ordered loop, where expr1, ..., exprN are expressions such as loop index variables.

5.2 Partially Ordered Loops

A special case of the example in Fig. 1 occurs when the shared updates must be performed not only in mutual exclusion but also in an ordered way. The use of the irregular clause in this case tells the compiler that iterations which may update the same data on different threads must be executed in an ordered way. Other iterations are still executed in parallel.
An example of code using the irregular clause in this manner is the following:

Example 1

!$omp parallel do ordered irregular(x, y)
do i = 1, n
   x[i] = indirect(1, i)
   y[i] = indirect(2, i)
!$omp ordered
   a(x[i]) = a(y[i]) ......
!$omp end ordered
end do
!$omp end parallel do

6 Conclusions

The efficiency of loop partitioning considerably influences the performance of a parallel program. For automatically parallelizing irregular scientific codes, the owner-computes rule is not suitable for partitioning irregular loops. In this paper, we have presented an efficient loop partitioning approach that reduces the communication cost for a class of irregular loops with nonlinear array subscripts. In our approach, runtime preprocessing is used to determine the communication required between the processors, and we have developed algorithms for performing these communication optimizations. We have also shown how interprocedural communication optimization can be achieved. Furthermore, for irregular codes parallelized in OpenMP, we have proposed extensions to the OpenMP directives suitable for compiling and executing such codes. We have completed a preliminary implementation of the schemes presented in this paper, and the experimental results demonstrate their efficacy.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04)
0-7695-2132-0/04/$17.00 (C) 2004 IEEE
