automatically exploiting cross-invocation parallelism using runtime ...

example which cannot benefit from either technique. This loop is almost identical to the loop in Figure 2.4(a), except that it can exit early if the computed cost exceeds a threshold. Since all the loop statements participate in a single dependence cycle (they form a single strongly connected component in the dependence graph), DSWP is unable to parallelize the loop. Similarly, the dependence height of the longest cycle in the dependence graph equals the dependence height of the entire loop iteration, rendering DOACROSS ineffective as well.

To overcome the limitations caused by the conservative nature of static analysis, other techniques have been proposed that take advantage of runtime information. Among these techniques, some observe dependences at runtime and schedule loop iterations accordingly: synchronization is inserted to respect a dependence between two iterations only if that dependence manifests at runtime. Inspector-Executor (IE) [53, 60, 65] style parallelization techniques are representative of this category. IE consists of three phases: inspection, scheduling, and execution. During inspection, a complete dependence graph is built for all iterations. By topologically sorting this graph, each iteration is assigned a wavefront number for later scheduling. At runtime, iterations with the same wavefront number can execute concurrently, while iterations with larger wavefront numbers must wait for those with smaller wavefront numbers to finish. The applicability of IE is limited by whether an inspector loop can be constructed at compile time. This inspector loop goes through all memory addresses accessed in each iteration and determines the dependences between iterations. Since the inspector loop is duplicated from the original loop, it must not cause any side effects (e.g., updates to shared memory).
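The three IE phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [53, 60, 65]: the hypothetical functions `inspect_deps`, `assign_wavefronts`, and `execute_wavefronts` assume that each iteration's read and write address sets are already available, which is precisely the information a side-effect-free inspector loop has to compute.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

# (reads, writes): memory addresses touched by one iteration. In real IE the
# inspector loop computes these; here they are supplied directly (assumption).
Access = Tuple[Set[int], Set[int]]


def inspect_deps(accesses: List[Access]) -> Dict[int, Set[int]]:
    """Inspection: for each iteration i, collect earlier iterations j it depends on."""
    preds: Dict[int, Set[int]] = defaultdict(set)
    for i, (reads_i, writes_i) in enumerate(accesses):
        for j in range(i):
            reads_j, writes_j = accesses[j]
            # Flow/output dependence (j writes what i touches) or
            # anti dependence (i writes what j reads).
            if writes_j & (reads_i | writes_i) or reads_j & writes_i:
                preds[i].add(j)
    return preds


def assign_wavefronts(n: int, preds: Dict[int, Set[int]]) -> List[int]:
    """Scheduling: wavefront = 1 + largest wavefront among predecessors."""
    wave = [0] * n
    # Edges only go from lower to higher iteration numbers, so program order
    # is already a topological order of the dependence graph.
    for i in range(n):
        wave[i] = max((wave[j] + 1 for j in preds.get(i, ())), default=0)
    return wave


def execute_wavefronts(wave: List[int], body: Callable[[int], None]) -> None:
    """Execution: iterations in one wavefront run concurrently; wavefronts run in order."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, w in enumerate(wave):
        groups[w].append(i)
    with ThreadPoolExecutor() as pool:
        for w in sorted(groups):
            # Consuming every result waits for the whole wavefront (a barrier).
            list(pool.map(body, groups[w]))
```

For example, with four iterations where iteration 1 reads what iteration 0 writes and iteration 3 reads what iterations 1 and 2 write, the wavefronts come out as `[0, 1, 0, 2]`: iterations 0 and 2 may run together, then 1, then 3. Each wavefront number is the length of the longest dependence chain ending at that iteration. Python threads here only illustrate the scheduling structure, not real speedup.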
Because of these limitations, IE cannot parallelize the example loop in Figure 2.6: without actually updating the values of node, we cannot know whether the loop will exit.

IE checks dependences at runtime before it enables any concurrent execution. Another group of techniques, which also take advantage of runtime information, allows concurrent execution of potentially dependent loop iterations even before checking the dependences.
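To make the earlier DSWP argument concrete, the following sketch computes the strongly connected components of two statement-level dependence graphs with Tarjan's algorithm. Both graphs are hypothetical illustrations, not Figures 2.4(a) and 2.6: a list traversal (`S1`: loop test on `node`, `S2`: compute cost, `S4`: advance `node`), and the same loop with an added early-exit check `S3` on the cost. The helper `dswp_can_pipeline` encodes only the coarse condition used in the text: DSWP needs at least two SCCs to form pipeline stages.

```python
from typing import Dict, List, Set

# Statement-level dependence graph: edge u -> v means v depends on u
# (data, control, or loop-carried). Statement names are hypothetical.
Graph = Dict[str, List[str]]

# Hypothetical traversal loop without an early exit.
NO_EXIT_DEPS: Graph = {
    "S1": ["S2", "S4"],  # loop test on node controls the body
    "S2": [],            # cost = compute(node)
    "S4": ["S1", "S2"],  # node = node.next feeds the next iteration
}

# Same loop with an early exit S3: if cost > threshold: break
EARLY_EXIT_DEPS: Graph = {
    "S1": ["S2", "S3", "S4"],
    "S2": ["S3"],        # S3 reads cost
    "S3": ["S1", "S4"],  # exit decides whether S4 and the next iteration run
    "S4": ["S1", "S2"],
}


def tarjan_scc(graph: Graph) -> List[List[str]]:
    """Return the SCCs of graph (each sorted), using Tarjan's algorithm."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    sccs: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp: List[str] = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def dswp_can_pipeline(graph: Graph) -> bool:
    """Coarse check: a single SCC leaves DSWP nothing to split into stages."""
    return len(tarjan_scc(graph)) > 1
```

In the first graph DSWP could place `{S1, S4}` (traversal) and `{S2}` (cost computation) in separate pipeline stages. In the second, the early exit ties every statement into one cycle, so the loop collapses into a single SCC, consistent with the text's observation that the longest cycle spans the whole iteration.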
