13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

3.5 Related Work

3.5.1 Cross-invocation Parallelization

Loop fusion techniques [22, 76] aggregate small loop invocations into a large loop invocation, converting the problem of cross-invocation parallelization into the problem of cross-iteration parallelization. The applicability of these techniques is limited mainly to affine loops because they rely on static dependence analysis. Since DOMORE is a runtime technique, it is able to handle programs with input-dependent dynamic dependences.

Tseng [72] partitions iterations within the same loop invocation so that cross-invocation dependences flow within the same worker thread. Compared to DOMORE, this technique is much more conservative: DOMORE allows dependences to manifest between threads, and synchronization is enforced only when real conflicts are detected at runtime.

While manually parallelizing a sequential program, programmers can use annotations provided by the BOP [19] or TCC [25] systems to specify potentially concurrent code regions. Those code regions are then speculatively executed in parallel at runtime. Both techniques can be applied to exploit cross-invocation parallelism. However, they require manual annotation or parallelization by programmers, while DOMORE is a fully automatic parallelization technique.

3.5.2 Synchronization Optimizations

Optimization techniques have been proposed to improve the performance of parallel programs with excessive synchronization (e.g., locks, flags, and barriers).

Fuzzy Barrier [24] specifies a synchronization range rather than a specific synchronization point. Instead of waiting, threads can execute some instructions beyond the synchronization point.
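The idea of a synchronization range can be illustrated with a software split-phase barrier, in which a thread first signals its arrival, then continues with work that does not depend on other threads, and blocks only when it reaches the end of the range. The following is a minimal sketch under stated assumptions, not the mechanism described in [24]: the class `SplitPhaseBarrier`, its `arrive` and `wait` methods, and the `run` driver are hypothetical names introduced for illustration only.

```python
import threading


class SplitPhaseBarrier:
    """Hypothetical split-phase barrier: arrive() never blocks, wait() does."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._generation = 0
        self._cond = threading.Condition()

    def arrive(self):
        """Signal arrival (start of the range); return a token for wait()."""
        with self._cond:
            token = self._generation
            self._arrived += 1
            if self._arrived == self._parties:
                # Last thread to arrive releases everyone waiting on this token.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()
            return token

    def wait(self, token):
        """Block (end of the range) until all parties have arrived."""
        with self._cond:
            self._cond.wait_for(lambda: self._generation != token)


def run(num_threads=4):
    barrier = SplitPhaseBarrier(num_threads)
    shared = [0] * num_threads    # written before the barrier, read after it
    private = [0] * num_threads   # independent work done inside the range
    totals = [0] * num_threads

    def worker(tid):
        shared[tid] = tid + 1                 # phase 1: publish a value
        token = barrier.arrive()              # enter the synchronization range
        private[tid] = sum(range(tid * 10))   # work that needs no other thread
        barrier.wait(token)                   # leave the range: all have arrived
        totals[tid] = sum(shared)             # phase 2: read every thread's value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals, private
```

Only the read of `shared` in phase 2 depends on the other threads, so it is the only statement that must follow `wait`; the computation of `private[tid]` overlaps with the time a conventional barrier would spend idle.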
Speculative Lock Elision [57] and speculative synchronization [40] design hardware units to allow threads to speculatively execute across synchronizations. Grace [4] wraps code between fork and join points into transactions, removing barrier synchroniza-
