13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...



These techniques are referred to as speculative techniques. There have been many proposals for thread-level speculation (TLS) techniques, which speculatively break various loop dependences [61, 42, 58]. Once these dependences are broken, effective parallelization becomes possible. Using the same loop as in Figure 2.6, if TLS speculatively breaks the loop exit control dependences (2.7), and assuming that the loop iterates many times, the execution schedule shown in Figure 2.8(a) becomes possible. This parallelization offers a speedup of 4 over single-threaded execution. Just as in TLS, speculating the loop exit control dependence breaks the largest SCC, allowing SpecDSWP to deliver a speedup of 4 over single-threaded execution (as shown in Figure 2.8(b)).

IE and speculative techniques take advantage of runtime information to exploit parallelism more aggressively. They are able to adapt to dependence patterns manifested at runtime by particular input data sets. Speculative techniques work best when dependences rarely manifest, because frequent misspeculation leads to high recovery cost, which in turn negates the benefit of parallelization. Non-speculative techniques, on the other hand, have no recovery cost and are thus better choices for programs whose dependences manifest more frequently.

2.3 Cross-Invocation Parallelization

While intra-invocation parallelization techniques are adequate for programs with few loop invocations, they are not adequate for programs with many loop invocations.
All of these techniques parallelize independent loop invocations and use global synchronizations (barriers) to respect the dependences between loop invocations. However, global synchronizations stall all of the threads, forcing them to wait for the last thread to arrive at the synchronization point. This leaves processors underutilized and forfeits the potential cross-invocation parallelism. Studies [45] have shown that in real-world applications, synchronizations can contribute as much as 61% to total program runtime. There is an oppor-
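
The cost of barriers between loop invocations can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the source: the `work` table holds made-up per-thread work times, and the barrier-free estimate assumes, purely for illustration, that a thread's work in one invocation depends only on that same thread's work in the previous invocation. Real programs may have cross-thread dependences that make this bound unreachable.

```python
# Hypothetical per-thread work times (arbitrary units), one row per loop
# invocation and one column per thread. These numbers are invented for
# illustration and do not come from the source.
work = [
    [4, 1, 1, 1],  # invocation 0: thread 0 is the straggler
    [1, 4, 1, 1],  # invocation 1: thread 1 is the straggler
    [1, 1, 4, 1],
    [1, 1, 1, 4],
]


def time_with_barriers(work):
    """A barrier after each invocation makes every thread wait for the
    slowest one, so each invocation costs its maximum per-thread time."""
    return sum(max(row) for row in work)


def time_without_barriers(work):
    """Assumption (hypothetical): no dependences between different threads
    across invocations, so each thread runs its invocations back to back
    and the program finishes when the most loaded thread finishes."""
    return max(sum(col) for col in zip(*work))


def barrier_idle_fraction(work):
    """Fraction of total thread-time spent waiting at barriers."""
    num_threads = len(work[0])
    total = time_with_barriers(work) * num_threads
    busy = sum(sum(row) for row in work)
    return (total - busy) / total


if __name__ == "__main__":
    print(time_with_barriers(work))     # 16
    print(time_without_barriers(work))  # 7
    print(barrier_idle_fraction(work))  # 0.5625
```

In this toy table the straggler changes from one invocation to the next, so every barrier waits on a different thread. More than half of the available thread-time is spent idle, even though the total work is evenly balanced across threads.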
