13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...



3.6 Running example for DOMORE code generation: (a) Pseudo IR for CG code; (b) PDG for example code. Dashed lines represent cross-iteration and cross-invocation dependences for inner loop. Solid lines represent other dependences between inner loop instructions and outer loop instructions. (c) DAG SCC for example code. DAG SCC nodes are partitioned into scheduler and worker threads. (d) and (e) are code generated by DOMORE MTCG algorithm (3.3.2).

3.7 (a) Example loop from Figure 3.1; (b) computeAddr function generated for this loop

3.8 Generated code for example loop in CG. Non-highlighted code represents initial code for scheduler and worker functions generated by DOMORE’s MTCG (Section 3.3.2). Code in grey is generated in later steps for iteration scheduling and synchronization.

3.9 Execution plan for DOMORE before and after duplicating scheduler code to worker threads.

3.10 Optimization for DOMORE technique: duplicating scheduler code on all worker threads to enable DOMORE in SPECCROSS framework.

4.1 Example program demonstrating the limitation of DOMORE transformation

4.2 Example of parallelizing a program with different techniques

4.3 Overhead of barrier synchronizations for programs parallelized with 8 and 24 threads

4.4 Execution plan for TM-style speculation: each block A.B stands for the Bth iteration in the Ath loop invocation: iteration 2.1 overlaps with iterations 2.2, 2.3, 2.4, 2.7, 2.8, thus its memory accesses need to be compared with theirs even though all these iterations come from the same loop invocation and are guaranteed to be independent.
