13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...



[Figure 3.2: execution timelines across four threads. Panels: (a) DOALL, (b) DOMORE after partitioning, (c) DOMORE final.]

Figure 3.2: Comparison of performance with and without cross-invocation parallelization: (a) DOALL is applied to the inner loop. Frequent barrier synchronization occurs at the boundary between the inner and outer loops. (b) After the partitioning phase, DOMORE has partitioned the code without inserting the runtime engine. A scheduler and three workers execute concurrently, but worker threads still synchronize after each invocation. (c) DOMORE finalizes by inserting the runtime engine to exploit cross-invocation parallelism. Assuming iteration 2 from invocation 2 (2.2) depends on iteration 5 from invocation 1 (1.5), the scheduler detects the dependence and synchronizes those two iterations.

(Section 3.3.5). At runtime, the scheduler thread checks for dynamic dependences, schedules inner loop iterations, and forwards synchronization conditions to worker threads (Section 4.2). Worker threads use the synchronization conditions to determine when they are ready to execute.
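The scheduler/worker interaction described above can be illustrated with a small sketch. This is not the DOMORE runtime itself. It assumes, purely for illustration, that each iteration touches a single address, that iterations are assigned round-robin, and that any two accesses to the same address conflict. The names `schedule`, `run`, and `ITERATIONS` are hypothetical. The example data mirrors the caption's scenario, in which iteration 2.2 depends on iteration 1.5.

```python
import threading

# (label, address) pairs in sequential program order. Hypothetical data:
# 1.5 and 2.2 share address 4, so 2.2 depends on 1.5 across invocations.
ITERATIONS = [
    ("1.1", 0), ("1.2", 1), ("1.3", 2), ("1.4", 3), ("1.5", 4),
    ("2.1", 5), ("2.2", 4), ("2.3", 6), ("2.4", 7), ("2.5", 8),
]


def schedule(iterations, num_workers):
    """Scheduler side: assign iterations and attach synchronization conditions.

    Returns one queue per worker. Each entry is (label, address, conditions),
    where a condition (w, n) means "wait until worker w has finished at
    least n iterations".
    """
    queues = [[] for _ in range(num_workers)]
    last_access = {}  # address -> (worker id, 1-based position in its queue)
    for i, (label, addr) in enumerate(iterations):
        worker = i % num_workers  # round-robin: an assumed policy
        conditions = []
        if addr in last_access:
            dep_worker, dep_count = last_access[addr]
            # A worker runs its own queue in order, so only
            # cross-worker dependences need an explicit condition.
            if dep_worker != worker:
                conditions.append((dep_worker, dep_count))
        queues[worker].append((label, addr, conditions))
        last_access[addr] = (worker, len(queues[worker]))
    return queues


def run(queues):
    """Worker side: execute each queue, honoring the forwarded conditions."""
    done = [0] * len(queues)  # iterations completed per worker
    cond = threading.Condition()
    log = []  # records the order in which iterations executed

    def worker(wid):
        for label, _addr, conditions in queues[wid]:
            with cond:
                for dep_worker, dep_count in conditions:
                    cond.wait_for(
                        lambda w=dep_worker, c=dep_count: done[w] >= c
                    )
            log.append(label)  # stand-in for the iteration body
            with cond:
                done[wid] += 1
                cond.notify_all()

    threads = [
        threading.Thread(target=worker, args=(w,)) for w in range(len(queues))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    queues = schedule(ITERATIONS, num_workers=3)
    print(run(queues))  # order varies between runs, but 1.5 precedes 2.2
```

Only the dependent pair is synchronized in this sketch. All other iterations from both invocations may interleave freely, with no barrier between invocations.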
