13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...


3.4 Enable DOMORE in SPECCROSS

The DOMORE transformation partitions the sequential program into a scheduler thread and multiple worker threads. The scheduler thread executes the sequential code region, computes the dependences between inner loop iterations, and schedules iterations accordingly. This design provides a general solution for handling the sequential code enclosed by the outer loop: there is no redundant computation and no need for special handling of side-effecting operations. However, this design prevents DOMORE from being integrated into the SPECCROSS framework, which will be introduced in the next chapter. We therefore trade the benefits of a separate scheduler thread for the applicability of DOMORE parallelization within the SPECCROSS framework by duplicating the scheduler code in all worker threads.

Figure 3.9 demonstrates the parallel execution plan after the duplication. Only worker threads are spawned for the parallel execution. Each worker thread computes dependences and schedules iterations independently. Each worker executes only the iterations scheduled to it, but it executes all of the scheduler code in order to keep a record of the iteration dependences.

Figure 3.10 shows the newly generated code. Compared to the original code in Figure 3.8, the major differences are: (1) Only worker threads are spawned, and each of them starts by executing the Scheduler function. If an iteration is scheduled to the executing worker thread, the worker function is invoked to do the actual work for that iteration. (2) Every worker thread executes the computeAddr, schedule and schedulerSync functions independently. To avoid access conflicts, each worker thread has its own shadow memory and updates only that shadow memory. (3) Synchronization conditions are still produced to and consumed from the communication queues, while value dependences between the original scheduler and the worker threads are passed as function parameters instead.
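
The duplicated-scheduler plan described in this section can be illustrated with a small sketch. It is not the generated code of Figure 3.10. It assumes a hypothetical inner loop in which iteration i updates a single element data[INDEX[i]], a round-robin schedule function, and a shared progress table guarded by a condition variable in place of the communication queues. The names INDEX, work, last_done, run_sequential and run_parallel are invented for illustration; only computeAddr and schedule (spelled compute_addr and schedule here) correspond to functions named in the text.

```python
import threading

NUM_WORKERS = 3
# Hypothetical access pattern: iteration i touches data[INDEX[i]].
INDEX = [0, 1, 0, 2, 1, 0, 2, 2, 1, 0]


def compute_addr(i):
    """Address accessed by iteration i (stand-in for computeAddr)."""
    return INDEX[i]


def schedule(i):
    """Deterministic round-robin assignment, so every worker agrees on it."""
    return i % NUM_WORKERS


def work(data, i):
    """Order-sensitive update, so a missed dependence changes the result."""
    a = compute_addr(i)
    data[a] = data[a] * 2 + i


def run_sequential():
    data = [1, 1, 1]
    for i in range(len(INDEX)):
        work(data, i)
    return data


def run_parallel():
    data = [1, 1, 1]
    last_done = [-1] * NUM_WORKERS   # last iteration finished by each worker
    cond = threading.Condition()     # assumed stand-in for the queues
    executed = [[] for _ in range(NUM_WORKERS)]

    def scheduler(tid):
        shadow = {}                  # private shadow memory: addr -> (worker, iteration)
        for i in range(len(INDEX)):
            addr = compute_addr(i)
            owner = schedule(i)
            dep = shadow.get(addr)   # last iteration that touched addr, if any
            if owner == tid:
                # Wait only for dependences carried by a different worker;
                # same-worker dependences are satisfied by program order.
                if dep is not None and dep[0] != tid:
                    with cond:
                        cond.wait_for(lambda: last_done[dep[0]] >= dep[1])
                work(data, i)
                executed[tid].append(i)
                with cond:
                    last_done[tid] = i
                    cond.notify_all()
            # Every worker records the access, even for iterations it skips.
            shadow[addr] = (owner, i)

    threads = [threading.Thread(target=scheduler, args=(t,))
               for t in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data, executed


if __name__ == "__main__":
    result, executed = run_parallel()
    print(result == run_sequential(), executed)
```

Each worker runs the whole scheduling loop and pays for the redundant dependence bookkeeping, but it writes only to its own shadow dictionary, so the shadow memory itself needs no locking. Because every worker executes its assigned iterations in increasing order, a single per-worker progress counter is enough to express "iteration j on worker t has finished" in this sketch.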
