automatically exploiting cross-invocation parallelism using runtime ...

Algorithm 4: Final Code Generation
Input: program : original program IR
Input: partition : partition of scheduler and worker code
Input: parallelPlan : parallelization plan for inner loop
Input: pdg : program dependence graph
Output: multi-threaded scheduler and worker program
scheduler, worker ← MTCG(program, partition)
scheduler ← generateSchedule(parallelPlan)
computeAddr ← generateComputeAddr(worker, pdg)
scheduler ← generateSchedulerSync()
worker ← generateWorkerSync()

…be a bottleneck for the parallel execution, so the performance guard reports that DOMORE is inapplicable. Figure 3.7 demonstrates the generated computeAddr function for the example loop in Figure 3.1.

3.3.5 Putting It Together

Algorithm 4 ties together all the pieces of DOMORE's code generation. The major steps in the transformation are:

1. The Multi-Threaded Code Generation algorithm (MTCG) discussed in Section 3.3.2 generates the initial scheduler and worker threads based on the partition from Section 3.3.1.

2. The appropriate schedule function (Section 3.3.3) is inserted into the scheduler based upon the parallelization plan for the inner loop.

3. The computeAddr (Algorithm 3), schedulerSync (Algorithm 1), and workerSync (Algorithm 2) functions are created and inserted into the appropriate thread to handle dependence checking and synchronization.

Figure 3.8 shows the final code generated for CG.
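The flow of Algorithm 4 can be sketched as a small driver that composes the passes in order. This is a hypothetical toy model, not the thesis's implementation: each pass here just records which code would be emitted (instructions are plain strings, the partition is a dictionary, and the PDG is approximated by the set of worker instructions that compute addresses of dependent accesses), whereas the real passes operate on program IR.

```python
# Toy stand-ins for the passes invoked by Algorithm 4. All names and data
# representations below are illustrative assumptions, not the real IR passes.

def mtcg(program, partition):
    # Step 1: split the program into initial scheduler and worker threads
    # according to the partition from Section 3.3.1.
    scheduler = [inst for inst in program if partition[inst] == "scheduler"]
    worker = [inst for inst in program if partition[inst] == "worker"]
    return scheduler, worker

def generate_schedule(scheduler, parallel_plan):
    # Step 2: insert the schedule function chosen by the inner-loop plan.
    return scheduler + [f"schedule<{parallel_plan}>"]

def generate_compute_addr(worker, pdg):
    # Step 3a: duplicate the worker instructions that compute addresses of
    # dependent memory accesses (per the PDG) into computeAddr.
    return [inst for inst in worker if inst in pdg]

def generate_scheduler_sync(scheduler):
    # Step 3b: insert dependence checking / synchronization in the scheduler.
    return scheduler + ["schedulerSync"]

def generate_worker_sync(worker):
    # Step 3c: insert the matching synchronization in the worker.
    return worker + ["workerSync"]

def final_code_generation(program, partition, parallel_plan, pdg):
    scheduler, worker = mtcg(program, partition)
    scheduler = generate_schedule(scheduler, parallel_plan)
    compute_addr = generate_compute_addr(worker, pdg)
    scheduler = generate_scheduler_sync(scheduler)
    worker = generate_worker_sync(worker)
    return scheduler, worker, compute_addr
```

Running the driver on a three-instruction toy loop shows how the scheduler picks up the schedule and sync code while the worker keeps the partitioned body:

```python
program = ["load_addr", "store_val", "loop_ctrl"]
partition = {"load_addr": "worker", "store_val": "worker",
             "loop_ctrl": "scheduler"}
scheduler, worker, compute_addr = final_code_generation(
    program, partition, "DOANY", pdg={"load_addr"})
# scheduler    → ["loop_ctrl", "schedule<DOANY>", "schedulerSync"]
# worker       → ["load_addr", "store_val", "workerSync"]
# compute_addr → ["load_addr"]
```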
