13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...


Algorithm 3: Pseudo-code for generating the computeAddr function from the worker function

Input: worker : worker function IR
Input: pdg : program dependence graph
Output: computeAddr : computeAddr function IR

depInsts ← getCrossMemDepInsts(pdg)
depAddr ← getMemOperands(depInsts)
computeAddr ← reverseProgramSlice(worker, depAddr)

(a)
    for (i = 0; i < N; i++) {
        start = A[i];
        end = B[i];
        for (j = start; j < end; j++) {
            update (&C[j]);
        }
    }

(b)
    vector computeAddr(int iternum) {
        vector addrSet;
        int addr = (long)&C[j];
        addrSet.push_back(addr);
        return addrSet;
    }

Figure 3.7: (a) Example loop from Figure 3.1; (b) computeAddr function generated for this loop

3.3.4 Generating the computeAddr function

The scheduler thread uses the computeAddr function to determine which addresses will be accessed by worker threads. DOMORE automatically generates the computeAddr function from the worker thread function using Algorithm 3. The algorithm takes as input the worker thread’s IR in SSA form and a program dependence graph (PDG) describing the dependences in the original loop nest. The compiler uses the PDG to find all instructions with memory dependences across the inner loop iterations or invocations. These instructions will consist of loads and stores. In the worker thread, program slicing [74] is performed to create the set of instructions required to generate the address of the memory being accessed.

Presently, the DOMORE transformation does not handle computeAddr functions with side-effects. If program slicing duplicates instructions with side-effects, the DOMORE transformation aborts. After the transformation, a performance guard compares the weights of the computeAddr function and the original worker thread. If the computeAddr function is too heavy relative to the original worker, the scheduler would
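To make the slicing idea in Section 3.3.4 concrete, the following is a minimal hand-written C++ sketch of what a computeAddr function for the loop in Figure 3.7(a) could look like. It is not DOMORE's generated code. It assumes the slice retains the loads of A[iternum] and B[iternum] and the inner loop, since those are needed to compute each &C[j], and that it drops the call to update. The array contents, the body of update, and the mayConflict helper are hypothetical and exist only so the example runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical data: the source does not give the contents of A, B, or C.
static const int N = 3;
static int A[N] = {0, 2, 3};  // first inner-loop index per outer iteration
static int B[N] = {2, 5, 4};  // one past the last inner-loop index
static int C[8] = {0};

// Hypothetical stand-in for the update() call in Figure 3.7(a).
void update(int *p) { *p += 1; }

// One invocation of the inner loop, as a worker thread would execute it.
void worker(int iternum) {
    assert(iternum >= 0 && iternum < N);
    int start = A[iternum];
    int end = B[iternum];
    for (int j = start; j < end; j++) {
        update(&C[j]);
    }
}

// Address slice of worker(): same loads and loop, but each memory access
// is replaced by recording its address. It performs no writes, so it has
// no side effects on program state.
std::vector<std::uintptr_t> computeAddr(int iternum) {
    std::vector<std::uintptr_t> addrSet;
    int start = A[iternum];
    int end = B[iternum];
    for (int j = start; j < end; j++) {
        addrSet.push_back(reinterpret_cast<std::uintptr_t>(&C[j]));
    }
    return addrSet;
}

// Hypothetical helper: reports whether two invocations touch a common
// address, which is the kind of question a scheduler could ask.
bool mayConflict(int iterX, int iterY) {
    std::vector<std::uintptr_t> x = computeAddr(iterX);
    std::vector<std::uintptr_t> y = computeAddr(iterY);
    return std::find_first_of(x.begin(), x.end(), y.begin(), y.end()) != x.end();
}
```

With the sample data, invocation 0 touches C[0..1], invocation 1 touches C[2..4], and invocation 2 touches C[3], so only invocations 1 and 2 share an address. Because computeAddr repeats the loads and loop control of the worker, its cost can approach the worker's own, which is the situation the performance guard described above checks for.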
