13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Algorithm 1: Pseudo-code for scheduler synchronizationInput: iterNum : global iteration numberaddrSet ← computeAddr(iterNum)tid ← schedule(iterNum, addrSet)foreach addr ∈ addrSet do< depTid, depIterNum > ← shadow[addr]if depIterNum ≠ −1 thenif depTid ≠ tid thenproduce(tid, < depTid, depIterNum >)shadow[addr] ← < tid, iterNum >produce(tid, < NO SYNC, iterNum >)thread finds no additional dependences, it will send (NO SYNC,i) to thread T1.3.2.3 Synchronizing IterationsWorkers receive synchronization conditions and coordinate with each other to respect dynamicdependences. A status array is used to assist this: latestFinished recordsthe latest iteration finished by each thread. A worker thread waiting on synchronizationcondition (depId, depIterNum) will stall until latestFinished[depId] ≥depIterNum. After each worker thread finishes an iteration, it needs to update its statusin latestFinished to allow threads waiting on it to continue executing.Synchronization conditions are forwarded <strong>using</strong> produce and consume primitivesprovided by a lock-free queue design [31], which provides an efficient way to communicateinformation between scheduler and worker threads.3.2.4 Walkthrough ExampleThe program CG illustrates DOMORE’s synchronization scheme. Figure 3.5(a) shows theaccess pattern (value j) in each iteration for two <strong>invocation</strong>s. Iterations are scheduled totwo worker threads in round-robin order. Figure 3.5(b) shows the change of the helper datastructures throughout the execution.30

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!