13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

sequential_func() {for (t = 0; t < STEP; t++) {L1: for (i = 0; i < M; i++) {A[i] = calc_1(B[C[i]]);}L2: for (j = 0; j < M; j++) {B[j] = calc_2(A[D[j]]);}}}(a) Sequential Programnaïve_parallel_func() {for (t = 0; t < STEP; t++) {L1: for (i = TID; i < M; i = i + THREADNUM) {A[i] = calc_1(B[C[i]]);}L2: for (j = TID; j < M; j = j + THREADNUM) {B[j] = calc_2(A[D[j]]);}}}(b) Naïve Parallelization without Synchronizationsbarrier_parallel_func() {for (t = 0; t < STEP; t++) {L1: for (i = TID; i < M; i = i + THREADNUM) {A[i] = calc_1(B[C[i]]);}pthread_barrier_wait(&barrier);L2: for (j = TID; j < M; j = j + THREADNUM) {B[j] = calc_2(A[D[j]]);}pthread_barrier_wait(&barrier);}}(c) Parallelization with Barrier Synchronizationtransaction_parallel_func() {for (t = 0; t < STEP; t++) {L1: for (i = TID; i < M; i = i + THREADNUM) {TX_begin();A[i] = calc_1(B[C[i]]);TX_end();}L2: for (j = TID; j < M; j = j + THREADNUM) {TX_begin();B[j] = calc_2(A[D[j]]);TX_end();}}}(d) Parallelization with TransactionsFigure 4.2: Example of parallelizing a program with different techniquesamount of time threads sit idle waiting for the slowest thread to reach the barrier. For mostof these programs, barrier overhead accounts for more than 30% of the parallel executiontime and increases with the number of threads. This overhead translates into an Amdahl’slaw limit of 3.33× max program speedup.4.1.2 Speculative <strong>cross</strong>-<strong>invocation</strong> parallelizationThe conservativeness of barrier synchronization prevents any <strong>cross</strong>-<strong>invocation</strong> parallelizationand significantly limits performance gain. Alternatively, the optimistic approach unlockspotential opportunities for <strong>cross</strong>-<strong>invocation</strong> parallelization. It allows the threads toexecute a<strong>cross</strong> the <strong>invocation</strong> boundary under the assumption that the dependences rarelymanifest at <strong>runtime</strong>. Two major optimistic solutions have been proposed in literature sofar: one relies on transactional memory and the other uses speculative barriers.52

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!