13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

applied to the outermost loop, generating a parallel program in which the redundant code runs in the scheduler thread and each inner loop iteration is scheduled only to the appropriate owner thread. Although DOMORE reduces the overhead of redundant computation, moving the redundant code into the scheduler increases the size of the sequential region, which becomes the major factor limiting scalability in this case.

SYMM from the PolyBench [54] suite demonstrates the capabilities of a very simple multi-grid solver in computing a three-dimensional potential field. The target loop is a three-level nested loop, and DOALL is applicable to the second-level inner loop. As the results show, the scalability of SYMM remains poor even after the DOMORE optimization. The main cause is that each inner loop invocation takes only about 4,000 clock cycles to execute. As the number of threads increases, the multi-threading overhead outweighs any performance gain.

At large thread counts, the performance of DOMORE is limited by the sequential scheduler thread. To address this problem, the computeAddr function could be parallelized, for example by adopting the algorithm proposed in [36]. This is left for future work.
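Why a larger sequential scheduler region caps scalability can be made concrete with Amdahl's law. This is the standard textbook bound, not a model taken from the DOMORE evaluation; here s is an assumed notation for the fraction of execution time spent in the sequential scheduler and p is the number of threads:

$$
S(p) = \frac{1}{s + \dfrac{1 - s}{p}}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{s}
$$

For example, if the scheduler accounts for s = 0.1 of the runtime, no thread count can push the speedup past 1/0.1 = 10, which is why shrinking the scheduler's work (for instance by parallelizing computeAddr) matters at large thread counts.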
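The scheduler/owner-thread split described above can be illustrated with a small sketch. This is not DOMORE's implementation: it omits runtime dependence detection and cross-thread synchronization entirely. The helper `compute_addr`, the owner mapping `addr % NUM_WORKERS`, and the update rule are hypothetical choices made for illustration. A single scheduler thread runs the outer loop (including a stand-in for the redundant code), computes the address each inner-loop iteration will write, and dispatches that iteration to the thread that owns the address. Because each address has exactly one owner and each owner's queue is FIFO, updates to the same address run in the original sequential order.

```python
import queue
import threading

NUM_WORKERS = 4
STOP = None  # sentinel telling a worker thread to exit


def compute_addr(outer, inner, size):
    """Hypothetical stand-in for computeAddr: the slot an inner iteration writes.
    Different outer invocations can hit the same slot (cross-invocation deps)."""
    return (outer + inner) % size


def worker(tasks, data):
    # Each worker owns a disjoint set of addresses, so it can update them
    # without locks; its FIFO queue preserves the scheduler's dispatch order.
    while True:
        task = tasks.get()
        if task is STOP:
            break
        addr, value = task
        data[addr] = data[addr] * 2 + value  # order-sensitive update


def run_parallel(n_outer, n_inner, size):
    data = [0] * size
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    threads = [threading.Thread(target=worker, args=(q, data)) for q in queues]
    for t in threads:
        t.start()
    # Scheduler: runs the outer loop sequentially, then sends each inner
    # iteration only to the owner thread of the address it writes.
    for outer in range(n_outer):
        value = outer + 1  # stands in for the redundant per-invocation code
        for inner in range(n_inner):
            addr = compute_addr(outer, inner, size)
            queues[addr % NUM_WORKERS].put((addr, value))
    for q in queues:
        q.put(STOP)
    for t in threads:
        t.join()
    return data


def run_sequential(n_outer, n_inner, size):
    # Reference: the original loop nest executed in program order.
    data = [0] * size
    for outer in range(n_outer):
        value = outer + 1
        for inner in range(n_inner):
            addr = compute_addr(outer, inner, size)
            data[addr] = data[addr] * 2 + value
    return data


if __name__ == "__main__":
    assert run_parallel(10, 8, 16) == run_sequential(10, 8, 16)
    print("parallel result matches sequential order")
```

Note that all the address computation and dispatch happens in the single scheduler loop; in this sketch that loop is exactly the sequential region that bounds scalability.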