13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

automatically exploiting cross-invocation parallelism using runtime ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Program Speedup6x5x4x3x2xLOCALWRITE+BarrierLOCALWRITE+SpecCrossDOMORE+BarrierDOMORE+SpecCrossMANUAL(DOANY+Barrier)1x0x2 4 6 8 10 12 14 16 18 20 22 24Number of ThreadsFigure 5.6: Performance improvement of FLUIDANIMATE <strong>using</strong> different techniques.and DOMORE. We apply both of them and compare these two implementations with themanual one. As shown in Figure 5.6, DOMORE + Barriers yields the best performanceamong these three. DOMORE does not have the overhead in redundant computation or theoverhead in locking. With small number of threads, LOCALWRITE + Barriers performsbetter than the manual parallelization, implying the overhead in redundant computation isless than the overhead in locking. However, since the redundancy problem deteriorateswith increasing number of threads, the manual implementation scales better than the LO-CALWRITE + Barrier one.All these three parallelization implementations use pthread barriers between inner loops,prohibiting potential <strong>parallelism</strong> a<strong>cross</strong> loop <strong>invocation</strong>s. SPECCROSS can be applied tofurther improve the performance. Figure 5.6 demonstrates the performance gain by <strong>using</strong>LOCALWRITE + SPECCROSS. LOCALWRITE + SPECCROSS is always better than90

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!