13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...



for (int i = 0; i < framenum; i++) {
    L1: ClearParticles();
    L2: RebuildGrid();
    L3: InitDensitiesAndForces();
    L4: ComputeDensities();
    L5: ComputeDensities2();
    L6: ComputeForces();
    L7: ProcessCollisions();
    L8: AdvanceParticles();
}

Figure 5.5: Outermost loop in FLUIDANIMATE

ported in the Liberty parallelizing compiler infrastructure due to frequently manifesting cross-iteration dependences and irregular memory access patterns. The outermost loop is composed of eight consecutive inner loops. Inner loops L4 and L6 can be parallelized by DOANY, LOCALWRITE or DOMORE, while all the other inner loops can be parallelized by DOALL. As a result, a variety of parallelization plans can be applied.

The manually parallelized version of FLUIDANIMATE in the PARSEC benchmark suite divides the shared data grids among threads. It applies DOANY to inner loops L4 and L6 and DOALL to the other inner loops. Pthread barriers are inserted between two inner loop invocations to respect cross-invocation dependences. To protect the shared data structure, DOANY applies locks to guarantee atomic accesses. Meanwhile, to avoid over-synchronization caused by locks, it allocates an array of locks instead of using a single global lock. Each lock protects a section of the shared data structure. Threads accessing different sections of the shared data do not have to synchronize at the same lock. Figure 5.6 demonstrates the performance improvement of the manual parallelization. Since the manual parallelization only supports thread counts that are powers of 2, only three performance results are shown in the figure.

Other than DOANY, inner loops L4 and L6 can also be parallelized by LOCALWRITE
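The lock-array scheme described for the manual parallelization can be illustrated with a minimal sketch. It is not the PARSEC source: the `Grid` type, the `add` and `run_updates` names, the section size, and the update pattern are all hypothetical, and C++ `std::thread` and `std::mutex` stand in for the Pthread primitives the text mentions. Joining the workers plays the role of the barrier between two inner loop invocations.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared data grid: one value per cell.
struct Grid {
    std::vector<double> density;    // shared data updated by all threads
    std::vector<std::mutex> locks;  // array of locks, one per section
    std::size_t cells_per_lock;     // section size (an assumption of this sketch)

    Grid(std::size_t num_cells, std::size_t num_locks)
        : density(num_cells, 0.0),
          locks(num_locks),
          cells_per_lock((num_cells + num_locks - 1) / num_locks) {
        assert(num_cells > 0 && num_locks > 0);
    }

    // Atomically add `value` to one cell. Only the lock guarding that
    // cell's section is taken, so threads working on different sections
    // do not contend with each other.
    void add(std::size_t cell, double value) {
        std::lock_guard<std::mutex> guard(locks[cell / cells_per_lock]);
        density[cell] += value;
    }
};

// Runs one DOANY-style loop: iterations may execute in any order, and
// each one updates a cell and its right neighbour, so two threads can
// touch the same cell. Returns the sum of all cells afterwards.
double run_updates(std::size_t num_cells, std::size_t num_locks,
                   unsigned num_threads) {
    Grid grid(num_cells, num_locks);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&grid, t, num_threads, num_cells] {
            // Each worker takes a strided share of the iterations.
            for (std::size_t i = t; i < num_cells; i += num_threads) {
                grid.add(i, 1.0);
                grid.add((i + 1) % num_cells, 1.0);
            }
        });
    }
    // Waiting for every worker stands in for the barrier that separates
    // this inner loop invocation from the next one.
    for (auto& w : workers) w.join();

    double total = 0.0;
    for (double d : grid.density) total += d;
    return total;
}
```

Calling `run_updates(1000, 16, 4)` updates every cell exactly twice, so the returned sum is twice the number of cells regardless of how the threads interleave. Setting `num_locks` to 1 gives the single global lock that the lock array is meant to avoid.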
