automatically exploiting cross-invocation parallelism using runtime ...
[Figure 1.2: bar chart. Y-axis: Percentage (%). Series: Non-Exclusive, Exclusive. Categories: Job Parallelism, Message Passing, Threading, Parallel Annotation, GPU-based Parallelism, Others, None.]

Figure 1.2: Types of parallelism exploited in scientific research programs: one third of the interviewed researchers do not use any parallelism in their programs; others mainly use job parallelism or borrow already parallelized programs.

parallelism. Often, iterations from different loop invocations can execute concurrently without violating program semantics. Instead of waiting, threads begin iterations from subsequent invocations.

A simple code example in Figure 1.3(a) can be used to demonstrate the benefits of exploiting cross-invocation parallelism. In this example, inner loop L1 updates array elements in array A while inner loop L2 reads the elements from array A and uses the values to update array B. The whole process is repeated TIMESTEP times. Both L1 and L2 are DOALLable [1]. However, barriers must be placed between these two loops since data dependences exist across iterations of the two loops (e.g., iteration 1 of L2 depends on iterations 1 and 2 of L1).

Figure 1.4(a) shows the execution plan for this program with barriers. Each block in