automatically exploiting cross-invocation parallelism using runtime ...


dataspace.princeton.edu
13.07.2015

and the number of checking requests for execution with 24 threads. The performance results (Figure 5.2) indicate that at higher thread counts the checker thread may become the bottleneck. In particular, the performance of SPECCROSS scales up to 18 threads and either flattens or decreases beyond that. LLUBENCH illustrates how the checker thread limits performance: its number of checking requests increases by 3.3× from 8 threads to 24 threads, while the resulting performance improvement is minimal. Parallelizing dependence violation detection in the checker thread is one option to solve this problem and is part of future work.

Checkpointing is much more expensive than signature calculation or checking operations and is therefore done infrequently. For the benchmark programs evaluated, there are fewer than 10 checkpoints, since SPECCROSS by default checkpoints every 1000 epochs. However, the checkpointing frequency can be reconfigured depending on the desired performance characteristics. To demonstrate the impact of checkpointing on performance, Figure 5.3 shows the geomean speedup across all eight benchmark programs as the number of checkpoints increases from 2 to 100.

To evaluate the overhead of the whole recovery process, we randomly triggered a misspeculation during the speculative parallel execution. Evaluation results are shown in Figure 5.3. As the results show, more checkpoints increase the runtime overhead but also reduce the time spent in re-execution once a misspeculation happens. Finding an optimal checkpointing configuration is important and will be part of future work.
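To make the checker-thread bottleneck concrete, here is a minimal Python sketch. It is not the SPECCROSS implementation. It assumes that each worker summarizes the addresses touched by a task as a bit-vector signature (a Python int with one hashed bit per address) and sends it as a checking request to a single checker thread. The checker flags a possible dependence violation when signatures from two different threads share a bit. The names `make_signature`, `worker`, `checker`, `run`, and the 64-bit width `SIG_BITS` are hypothetical. Because every request from every worker passes through one consumer, more threads issuing more requests means more serial work for that one thread.

```python
import queue
import threading

SIG_BITS = 64  # hypothetical signature width in bits


def make_signature(addresses):
    """Summarize a task's accessed addresses as a bit vector (one hashed bit per address)."""
    sig = 0
    for addr in addresses:
        sig |= 1 << (hash(addr) % SIG_BITS)
    return sig


def worker(tid, tasks, requests):
    """Each task would run speculatively here (omitted); afterwards it submits a checking request."""
    for addrs in tasks:
        requests.put((tid, make_signature(addrs)))


def checker(requests, result):
    """The single checker thread: every request from every worker is handled here, serially."""
    seen = []  # (thread id, signature) of all requests checked so far
    while True:
        item = requests.get()
        if item is None:  # sentinel: all workers are done
            break
        tid, sig = item
        for other_tid, other_sig in seen:
            # Overlapping bits across different threads signal a possible violation.
            if other_tid != tid and sig & other_sig:
                result["conflicts"].add(frozenset((tid, other_tid)))
        seen.append((tid, sig))
        result["requests"] += 1


def run(tasks_per_thread):
    """Start one worker per entry in tasks_per_thread plus one checker; return its findings."""
    requests = queue.Queue()
    result = {"conflicts": set(), "requests": 0}
    chk = threading.Thread(target=checker, args=(requests, result))
    chk.start()
    workers = [
        threading.Thread(target=worker, args=(tid, tasks, requests))
        for tid, tasks in enumerate(tasks_per_thread)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)  # sent only after every worker has enqueued its requests
    chk.join()
    return result


if __name__ == "__main__":
    # Threads 0 and 2 both touch address 2; thread 1 touches only address 3.
    r = run([[[1, 2], [2]], [[3]], [[2, 5]]])
    print(r["conflicts"], r["requests"])  # {frozenset({0, 2})} 4
```

Because several addresses can hash to the same bit (for example, 2 and 66 with 64-bit signatures), this sketch's check is conservative: it may report a conflict that did not happen, but it does not miss one between the signatures it compares.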
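The checkpointing trade-off discussed above can be illustrated with a simple cost model. The model is an assumption made here, not the one used by SPECCROSS: each checkpoint costs a fixed `ckpt_cost`, and each misspeculation re-executes, on average, half of one checkpoint interval. The function and parameter names (`overhead`, `best_interval`, `approx_interval`, `epoch_cost`, `misspecs`) are hypothetical, and the numbers in the example are illustrative rather than measured.

```python
import math


def overhead(epochs, interval, ckpt_cost, epoch_cost, misspecs):
    """Estimated extra time spent on checkpointing plus re-execution.

    Hypothetical model: every checkpoint costs ckpt_cost, and each
    misspeculation rolls back and re-executes, on average, half of one
    checkpoint interval (interval epochs, each costing epoch_cost).
    """
    checkpoints = math.ceil(epochs / interval)
    ckpt_time = checkpoints * ckpt_cost
    reexec_time = misspecs * (interval / 2) * epoch_cost
    return ckpt_time + reexec_time


def best_interval(epochs, ckpt_cost, epoch_cost, misspecs, candidates):
    """Return the candidate interval (in epochs) with the lowest estimated overhead."""
    return min(candidates,
               key=lambda i: overhead(epochs, i, ckpt_cost, epoch_cost, misspecs))


def approx_interval(epochs, ckpt_cost, epoch_cost, misspecs):
    """Continuous optimum of the same model (ignoring the ceiling):
    minimize (epochs / I) * ckpt_cost + misspecs * (I / 2) * epoch_cost over I."""
    if misspecs == 0:
        return math.inf  # no rollbacks: checkpointing as rarely as possible is best
    return math.sqrt(2 * epochs * ckpt_cost / (misspecs * epoch_cost))


if __name__ == "__main__":
    # Illustrative parameters only, not measurements from the evaluation.
    for m in (0, 1, 20):
        print(m, best_interval(10_000, 50, 1, m, [100, 1000, 5000]))
    # prints "0 5000", "1 1000", "20 100" on separate lines
    print(approx_interval(10_000, 50, 1, 1))  # 1000.0
```

Under this toy model, the longest interval wins when there is no misspeculation, which matches the observation that more checkpoints add runtime overhead. As misspeculations become more frequent, shorter intervals pay off because less work is re-executed. The square-root optimum has the same form as Young's first-order approximation for checkpoint intervals.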

Figure 5.2: Performance comparison between code parallelized with pthread barrier and SPECCROSS. Panels: (a) CG, (b) EQUAKE, (c) FDTD, (d) FLUIDANIMATE-2, (e) JACOBI, (f) LLUBENCH, (g) LOOPDEP, (h) SYMM. Each panel plots loop speedup (program speedup for FLUIDANIMATE-2), from 0x to 12x, against the number of threads (2 to 24), with one curve each for SpecCross and Pthread Barrier.

