13.07.2015 Views

automatically exploiting cross-invocation parallelism using runtime ...

programs and it achieves a geomean speedup of 4.6× over the best sequential execution, which compares favorably to a 1.3× speedup obtained by parallel execution without any cross-invocation parallelization.

6.2 Future Directions

Most existing automatic parallelization techniques focus on loop-level parallelization and ignore the potential parallelism across loop invocations. This limits performance scalability, especially when there are many loop invocations causing frequent synchronizations. A promising research direction is to extend the parallelization region beyond the scope of a single loop invocation. This thesis work takes one step forward, but there are still numerous exciting avenues for future work.

Interesting research could be done in designing and implementing efficient and adaptive runtime systems for region parallelization. A limitation of both the DOMORE and SPECCROSS runtime systems is that they do not scale well enough with an increasing number of threads. The overhead of violation checking and iteration scheduling ultimately becomes the performance bottleneck at high thread counts. One possible solution, as discussed above, is to parallelize the checker thread: instead of assigning all checking tasks to a single thread, multiple threads perform checking concurrently. However, this optimization raises other questions, such as how to optimally allocate threads to workers and checkers. The optimal trade-off could vary significantly across runtime environments. An adaptive runtime system such as DoPE [59] could potentially be exploited to help the parallel execution adjust to the actual runtime environment and achieve more scalable performance.
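The worker/checker allocation question can be illustrated with a toy throughput model. The sketch below is not taken from the DOMORE or SPECCROSS runtimes; it assumes, purely for illustration, that every task costs `work_cost` time units to execute and `check_cost` time units to check, that both stages scale linearly with their thread counts, and that overall throughput is limited by the slower stage. The function names `stage_throughput` and `best_split` are hypothetical.

```python
def stage_throughput(workers, checkers, work_cost, check_cost):
    """Tasks per time unit in a toy two-stage model (an assumption, not
    the thesis's cost model): the slower of the two stages dominates."""
    return min(workers / work_cost, checkers / check_cost)


def best_split(total_threads, work_cost, check_cost):
    """Exhaustively try every worker/checker split of total_threads and
    return (workers, checkers, throughput) for the best one."""
    if total_threads < 2:
        raise ValueError("need at least one worker and one checker")
    best = None
    for checkers in range(1, total_threads):
        workers = total_threads - checkers
        rate = stage_throughput(workers, checkers, work_cost, check_cost)
        # Strict comparison keeps the first split found on ties.
        if best is None or rate > best[2]:
            best = (workers, checkers, rate)
    return best


if __name__ == "__main__":
    threads, work_cost, check_cost = 24, 3.0, 1.0
    # Single checker thread: checking caps throughput regardless of workers.
    single = stage_throughput(threads - 1, 1, work_cost, check_cost)
    workers, checkers, rate = best_split(threads, work_cost, check_cost)
    print(f"single checker: {single:.2f} tasks per time unit")
    print(f"best split: {workers} workers, {checkers} checkers, "
          f"{rate:.2f} tasks per time unit")
```

Under these assumptions, a single checker caps throughput no matter how many workers are added, while the best split depends on the ratio of the two costs. Because that ratio is a property of the program and the machine, a fixed split chosen in advance may be far from optimal, which is the motivation for an adaptive runtime.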
Additionally, DOMORE and SPECCROSS have a lot of configurable parameters that are currently specified at compile time. The checkpointing frequency and the speculative range are both set in advance using profiling information. However, profiling information is not necessarily consistent with actual execution. Ideally, these parameters
