
automatically exploiting cross-invocation parallelism using runtime ...



1.1 Limitations of Existing Approaches

Although one could consider merely requiring programmers to write efficient multi-threaded code to take advantage of the many processor cores, this is not a successful strategy for several reasons. First, writing multi-threaded software is inherently more difficult than writing single-threaded code. To ensure correctness, programmers must reason about concurrent accesses to shared data and insert sufficient synchronization to ensure data accesses are ordered correctly. At the same time, programmers must prevent excessive synchronization from rendering the multi-threaded program no better than its single-threaded counterpart. Active research in automatic tools to identify deadlock, livelock, race conditions, and performance bottlenecks [14, 17, 21, 38, 66] in multi-threaded programs is a testament to the difficulty of achieving this balance.

Second, many legacy applications are single-threaded. Even if the source code for these applications were available, it would take enormous programming effort to translate these programs into well-performing parallel equivalents.

Finally, even if efficient multi-threaded applications could be written for a particular multi-core system, these applications may not perform well on other multi-core systems. The performance of a multi-threaded application is very sensitive to the particular system for which it was optimized. This variance is due to, among other factors, the relation between synchronization overhead and the memory subsystem implementation, and the relation between the number of application threads and the available hardware parallelism. For example, the size and number of caches, the coherence implementation, the memory consistency model, the number of cores and threads per core, and the cost of a context switch could all lead to entirely different parallelization decisions.
Writing a portable application across multiple processors would prove extremely challenging. A recent survey [56] conducted by the Liberty Research Group suggested that most scientists were making minimal effort to parallelize their software, often due to the complexities involved. This survey covered 114 randomly selected researchers from 20 dif-
