Automatically Exploiting Cross-Invocation Parallelism Using Runtime ...

dataspace.princeton.edu
13.07.2015

4.5.4 Load Balancing Techniques
4.5.5 Multi-threaded Program Checkpointing
4.5.6 Dependence Distance Analysis
5 Evaluation
5.1 DOMORE Performance Evaluation
5.2 SPECCROSS Performance Evaluation
5.3 Comparison of DOMORE, SPECCROSS and Previous Work
5.4 Case Study: FLUIDANIMATE
5.5 Limitations of Current Parallelizing Compiler Infrastructure
6 Conclusion and Future Direction
6.1 Conclusion
6.2 Future Directions

List of Figures
1.1 Scientists spend large amounts of time waiting for their programs to generate results. Among the 114 researchers interviewed from 20 different departments at Princeton University, almost half had to wait days, weeks, or even months for their simulation programs to finish.
1.2 Types of parallelism exploited in scientific research programs: one third of the interviewed researchers do not use any parallelism in their programs; the others mainly use job parallelism or borrow already-parallelized programs.
1.3 Example of parallelizing a program with barriers
1.4 Comparison between executions with and without barriers. A block labeled x.y represents the yth iteration in the xth loop invocation.
1.5 Contribution of this thesis work
2.1 Sequential Code with Two Loops
2.2 Performance sensitivity due to memory analysis on a shared-memory machine
2.3 Intra-invocation parallelization techniques that rely on static analysis (X.Y refers to the Yth statement in the Xth iteration of the loop): (a) DOALL executes iterations concurrently across threads, and no inter-thread synchronization is necessary; (b) DOANY applies locks to guarantee atomic execution of the function malloc; (c) in LOCALWRITE, each worker thread goes through every node but only updates the nodes that belong to it.
