automatically exploiting cross-invocation parallelism using runtime ...


dataspace.princeton.edu

[62] L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. ACM SIGPLAN Notices, volume 30, pages 218–232, 1995.
[63] A. Robison, M. Voss, and A. Kukanov. Optimization via reflection on work stealing in TBB. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 2008.
[64] S. Rus, L. Rauchwerger, and J. Hoeflinger. Hybrid analysis: static & dynamic memory reference analysis. Int. J. Parallel Program., volume 31, pages 251–283, August 2003.
[65] J. Saltz, R. Mirchandaney, and R. Crowley. Run-time parallelization and scheduling of loops. IEEE Transactions on Computers, volume 40, 1991.
[66] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, volume 15, pages 391–411, 1997.
[67] N. Shavit and D. Touitou. Software transactional memory. In Proceedings of the 14th annual ACM symposium on Principles of Distributed Computing (PODC), 1995.
[68] M. F. Spear, M. M. Michael, and C. von Praun. RingSTM: scalable transactions with a single atomic instruction. In Proceedings of the 20th annual Symposium on Parallelism in Algorithms and Architectures (SPAA), 2008.
[69] Standard Performance Evaluation Corporation (SPEC). http://www.spec.org/.
[70] J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread-level speculation. ACM Transactions on Computer Systems, volume 23, pages 253–300, February 2005.

[71] P. Swamy and C. Vipin. Minimum dependence distance tiling of nested loops with non-uniform dependences. In Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing (IPDPS), 1994.
[72] C.-W. Tseng. Compiler optimizations for eliminating barrier synchronization. In Proceedings of the 5th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPOPP), 1995.
[73] T. Tzen and L. Ni. Trapezoid self-scheduling: a practical scheduling scheme for parallel compilers. Parallel and Distributed Systems, IEEE Transactions on, volume 4, January 1993.
[74] M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering (ICSE), 1981.
[75] M. Wolfe. Doany: Not just another parallel loop. In Proceedings of the 4th workshop on Languages and Compilers for Parallel Computing (LCPC), 1992.
[76] M. J. Wolfe. Optimizing Compilers for Supercomputers. PhD thesis, Department of Computer Science, University of Illinois, Urbana, IL, October 1982.
[77] L. Yen, J. Bobba, M. R. Marty, K. E. Moore, H. Volos, M. D. Hill, M. M. Swift, and D. A. Wood. LogTM-SE: Decoupling hardware transactional memory from caches. In Proceedings of the 13th IEEE international symposium on High Performance Computer Architecture (HPCA), 2007.
[78] N. Yonezawa, K. Wada, and T. Aida. Barrier elimination based on access dependency analysis for OpenMP. In Parallel and Distributed Processing and Applications. 2006.
[79] J. Zhao, J. Shirako, V. K. Nandivada, and V. Sarkar. Reducing task creation and termination overhead in explicitly parallel programs. In Proceedings of the 19th inter-

