automatically exploiting cross-invocation parallelism using runtime ...
[24] R. Gupta. The fuzzy barrier: a mechanism for high speed synchronization of processors. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1989.

[25] L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA), 2004.

[26] H. Han and C.-W. Tseng. Improving compiler and run-time support for irregular reductions using local writes. In Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 1999.

[27] M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA), 1993.

[28] J. Huang, T. B. Jablin, S. R. Beard, N. P. Johnson, and D. I. August. Automatically exploiting cross-invocation parallelism using runtime information. In Proceedings of the 2013 International Symposium on Code Generation and Optimization (CGO), April 2013.

[29] J. Huang, A. Raman, Y. Zhang, T. B. Jablin, T.-H. Hung, and D. I. August. Decoupled software pipelining creates parallelization opportunities. In Proceedings of the 8th International Symposium on Code Generation and Optimization (CGO), 2010.

[30] K. Z. Ibrahim and G. T. Byrd. On the exploitation of value prediction and producer identification to reduce barrier synchronization time. In Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS), 2001.