The ROSE Source-to-Source Compiler Infrastructure - Cetus
The ROSE Source-to-Source Compiler Infrastructure - Cetus
The ROSE Source-to-Source Compiler Infrastructure - Cetus
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Low level interfaces can give users the maximum freedom <strong>to</strong> manipulate<br />
some details in AST trees. We also create a wide range of<br />
transformation and optimization functions <strong>to</strong> demonstrate the use<br />
of the APIs. Users can further reuse these existence functions <strong>to</strong><br />
build more complex or cus<strong>to</strong>mized transformations and optimizations.<br />
<strong>The</strong> low level AST interface exposes member functions of different<br />
types of objects in AST, including the tree nodes, symbol<br />
tables, attributes, preprocessing information, file location information,<br />
and so on. This level of interface provides users the maximum<br />
freedom <strong>to</strong> change the AST. But using such a low level interface is<br />
a tedious and error-prone process since users have <strong>to</strong> manually create<br />
the edges between nodes and maintain the consistence between<br />
the tree and the corresponding symbol tables. An AST consistency<br />
test function is provided <strong>to</strong> check the transformed AST is complete<br />
and consistent.<br />
<strong>The</strong> high level AST interface contains a set of functions <strong>to</strong> build<br />
AST subtrees and insert them in<strong>to</strong> existing AST. A lot of helper<br />
functions are also provided <strong>to</strong> walk the tree, query nodes, replace,<br />
delete, or even copy subtrees. Using the high level interface, users<br />
only need <strong>to</strong> provide essential source code information as parameters<br />
<strong>to</strong> these functions. All details of edges and symbol tables are<br />
au<strong>to</strong>matically and transparently handled by the interface. <strong>The</strong> high<br />
level interface also support different order of AST constructions.<br />
Users can either build parent nodes first or build children nodes<br />
first. Or statements using some variables can be even build before<br />
the declaration statements are created. <strong>The</strong> high level interface is<br />
able <strong>to</strong> au<strong>to</strong>matically patch up AST and symbol tables once enough<br />
information is available.<br />
We have developed a wide range of transformations and optimizations<br />
in <strong>ROSE</strong>, including partial redundancy elimination, constant<br />
folding, inlining, outlining (separating out a portion of code<br />
as a function), OpenMP 3.0 implementation [2], au<strong>to</strong>matic parallelization<br />
[3] and loop transformations, such as fusion, fission, interchange,<br />
unrolling, and blocking. For example, the <strong>ROSE</strong> outliner<br />
[1] can extract code portions of C, C++ and Fortran in<strong>to</strong> functions.<br />
It supports the classic way of generating one function parameter<br />
for each variable <strong>to</strong> be passed in or out of the generated function.<br />
It also can optionally wrap multiple variables in<strong>to</strong> an array<br />
and use the array as a single function parameter. More importantly,<br />
the <strong>ROSE</strong> outliner differentiates the usage of parameters in<strong>to</strong> useby-value<br />
and use-by-address. So the classic outlining algorithm’s<br />
excessive pointer dereferences (used <strong>to</strong> access written parameters)<br />
can be largely eliminated if the accesses <strong>to</strong> the written parameters<br />
are substituted by their value clones. Other analyses, such as liveness<br />
analysis and scope analysis are also used <strong>to</strong> further improve<br />
the quality of the outlining so the generated function preserves the<br />
original semantics and performance characteristics.<br />
5. Enabling Exascale Research<br />
<strong>ROSE</strong> is being used by several DOE projects <strong>to</strong> address the challenges<br />
of exascale computing. One of the projects, Thrify, is using<br />
<strong>ROSE</strong> <strong>to</strong> explore novel compiler analysis and optimization <strong>to</strong><br />
take advantage of an exascale architecture with many-core chips<br />
specially designed <strong>to</strong> improve performance per watt. In another<br />
project, CODEX, an alternative simula<strong>to</strong>r is used <strong>to</strong> support the<br />
analysis of kernels from DOE applications using other proposed<br />
exascale node architectures and possible future network connection<br />
<strong>to</strong>pologies and hardware. In still another project, <strong>ROSE</strong> has<br />
been used <strong>to</strong> analyze example applications and extract the MPI usage<br />
as an input code <strong>to</strong> support evaluation of message passing <strong>to</strong><br />
support the network communication simulation. This work is part<br />
of a project <strong>to</strong> au<strong>to</strong>mate how co-design can be applied <strong>to</strong> support<br />
the evaluation of DOE applications.<br />
Watt<br />
250<br />
200<br />
150<br />
100<br />
50<br />
Power (Watt) MFLOPS/Watt<br />
0<br />
0<br />
1 2 4 6 8 10 12 14 16<br />
Number of threads<br />
Figure 2. OpenMP transformations using <strong>ROSE</strong> and their evaluation<br />
using a simula<strong>to</strong>r on a numerical kernel (Jacobi Iteration) for both<br />
power and MFLOP/Watt.<br />
Figure 2 shows the evaluation of both power and MFLOP per<br />
Watt using a node simula<strong>to</strong>r and for different numbers of threads.<br />
All transformations were done using <strong>ROSE</strong>. <strong>The</strong> simula<strong>to</strong>r represents<br />
a proposed future node architecture.<br />
6. Results<br />
Numerous <strong>to</strong>ols are available in <strong>ROSE</strong>, such as<br />
1. Au<strong>to</strong> parallelization <strong>to</strong>ols for HPC,<br />
2. OpenMP compiler,<br />
3. Exascale Co-design skele<strong>to</strong>n genera<strong>to</strong>r,<br />
4. <strong>ROSE</strong>-CIRM (Runtime error detection support),<br />
5. Tools for the analysis of MPI,<br />
6. Tools <strong>to</strong> enforce data integrity for Exascale architectures.<br />
7. Binary emulation <strong>to</strong> support general analysis,<br />
8. Compass static analysis <strong>to</strong>ol,<br />
9. Data structure visualization (<strong>to</strong> support debugging) and<br />
10. Program Visualization.<br />
Separate from the <strong>to</strong>ols provided in <strong>ROSE</strong>, and the <strong>to</strong>ols that<br />
external groups have built using <strong>ROSE</strong>, <strong>ROSE</strong> is itself a <strong>to</strong>ol for<br />
building software analysis and optimization <strong>to</strong>ols. Is is used as a<br />
basis for both graduate courses in a wide range of <strong>to</strong>pics from<br />
compiler construction and analysis <strong>to</strong> software assurance. <strong>ROSE</strong> is<br />
released under a BSD license <strong>to</strong> support as low a barrier <strong>to</strong> adoption<br />
as possible for industrial use.<br />
References<br />
[1] Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.<br />
Effective source-<strong>to</strong>-source outlining <strong>to</strong> support whole program empirical<br />
optimization. In <strong>The</strong> 22th International Workshop on Languages<br />
and <strong>Compiler</strong>s for Parallel Computing (LCPC), Newark, Delaware,<br />
USA, 2009.<br />
[2] Chunhua Liao, Daniel J. Quinlan, Thomas Panas, and Bronis R. de<br />
Supinski. A <strong>ROSE</strong>-based OpenMP 3.0 research compiler supporting<br />
multiple runtime libraries. In Mitsuhisa Sa<strong>to</strong>, Toshihiro Hanawa,<br />
Matthias S. Müller, Barbara M. Chapman, and Bronis R. de Supinski,<br />
edi<strong>to</strong>rs, IWOMP, volume 6132 of LNCS, pages 15–28. Springer, 2010.<br />
ISBN 978-3-642-13216-2. doi: 10.1007/978-3-642-13217-9 2.<br />
[3] Chunhua Liao, Daniel J. Quinlan, Jeremiah Willcock, and Thomas<br />
Panas. Semantic-aware au<strong>to</strong>matic parallelization of modern applications<br />
using high-level abstractions. International Journal of Parallel<br />
Programming, 38(5-6):361–378, 2010.<br />
3 2011/9/2<br />
6<br />
5<br />
4<br />
3<br />
2<br />
1<br />
MFLOPS/Watt