21.12.2012 Views

The ROSE Source-to-Source Compiler Infrastructure - Cetus

The ROSE Source-to-Source Compiler Infrastructure - Cetus

The ROSE Source-to-Source Compiler Infrastructure - Cetus

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Low level interfaces can give users the maximum freedom <strong>to</strong> manipulate<br />

some details in AST trees. We also create a wide range of<br />

transformation and optimization functions <strong>to</strong> demonstrate the use<br />

of the APIs. Users can further reuse these existence functions <strong>to</strong><br />

build more complex or cus<strong>to</strong>mized transformations and optimizations.<br />

<strong>The</strong> low level AST interface exposes member functions of different<br />

types of objects in AST, including the tree nodes, symbol<br />

tables, attributes, preprocessing information, file location information,<br />

and so on. This level of interface provides users the maximum<br />

freedom <strong>to</strong> change the AST. But using such a low level interface is<br />

a tedious and error-prone process since users have <strong>to</strong> manually create<br />

the edges between nodes and maintain the consistence between<br />

the tree and the corresponding symbol tables. An AST consistency<br />

test function is provided <strong>to</strong> check the transformed AST is complete<br />

and consistent.<br />

<strong>The</strong> high level AST interface contains a set of functions <strong>to</strong> build<br />

AST subtrees and insert them in<strong>to</strong> existing AST. A lot of helper<br />

functions are also provided <strong>to</strong> walk the tree, query nodes, replace,<br />

delete, or even copy subtrees. Using the high level interface, users<br />

only need <strong>to</strong> provide essential source code information as parameters<br />

<strong>to</strong> these functions. All details of edges and symbol tables are<br />

au<strong>to</strong>matically and transparently handled by the interface. <strong>The</strong> high<br />

level interface also support different order of AST constructions.<br />

Users can either build parent nodes first or build children nodes<br />

first. Or statements using some variables can be even build before<br />

the declaration statements are created. <strong>The</strong> high level interface is<br />

able <strong>to</strong> au<strong>to</strong>matically patch up AST and symbol tables once enough<br />

information is available.<br />

We have developed a wide range of transformations and optimizations<br />

in <strong>ROSE</strong>, including partial redundancy elimination, constant<br />

folding, inlining, outlining (separating out a portion of code<br />

as a function), OpenMP 3.0 implementation [2], au<strong>to</strong>matic parallelization<br />

[3] and loop transformations, such as fusion, fission, interchange,<br />

unrolling, and blocking. For example, the <strong>ROSE</strong> outliner<br />

[1] can extract code portions of C, C++ and Fortran in<strong>to</strong> functions.<br />

It supports the classic way of generating one function parameter<br />

for each variable <strong>to</strong> be passed in or out of the generated function.<br />

It also can optionally wrap multiple variables in<strong>to</strong> an array<br />

and use the array as a single function parameter. More importantly,<br />

the <strong>ROSE</strong> outliner differentiates the usage of parameters in<strong>to</strong> useby-value<br />

and use-by-address. So the classic outlining algorithm’s<br />

excessive pointer dereferences (used <strong>to</strong> access written parameters)<br />

can be largely eliminated if the accesses <strong>to</strong> the written parameters<br />

are substituted by their value clones. Other analyses, such as liveness<br />

analysis and scope analysis are also used <strong>to</strong> further improve<br />

the quality of the outlining so the generated function preserves the<br />

original semantics and performance characteristics.<br />

5. Enabling Exascale Research<br />

<strong>ROSE</strong> is being used by several DOE projects <strong>to</strong> address the challenges<br />

of exascale computing. One of the projects, Thrify, is using<br />

<strong>ROSE</strong> <strong>to</strong> explore novel compiler analysis and optimization <strong>to</strong><br />

take advantage of an exascale architecture with many-core chips<br />

specially designed <strong>to</strong> improve performance per watt. In another<br />

project, CODEX, an alternative simula<strong>to</strong>r is used <strong>to</strong> support the<br />

analysis of kernels from DOE applications using other proposed<br />

exascale node architectures and possible future network connection<br />

<strong>to</strong>pologies and hardware. In still another project, <strong>ROSE</strong> has<br />

been used <strong>to</strong> analyze example applications and extract the MPI usage<br />

as an input code <strong>to</strong> support evaluation of message passing <strong>to</strong><br />

support the network communication simulation. This work is part<br />

of a project <strong>to</strong> au<strong>to</strong>mate how co-design can be applied <strong>to</strong> support<br />

the evaluation of DOE applications.<br />

Watt<br />

250<br />

200<br />

150<br />

100<br />

50<br />

Power (Watt) MFLOPS/Watt<br />

0<br />

0<br />

1 2 4 6 8 10 12 14 16<br />

Number of threads<br />

Figure 2. OpenMP transformations using <strong>ROSE</strong> and their evaluation<br />

using a simula<strong>to</strong>r on a numerical kernel (Jacobi Iteration) for both<br />

power and MFLOP/Watt.<br />

Figure 2 shows the evaluation of both power and MFLOP per<br />

Watt using a node simula<strong>to</strong>r and for different numbers of threads.<br />

All transformations were done using <strong>ROSE</strong>. <strong>The</strong> simula<strong>to</strong>r represents<br />

a proposed future node architecture.<br />

6. Results<br />

Numerous <strong>to</strong>ols are available in <strong>ROSE</strong>, such as<br />

1. Au<strong>to</strong> parallelization <strong>to</strong>ols for HPC,<br />

2. OpenMP compiler,<br />

3. Exascale Co-design skele<strong>to</strong>n genera<strong>to</strong>r,<br />

4. <strong>ROSE</strong>-CIRM (Runtime error detection support),<br />

5. Tools for the analysis of MPI,<br />

6. Tools <strong>to</strong> enforce data integrity for Exascale architectures.<br />

7. Binary emulation <strong>to</strong> support general analysis,<br />

8. Compass static analysis <strong>to</strong>ol,<br />

9. Data structure visualization (<strong>to</strong> support debugging) and<br />

10. Program Visualization.<br />

Separate from the <strong>to</strong>ols provided in <strong>ROSE</strong>, and the <strong>to</strong>ols that<br />

external groups have built using <strong>ROSE</strong>, <strong>ROSE</strong> is itself a <strong>to</strong>ol for<br />

building software analysis and optimization <strong>to</strong>ols. Is is used as a<br />

basis for both graduate courses in a wide range of <strong>to</strong>pics from<br />

compiler construction and analysis <strong>to</strong> software assurance. <strong>ROSE</strong> is<br />

released under a BSD license <strong>to</strong> support as low a barrier <strong>to</strong> adoption<br />

as possible for industrial use.<br />

References<br />

[1] Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.<br />

Effective source-<strong>to</strong>-source outlining <strong>to</strong> support whole program empirical<br />

optimization. In <strong>The</strong> 22th International Workshop on Languages<br />

and <strong>Compiler</strong>s for Parallel Computing (LCPC), Newark, Delaware,<br />

USA, 2009.<br />

[2] Chunhua Liao, Daniel J. Quinlan, Thomas Panas, and Bronis R. de<br />

Supinski. A <strong>ROSE</strong>-based OpenMP 3.0 research compiler supporting<br />

multiple runtime libraries. In Mitsuhisa Sa<strong>to</strong>, Toshihiro Hanawa,<br />

Matthias S. Müller, Barbara M. Chapman, and Bronis R. de Supinski,<br />

edi<strong>to</strong>rs, IWOMP, volume 6132 of LNCS, pages 15–28. Springer, 2010.<br />

ISBN 978-3-642-13216-2. doi: 10.1007/978-3-642-13217-9 2.<br />

[3] Chunhua Liao, Daniel J. Quinlan, Jeremiah Willcock, and Thomas<br />

Panas. Semantic-aware au<strong>to</strong>matic parallelization of modern applications<br />

using high-level abstractions. International Journal of Parallel<br />

Programming, 38(5-6):361–378, 2010.<br />

3 2011/9/2<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

MFLOPS/Watt

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!