A CIL Tutorial - Department of Computer Science - ETH Zürich


29.01.2014 Views

Preface

The collection of techniques and code examples in this tutorial was developed over several years during my Ph.D. at Berkeley and my postdoctoral studies in the Systems Group at ETH Zürich. At ETH I used this tutorial and the accompanying project template to bring students up to speed in using CIL for their projects. I found it suitable for this purpose not only for students beginning their MS thesis work, but also for advanced undergraduates in a course I taught on program analysis and transformation during the Spring semester of 2012. My hope in sharing this tutorial is that it will help students of program analysis and programming language design get a quicker and smoother start toward building interesting and useful tools.

Good Luck,
Zachary Anderson
Zürich, Switzerland
January 7, 2013

Introduction

The C Intermediate Language (CIL) [8] is a source-to-source compiler for C. Since CIL is written in OCaml [6], and since it performs a number of simplifying transformations to the C AST, it is very well suited for rapid prototyping of new static and dynamic analyses, and for designing and trying out new language extensions. The purpose of this tutorial is to show how CIL can be used to construct a compiler front-end that performs additional analysis or implements new language extensions. Through a series of examples, we will cover the basics of CIL's AST, its dataflow analysis framework, its facilities for instrumenting programs, the ease of extending C's type system, and how to employ a theorem prover in the compilation process. In the later chapters, the tutorial will scratch the surface of some more advanced techniques. These examples will hopefully point in the direction of full-fledged implementations of these advanced techniques, and serve as a template and starting point for your future projects.

Alternatives

Similar results could certainly be achieved by working directly with gcc or Clang/LLVM [1, 5]; however, the learning curve for these tools is much steeper, and the coding burden much higher. Furthermore, with Clang/LLVM in particular, there is no easy way to add custom statements to C, to make deep changes to its type system, or to make changes at the level of the AST. Indeed, using Clang/LLVM, language extensions are typically written by interpreting new #pragma directives, e.g. in [9], and new types are added by extending a complicated class hierarchy designed for speed rather than for ease of understanding or rapid prototyping. As we will see in this tutorial, with CIL, making these sorts of changes is straightforward. However, there is at least one downside with CIL: unlike working directly with gcc or with Clang/LLVM, there is no support for C++. Language researchers and analysis designers must consider this trade-off when deciding on a compiler framework.

How to read this tutorial

This tutorial is written in the style of Literate Programming [4]. The source files in the src, ciltutlib, and test directories may be compiled with the OCaml or C compilers as appropriate, and may be processed by ocamlweb [3] (in the case of OCaml code) and pygmentize [2] (in the case of C code) to produce the LaTeX code that defines this document. In the template version of this source tree, the comments that generate this document are omitted.

