A CIL Tutorial - Department of Computer Science - ETH Zürich
Preface

The collection of techniques and code examples in this tutorial was developed over several years during my Ph.D. at Berkeley and during my postdoctoral studies in the Systems Group at ETH Zürich. At ETH I used this tutorial and the accompanying project template to bring students up to speed in using CIL for their projects. I found it suitable for this purpose not only for students beginning their MS thesis work, but also for advanced undergraduates in a course I taught on program analysis and transformation during the Spring semester of 2012. My hope in sharing this tutorial is that it will help students of program analysis and programming language design make a quicker and smoother start toward building interesting and useful tools.

Good luck,
Zachary Anderson
Zürich, Switzerland
January 7, 2013
Introduction

The C Intermediate Language (CIL) [8] is a source-to-source compiler for C. Because CIL is written in OCaml [6] and performs a number of simplifying transformations on the C AST, it is well suited to rapid prototyping of new static and dynamic analyses, and to designing and trying out new language extensions. The purpose of this tutorial is to show how CIL can be used to construct a compiler front-end that performs additional analysis or implements new language extensions. Through a series of examples, we will cover the basics of CIL's AST, its dataflow analysis framework, its facilities for instrumenting programs, the ease of extending C's type system, and how to employ a theorem prover in the compilation process. In the later chapters, the tutorial will scratch the surface of some more advanced techniques. These examples will hopefully point in the direction of full-fledged implementations of these advanced techniques, and serve as a template and starting point for your future projects.

Alternatives

Similar results could certainly be achieved by working directly with gcc or Clang/LLVM [1, 5]; however, the learning curve for these tools is much steeper, and the coding burden much higher. Furthermore, with Clang/LLVM in particular, there is no easy way to add custom statements to C, to make deep changes to its type system, or to make changes at the level of the AST. Indeed, with Clang/LLVM, language extensions are typically written by interpreting new #pragma directives, e.g. in [9], and new types are added by extending a complicated class hierarchy designed for speed rather than for ease of understanding or rapid prototyping. As we will see in this tutorial, making these sorts of changes with CIL is straightforward. However, there is at least one downside to CIL: unlike gcc and Clang/LLVM, it has no support for C++. Language researchers and analysis designers must consider this trade-off when deciding on a compiler framework.

How to read this tutorial

This tutorial is written in the style of Literate Programming [4]. The source files in the src, ciltutlib, and test directories may be compiled with the OCaml or C compilers as appropriate, and may be processed by ocamlweb [3] (in the case of OCaml code) and pygmentize [2] (in the case of C code) to produce the LaTeX code that defines this document. In the template version of this source tree, the comments that generate this document are omitted.
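To make the Introduction's phrase "simplifying transformations" a little more concrete, the following C fragment is a hand-written sketch of the kind of normalization CIL is known for: loops reduced to a single while(1) form with an explicit break, and side effects lifted out of expressions into separate statements. This is not actual CIL output. The function names, the temporary `tmp`, and the helper `check_equivalent` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Original form: a for loop whose body hides a side effect (i++)
 * inside an array-index expression. */
int sum_original(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ) {
        s += a[i++];
    }
    return s;
}

/* Hand-simplified form (an approximation, NOT real CIL output):
 * - locals are declared at function scope,
 * - the loop is a plain while(1) with an explicit break,
 * - the increment is its own statement, using a temporary `tmp`
 *   (a hypothetical name) so expressions stay side-effect free. */
int sum_simplified(const int *a, int n) {
    int s;
    int i;
    int tmp;
    s = 0;
    i = 0;
    while (1) {
        if (!(i < n)) {
            break;
        }
        tmp = i;
        i = i + 1;
        s = s + a[tmp];
    }
    return s;
}

/* Illustrative helper: asserts that both forms agree on one input. */
void check_equivalent(const int *a, int n) {
    assert(sum_original(a, n) == sum_simplified(a, n));
}
```

An analysis written against the second form only has to handle one loop construct and can treat every expression as free of side effects, which is a large part of why a normalized AST is convenient for prototyping.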