A CIL Tutorial - Department of Computer Science - ETH Zürich


29.01.2014

CHAPTER 15. AUTOMATED TEST GENERATION

    void (autotest foo)(int input a, int input b)
    {
      int c, d, e;

      c = a * b;
      d = a + b;
      e = c - d;
      if (e == 14862436) explode();
      if (d == 700) explode();
      return;
    }

Figure 15.1: Example

    let a = a0 in
    let b = b0 in
    let c = a * b in
    let d = a + b in
    let e = c - d in
    (e != 14862436) && (d != 700)

Figure 15.2: Path Condition



15.1 Background

This approach goes by many names, including directed automated random testing, concolic testing, whitebox fuzzing, and smart fuzzing. Further, researchers have created a number of tools implementing the approach, and variations on it, for a number of different languages. These include DART [3], CUTE [4], CREST [1], and PEX [5].

In particular, CREST also uses CIL as its compiler front-end and Yices [2] as its SMT solver, as we will here, and it is much more complete than the implementation in this tutorial. CREST is therefore likely to be a more appropriate starting point for further investigation of automated test generation based on CIL. One possible advantage of this simpler tutorial implementation, however, is that it implements the calls to the SMT solver in OCaml rather than C++, using the features of the OCaml runtime that allow OCaml functions to be called from C code.

Since these more complete tools exist, we'll make some simplifying assumptions for the purposes of this tutorial. First, this implementation handles only scalar values, and regular and null-terminated arrays of scalar values. That is, struct and union types are not handled. Second, only functions annotated autotest or instrument are instrumented for symbolic execution. If a non-instrumented function is called from within an autotest function, only its concrete return value is used in the path condition. In other words, inputs will not be generated to explore functions not annotated autotest or instrument. Finally, our path-exploration algorithm gives up when the SMT solver is unable to generate a new model for any of the available branches whose sense could be flipped. A more complete implementation would avoid getting stuck in this case.

15.2 Organization

A bit more code than in previous tutorials is required to implement these features, so instead of listing and commenting on all of it, we'll take a short tour through a few select functions, types, and modules to get an idea of how the code works, and the high-level ideas behind it.

15.2.1 Instrumentation

The code using CIL to instrument a program with calls to the SMT solver is in the source file src/tut15.ml. Before carrying out the instrumentation, however, we use CIL's Simplify module to break down complex expressions and l-values. A full description of its effects can be found in the CIL documentation. For now, it suffices to point out that expressions are simplified to the extent that all binary and unary operations operate only on constants or l-values. The Simplify module achieves this by introducing additional temporary variables and assignments.

The instrumentation calls notify the automated testing runtime of a number of important events: assignments, conditionals, function calls and returns, and entering and leaving an autotest function. For assignments and conditionals, the calls are passed both the addresses and values of the operands and results. Including the concrete values allows the symbolic execution to under-approximate the concrete execution when the SMT solver lacks a theory for some operation performed by the program. In particular, instead of representing the operation symbolically, the SMT solver can under-approximate the program's behavior by using the concrete values. This under-
