C++ for Scientists - Technische Universität Dresden



Technische Universität Dresden
Fakultät Mathematik und Naturwissenschaften
Institut für wissenschaftliches Rechnen
01062 Dresden
http://www.math.tu-dresden.de/~pgottsch/script/cpp for scientists.pdf

Peter Gottschling

C++ for Scientists

based on a joint lecture with Karl Meerbergen,
with the help of Andrey Chesnokov, Yvette Vanberghen,
Kris Demarsin, and Yao Yue,
and contributions by René Heinzl and Philipp Schwaha

As of 16 January 2012


Copyright © 2010 Peter Gottschling, René Heinzl, Karl Meerbergen, and Philipp Schwaha


Contents

I Understanding C++

Introduction
0.1 Programming languages for scientific programming
0.2 Outline

1 Good and Bad Scientific Software

2 C++ Basics
2.1 Our First Program
2.2 Variables
2.3 Operators
2.4 Expressions and Statements
2.5 Control statements
2.6 Functions
2.7 Input and output
2.8 Structuring Software Projects
2.9 Arrays
2.10 Pointers and References
2.11 Real-world example: matrix inversion
2.12 Exercises
2.13 Operator Precedence

3 Classes
3.1 Program for universal meaning not for technical details
3.2 Class members
3.3 Constructors
3.4 Destructors
3.5 Assignment
3.6 Automatically Generated Operators
3.7 Accessing object members
3.8 Other Operators

4 Generic programming
4.1 Templates
4.2 Generic functions
4.3 Generic classes
4.4 Concepts and Modeling
4.5 Inheritance or Generics?
4.6 Template Specialization
4.7 Non-Type Parameters for Templates
4.8 Functors
4.9 STL — The Mother of All Generic Libraries
4.10 Cursors and Property Maps
4.11 Exercises

5 Meta-programming
5.1 Let the Compiler Compute
5.2 Providing Type Information
5.3 Expression Templates
5.4 Meta-Tuning: Write Your Own Compiler Optimization
5.5 Exercises

6 Inheritance
6.1 Basic Principles
6.2 Dynamic Selection by Sub-typing
6.3 Remove Redundancy With Base Classes
6.4 Casting Up and Down and Elsewhere
6.5 Barton-Nackman Trick

7 Effective Programming: The Polymorphic Way
7.1 Imperative Programming
7.2 Generic Programming
7.3 Programming with Objects
7.4 Functional Programming
7.5 From Monomorphic to Polymorphic Behavior
7.6 Best of Both Worlds

II Using C++

8 Finite World of Computers
8.1 Mathematical Objects inside the Computer
8.2 More Numbers and Basic Structure
8.3 A Loop and More
8.4 The Other Way Around

9 How to Handle Physics on the Computer
9.1 Finite Elements
9.2 Again, Integrators

10 Programming tools
10.1 GCC
10.2 Debugging
10.3 Valgrind
10.4 Gnuplot
10.5 Unix and Linux

11 C++ Libraries for Scientific Computing
11.1 GLAS: Generic Linear Algebra Software
11.2 Boost
11.3 Boost.Bindings
11.4 Matrix Template Library
11.5 Blitz++
11.6 Graph Libraries
11.7 Geometric Libraries

12 Real-World Programming
12.1 Transcending Legacy Applications

13 Parallelism
13.1 Multi-Threading
13.2 Message Passing

14 Numerical exercises
14.1 Computing an eigenfunction of the Poisson equation
14.2 The 2D Poisson equation
14.3 The solution of a system of differential equations
14.4 Google's Page rank
14.5 The bisection method for finding the zero of a function in an interval
14.6 The Newton-Raphson method for finding the minimum of a convex function
14.7 Sequential noise reduction of real-time measurements by least squares

15 Programming Projects
15.1 Matrix Exponentiation
15.2 Matrix Exponentiation
15.3 LU Decomposition for m × n Matrices
15.4 Bunch-Kaufman Decomposition
15.5 Condition Number (Reciprocal)
15.6 Matrix Scaling
15.7 QR with Overwriting
15.8 Direct Solver for Sparse Matrices
15.9 Applying MTL4 to Interval-Arithmetic Types
15.10 Applying MTL4 to Higher-Precision Types
15.11 Applying MTL4 to AD Types

16 Acknowledgement


Part I

Understanding C++


Introduction

"It would be nice if every kind of numeric software could be written in C++ without
loss of efficiency, but unless something can be found that achieves this without compromising
the C++ type system it may be preferable to rely on Fortran, assembler
or architecture-specific extensions."
— Bjarne Stroustrup

The purpose of this script is to do you this favor, Bjarne. Amongst others. Put differently, the
reader of this book shall learn the best way to benefit from C++ features for writing scientific
software. It is not our goal to explain all C++ features in a well-balanced manner. We rather
aim for an application-driven illustration of features that are valuable for writing

• Well-structured;
• Readable;
• Maintainable;
• Extensible;
• Type-safe;
• Reliable;
• Portable; and last but not least
• Highly performing

software.

0.1 Programming languages for scientific programming

Scientific programming is an old discipline in computer science. The first applications on
computers were indeed computations. In the early decades, ALGOL was a relatively popular
programming language, competing with FORTRAN. FORTRAN 77 became a standard in
scientific programming because of its efficiency and portability. Other languages developed in
computer science, such as C, Ada, Java, and C++, were not frequently used in scientific
computing. They were mainly used in universities and labs for research purposes.


C++ was not a reliable computer language in the nineties: code was not portable, and object
code was inefficient and large. This made C++ unpopular in scientific computing. The situation
changed at the end of the nineties: compilers produced more efficient code, and the standard
was supported better and better by compilers. Especially the ability to inline small functions
and the introduction of complex numbers in the C99 standard made C++ more attractive to
scientific programmers.

Together with the development of compilers, numerical libraries are being developed in C++
that offer great flexibility together with efficiency. This work is still ongoing, and more and
more software is being written in C++. Other languages currently used for numerics are
FORTRAN 77 (even for new codes!), Fortran 95, and Matlab. Python is becoming more and
more popular. The nice thing about Python is that it is relatively easy to link C++ functions
and classes into Python scripts. Writing such interfaces is not a subject of this course.

The goal of this course is to introduce students to the exciting world of C++ programming for
scientific applications. The course does not offer a deep study of the programming language
itself, but rather focuses on those aspects that make C++ suitable for scientific programming.
Language concepts are introduced and applied to numerical programming, together with the
STL and Boost.

Starting C++ programmers often adopt a Java programming style: both languages are
object-oriented, but there are subtle differences that allow C++ to produce more compact
expressions. For example, C++ classes typically do not have getters and setters, as is often the
case in Java classes. This will be discussed in more detail in the course. We use the following
conventions, which are also used by Boost, one of the good examples of C++ software.
Classes and variables are denoted by lower-case characters. Underscores are used as separators
in symbols. An exception are matrices, which are written as single capitals for similarity with
the mathematical notation. Mixed upper- and lower-case characters (CamelCase) are typically
used for concepts. Constants are often (as in C) written in capitals.

0.2 Outline

The topics that will be discussed are several aspects of the syntax of C++, illustrated by small
numerical programs, an introduction to meta-programming, expression templates, the STL,
Boost, MTL4, and GLAS. We will also discuss interoperability with other languages. The first
three chapters discuss basic language aspects, such as functions, types, and classes, inheritance
and generic programming, including examples from the STL. The remaining chapters discuss
topics that are of great importance for numerical applications: functors, expression templates,
and interoperability with FORTRAN and C.


Chapter 1

Good and Bad Scientific Software

This chapter will give you an idea of what we consider good scientific software and what not.
If you have never programmed before in your life, you might wish to skip this entire chapter.
This is okay, because if you have had no contact with the program sources of bad software,
you can learn programming with a pure mind.

If you have some software knowledge, there might still be some details you will not understand
right now, but this is no reason to worry. If you do not understand them after reading this
script, then you can start worrying, or we as authors could. This chapter is only about getting
a feeling for what distinguishes good from bad software in science.

As the foundation of our discussion — and to not start the book with hello world — we consider
an iterative method to solve systems of linear equations Ax = b, where A is a symmetric
positive-definite (SPD) matrix, x and b are vectors, and x is sought. The method is called
'Conjugate Gradients' (CG) and was introduced by Magnus R. Hestenes and Eduard Stiefel [?].

The mathematical details do not matter here, only the different styles of implementation. The
algorithm can be written in the following form:¹

Algorithm 1: Conjugate Gradient Method.

Input: SPD matrix A, vector b, left preconditioner L, and termination criterion ε.
Output: Vector x such that Ax ≈ b.

 1  r = b − Ax
 2  while |r| ≥ ε do
 3      z = L⁻¹r
 4      ρ = 〈r, z〉
 5      if first iteration then
 6          p = z
 7      else
 8          p = z + (ρ/ρ′) p
 9      q = Ap
10      α = ρ/〈p, q〉
11      x = x + αp
12      r = r − αq
13      ρ′ = ρ

¹ This is not precisely the original notation but a slightly adapted version that introduces some extra
variables to avoid redundant calculations.


Programmers transform this mathematical notation into a form that a compiler understands,
using operations from the language. The result could look like Listing 1.1. Do not read it in
detail; just skim it.

#include <math.h>
#include <stdlib.h>

double one_norm(int size, double *vp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= fabs(vp[i]);
    return sum;
}

double dot(int size, double *vp, double *wp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= vp[i] * wp[i];
    return sum;
}

int cg(int size, int nnz, int* aip, int* ajp, double* avp,
       double *x, double *b, void (*lpre)(int, double*, double*), double eps)
{
    int i, iter= 0;
    double rho, rho_1, alpha;
    double *p= (double*) malloc(size * sizeof(double));
    double *q= (double*) malloc(size * sizeof(double));
    double *r= (double*) malloc(size * sizeof(double));
    double *z= (double*) malloc(size * sizeof(double));

    // r= b;
    for (i= 0; i < size; i++)
        r[i]= b[i];
    // r-= A*x;
    for (i= 0; i < nnz; i++)
        r[aip[i]]-= avp[i] * x[ajp[i]];

    while (one_norm(size, r) >= eps) {
        // z= solve(L, r);
        (*lpre)(size, z, r);               // function pointer call
        rho= dot(size, r, z);
        if (!iter) {
            for (i= 0; i < size; i++)
                p[i]= z[i];
        } else {
            for (i= 0; i < size; i++)
                p[i]= z[i] + rho / rho_1 * p[i];
        }
        // q= A * p;
        for (i= 0; i < size; i++)
            q[i]= 0;
        for (i= 0; i < nnz; i++)
            q[aip[i]]+= avp[i] * p[ajp[i]];
        alpha= rho / dot(size, p, q);
        // x+= alpha * p; r-= alpha * q;
        for (i= 0; i < size; i++) {
            x[i]+= alpha * p[i];
            r[i]-= alpha * q[i];
        }
        rho_1= rho;
        iter++;
    }
    free(q); free(p); free(r); free(z);
    return iter;
}

void ic_0(int size, double* out, double* in) { /* .. */ }

int main(int argc, char* argv[])
{
    int nnz, size;
    // set nnz and size
    int *aip= (int*) malloc(nnz * sizeof(int));
    int *ajp= (int*) malloc(nnz * sizeof(int));
    double *avp= (double*) malloc(nnz * sizeof(double));
    double *x= (double*) malloc(size * sizeof(double));
    double *b= (double*) malloc(size * sizeof(double));
    // set A and b
    cg(size, nnz, aip, ajp, avp, x, b, ic_0, 1e-9);
    return 0;
}

Listing 1.1: Low-Abstraction Implementation of CG

As said before, the details do not matter here, only the principal approach. The good thing
about this code is that it is self-contained. But this is about its only advantage. The problem
with this implementation is its low abstraction level. This creates three major disadvantages:

• Bad readability;
• No flexibility; and
• High error-proneness.

The bad readability manifests itself in the fact that almost every operation is implemented in
one or multiple loops. For instance, would we have found the matrix vector multiplication
q = Ap without the comments? We would easily catch where the variables representing q, A,
and p are used, but to see that this is a matrix vector product takes a closer look and a good
understanding of how the matrix is stored.

This leads us to the second problem: the implementation commits to many technical details
and only works in precisely this context. Algorithm 1 only requires that the matrix A is
symmetric positive-definite; it does not demand a certain storage scheme. There are many
other sparse matrix formats that we could all use in the CG method, but not with this
implementation. The matrix format is not the only detail the code commits to. What if we
want to compute in lower (float) or higher precision (long double)? Or solve a complex linear
system? For every such new CG application, we need a new implementation. Needless to say,
running on parallel computers or exploring GPGPU (General-Purpose Graphics Processing
Unit) acceleration needs reimplementations as well. Much worse, every combination of the
above needs a new implementation.

Some readers might think: "It is only one function of 20–30 lines. Rewriting this little
function, how much work can that be? And we do not introduce new matrix formats or
computer architectures every month." Certainly true, but in some sense it is putting the cart
before the horse. Because of such an inflexible and detail-obsessed programming style, many
scientific applications grew into the hundreds of thousands and millions of lines of code. Once
an application or library has reached such a monstrous size, modifying features of the software
is very arduous and only rarely done. The road to success is starting scientific software at a
higher level of abstraction from the beginning, even if it is more work initially.

The last major disadvantage is how error-prone the code is. All arguments are given as
pointers, and the size of the underlying arrays is passed as an extra argument. As the
programmers of the function cg, we can only hope that the caller did everything right, because
we have no way to verify it. If the user does not allocate enough memory (or does not allocate
at all), the execution will crash at some more or less random position or, even worse, will
generate nonsensical results because data and software can be randomly overwritten. Good
programmers must avoid such fragile interfaces because the slightest mistake can have
catastrophic consequences, and the program errors are extremely difficult to find.
Unfortunately, even recently released and widely used software is written in this manner, either
for backward compatibility with C and Fortran or because it is written in one of these two
languages. In fact, the implementation above is C, not C++. If this is the way you love
software, you probably will not like this script.

So much for software we do not like. In Listing 1.2 we show what scientific software could
look like.

// This source is part of MTL4

#include <boost/numeric/mtl/mtl.hpp>
#include <boost/numeric/itl/itl.hpp>

template <typename LinearOperator, typename HilbertSpaceX, typename HilbertSpaceB,
          typename Preconditioner, typename Iteration>
int conjugate_gradient(const LinearOperator& A, HilbertSpaceX& x, const HilbertSpaceB& b,
                       const Preconditioner& L, Iteration& iter)
{
    typedef HilbertSpaceX Vector;
    typedef typename mtl::Collection<Vector>::value_type Scalar;
    Scalar rho(0), rho_1(0), alpha(0);
    Vector p(resource(x)), q(resource(x)), r(resource(x)), z(resource(x));

    r = b - A*x;
    while (!iter.finished(r)) {
        z = solve(L, r);
        rho = dot(r, z);
        if (iter.first())
            p = z;
        else
            p = z + (rho / rho_1) * p;
        q = A * p;
        alpha = rho / dot(p, q);
        x += alpha * p;
        r -= alpha * q;
        rho_1 = rho;
        ++iter;
    }
    return iter;
}

int main(int argc, char* argv[])
{
    int size;
    // set size
    mtl::compressed2D<double> A(size, size);
    mtl::dense_vector<double> x(size), b(size);
    // set A and b

    // Create preconditioner
    itl::pc::ic_0<mtl::compressed2D<double> > L(A);
    // Object that controls the iteration: terminate if the residual is below 10^-9 or has
    // decreased by 6 orders of magnitude; abort after 30 iterations if not converged
    itl::basic_iteration<double> iter(b, 30, 1.e-6, 1.e-9);
    conjugate_gradient(A, x, b, L, iter);
    return 0;
}

Listing 1.2: High-Abstraction Implementation of CG

The first thing you might notice is that the CG implementation is readable without comments.
As a rule of thumb, if other people's comments look like your program sources, then you are
a really good programmer. If you compare the mathematical notation in Algorithm 1 with
Listing 1.2, you will see that — except for the type and variable declarations at the beginning
— they are identical. Some readers might think that it looks more like Matlab or Mathematica
than C++. Yes, C++ can look like this if one puts enough effort into good software.
Evidently, it is also much easier to write algorithms at this abstraction level than to express
them with low-level operations.

The Purpose of Scientific Software

Scientists shall do science.

Excellent scientific software is expressed only in mathematical and domain-specific
operations, without any technical detail exposed.

At this abstraction level, scientists can focus on models and algorithms, being
much more productive and advancing scientific discovery.

Nobody knows how many scientists waste how much time every year dwelling on the small
technical details of bad software like that in Listing 1.1. Of course, the technical details have to
be realized in some place, but a scientific application is the worst possible location. Use a
two-level approach: write your applications in terms of expressive mathematical operations,
and if they do not exist, implement them separately. These mathematical operations must be
carefully implemented for maximal performance, or use other operations with maximal
performance. Investing time in the performance of these fundamental operations pays off
handsomely because the functions will be reused very often.

Advice

Use the right abstractions!
If they do not exist, implement them.

Speaking of abstractions, the CG implementation in Listing 1.2 does not commit to any
technical detail. Nowhere is the function restricted to a numerical type like double. It works
just as well for float, GNU's multi-precision numbers, complex numbers, interval arithmetic,
quaternions, . . .

The matrix A can have any internal format; as long as it can be multiplied with a vector,
it can be used in the function. In fact, it does not even need to be a matrix but can be any
linear operator. For instance, an object that performs a Fast Fourier Transformation (FFT)
on a vector can be used as A when the FFT is expressed as a product of A with the vector.
Similarly, the vectors do not need to be represented by finite-dimensional arrays but can be
elements of any vector space that is somehow computer-representable, as long as all operations
in the algorithm can be performed.

We are also open to other computer architectures. If the matrix and the vectors are distributed
over the nodes of a parallel supercomputer and corresponding parallel operations are available,
the function runs in parallel without changing a single line. (GP)GPU acceleration can also be
realized within the data structures and their operations without changing the algorithm. In
general, any existing or new platform that is supported by the operations of the matrix and
vector types is also supported by our 'generic' conjugate gradient function. As mentioned
before, we do not even need to change it. If we have a sophisticated scientific application of
several thousand lines (not hundreds of thousands) written with appropriate abstractions, we
do not need to modify it either.

Starting with the next chapter, we will explain how to write good scientific software.


Chapter 2

C++ Basics

In this chapter we will briefly introduce some basic knowledge about C++. A useful site with
a reference manual for C++ is http://www.cplusplus.com/.

2.1 Our First Program

As an introduction to the C++ language, let us look at the following example:

#include <iostream>

int main()
{
    std::cout << "Answer to the Ultimate Question of Life, the Universe, and Everything is "
              << 6 * 7 << std::endl;
    return 0;
}

according to Douglas Adams’ “Hitchhiker’s Guide to the Galaxy.” This short example shows<br />

already many things about C ++:<br />

• The first line includes a file named “iostream.” Whatever is defined in this file will be<br />

defined in our program as well. The file “iostream” contains the standard I/O of C++.<br />

Input and output are not part of the core language in C++ but part of the standard libraries.<br />

This means that we cannot program I/O commands without including “iostream” (or<br />

something similar). But it also means that this file must come with every compiler because<br />

it is part of the standard. Include commands should be at the beginning of the file if<br />

possible.<br />

• The main program is called main and has an integer return value, which is set to 0 by the<br />

return command. The caller of a program (usually the operating system) knows that it<br />

finished successfully when a 0 is returned. A return code other than 0 signals that<br />

something went wrong, and often the return code also says something about what went<br />

wrong.<br />

• Braces “{ }” denote a block/group of code (also called a compound statement). Variables<br />

declared within “{ }” groups are only accessible within this block.<br />


• std::cout and std::endl are defined in “iostream.” The former is an output stream that prints<br />

text on the screen (unless it is redirected). With std::endl a line is terminated.<br />

• The special operator ≪ is used to pass objects to an output stream such as std::cout, which<br />

then prints them.<br />

• The double quotes surround string constants, more precisely string literals. This is the<br />

same as in C. For string manipulation, however, one should use C ++’s string class instead<br />

of C’s cumbersome and error-prone functions.<br />

• The expression 6 ∗ 7 is evaluated and a temporary integer is passed to std::cout. In C++<br />

everything has a type. Sometimes we as programmers have to declare the type and<br />

sometimes the compiler deduces it for us. 6 and 7 are literal constants that have type int,<br />

and so does their product.<br />

This was a lot of information for such a short program. So let us go through it step by step.<br />

TODO: A little explanation how to compile and run it. For g++ and Visual Studio.<br />

2.2 Variables<br />

In contrast to most scripting languages, C++ is strongly typed, that is, every variable has a type<br />

and this type never changes. A variable is declared by a statement TYPE varname. 1 Basic types<br />

are int, unsigned int, long, float, double, char, and bool.<br />

int integer1 = 2;<br />

int integer2, integer3;<br />

float pi = 3.14159;<br />

char mycharacter = ’a’;<br />

bool cmp = integer1 < pi;<br />

Each statement has to be terminated by a “;”. In the following section, we show operations<br />

that are often applied to integer and float types. In contrast to other languages like Python,<br />

where ’ and ” are used for both characters and strings, C++ distinguishes between the two of<br />

them. The C++ compiler considers ’a’ the character ‘a’ (it has type char) and ”a” the string<br />

containing ‘a’ (it has type const char[2]: the character plus a terminating null character). If you are used to Python, please pay attention to this.<br />

Advice<br />

Define variables right before their first use. This makes your<br />

programs more readable when they grow long. It also allows the compiler to<br />

use memory more efficiently when you have nested scopes (more details<br />

later). Old C versions required all variables to be defined at the beginning of a<br />

function, and several people stick to this style to this day. However, in C++,<br />

defining variables as late as possible generally leads to higher efficiency and,<br />

more importantly, to higher readability.<br />

1 TODO: too simple, variable lists and in-place initialization is missing



2.2.1 Constants<br />

Syntactically, constants are like special variables in C ++ with the additional attribute of immutability.<br />

const int integer1 = 2;<br />

const int integer3; // Error<br />

const float pi = 3.14159;<br />

const char mycharacter = ’a’;<br />

const bool cmp = integer1 < pi;<br />

As they cannot be changed, it is mandatory to set the value in the definition. The second<br />

constant definition violates this rule and the compiler will complain about it.<br />

Constants can be used wherever variables are allowed — as long as they are not modified, of<br />

course. On the other hand, constants like those above are already known during compilation.<br />

This enables many kinds of optimizations, and the constants can even be used as arguments of<br />

types (we will come back to this later).<br />

2.2.2 Literals<br />

Literals like “2” or “3.14” have types as well. Simply put, integral numbers are treated as<br />

int, long or unsigned long depending on their magnitude. Every number with a dot or an<br />

exponent (e.g. 3e12 ≡ 3 · 10^12) is considered a double.<br />

Usually this does not matter much in practice since C++ has implicit conversion between<br />

built-in numeric types and most programs work well without explicitly specifying the type of<br />

the literals. There are, however, three major reasons to pay attention to the types of literals:<br />

• Availability;<br />

• Ambiguity and<br />

• Accuracy.<br />

Without going into detail here, implicit conversion is not used with template functions<br />

(for good reasons). The standard library provides a type for complex numbers where the type<br />

of the real and imaginary part can be parametrized by the user:<br />

std::complex&lt;float&gt; z(1.3, 2.4), z2;<br />

These complex numbers provide of course the common operations. However, when we write:<br />

z2= 2 ∗ z; // error<br />

z2= 2.0 ∗ z; // error<br />

we will get an error message that the multiplication is not available. More specifically, the<br />

compiler will tell us that there is no operator∗() for int and std::complex&lt;float&gt;, respectively for<br />

double and std::complex&lt;float&gt;. 2 The library provides a multiplication for the type that we use<br />

for the real and imaginary part, here float. There are two ways to ensure that “2” is a float:<br />

z2= float(2) ∗ z;<br />

z2= 2.0f ∗ z;<br />

2 It is however possible to implement std::complex in a fashion such that these expressions work [Got11].


22 CHAPTER 2. <strong>C++</strong> BASICS<br />

In the first case, we have an int literal that is converted into float and in the second case, the<br />

literal is float from the beginning. For the sake of clarity, the float literal is preferable.<br />

Later in this book we will introduce function overloading, that is a function with different<br />

implementations <strong>for</strong> different argument types (or argument tuples). The compiler selects the<br />

function overload that fits best. Sometimes the best fit is not clear, <strong>for</strong> instance if function f<br />

accepts an unsigned or a pointer and we call:<br />

f(0);<br />

“0” is considered as int and can be implicitly converted into unsigned or any pointer type. None<br />

of the conversions is prioritized. As be<strong>for</strong>e we can address the issue by explicit conversion and<br />

by a literal of the desired type:<br />

f(unsigned(0));<br />

f(0u);<br />

Again, we prefer the second version because it is more direct (and shorter).<br />

The accuracy issue comes up when we work with long double. On the author’s computer, the format<br />

can handle at least 19 digits. Let us define one third with 20 digits and print out 19 of them:<br />

long double third= 0.3333333333333333333;<br />

cout.precision(19);<br />

cout ≪ ”One third is ” ≪ third ≪ ”.\n”;<br />

The result is:<br />

One third is 0.3333333333333333148.<br />

The program behavior is more satisfying if we append an “l” to the number:<br />

long double third= 0.3333333333333333333l;<br />

yielding the print-out that we hoped <strong>for</strong>:<br />

One third is 0.3333333333333333333.<br />

The following table gives examples of literals and their type:<br />

Literal Type<br />

2 int<br />

2u unsigned<br />

2l long<br />

2ul unsigned long<br />

2.0 double<br />

2.0f float<br />

2.0l long double<br />

For more details, see for instance [Str97, § 4.4f, § C.4]. There you also find a description of how to<br />

define octal and hexadecimal literals.



2.2.3 Scope of variables<br />

Global definition: Every variable that we intend to use in a program must have been declared<br />

with its type specifier at an earlier point in the code. A variable can be either of global or local<br />

scope. A global variable is a variable that has been declared in the main body of the source<br />

code, outside all functions. After declaration, global variables can be referred from anywhere in<br />

the code, even inside functions. This sounds very handy because it is easily available but when<br />

your software grows it becomes more difficult and painful to keep track of the global variables’<br />

modifications. At some point, every code change bears the potential of triggering an avalanche<br />

of errors. Just do not use global variables. Sooner or later you will regret this. Believe us.<br />

Global constants like<br />

const double pi= 3.14159265358979323846264338327950288419716939;<br />

are fine because they cannot cause side effects.<br />

Local definition: In contrast, a local variable is declared within the body of a function<br />

or a block. Its visibility/availability is limited to the block enclosed in the curly braces { } where<br />

it is declared. More precisely, the scope of a variable is from its definition to the end of the<br />

enclosing braces. Recalling the example of output streams<br />

int main ()<br />

{<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

return 0;<br />

}<br />

the scope of myfile is from its definition to the end of the function main. If we wrote:<br />

int main ()<br />

{<br />

int a= 5;<br />

{<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

}<br />

myfile ≪ ”a is ” ≪ a ≪ std::endl; // error<br />

return 0;<br />

}<br />

then the second output is not valid because myfile is out of scope. The program would not<br />

compile and the compiler would tell you something like “myfile is not defined in this<br />

scope”.<br />

Hiding: If variables with the same name exist in different scopes, then only one variable is visible and<br />

the others are hidden. A variable in an inner scope hides all variables with the same name in outer scopes. For<br />

instance: 3<br />

3 TODO: Picture would be nice.



int main ()<br />

{<br />

int a= 5; // define #1<br />

{<br />

a= 3; // assign #1, #2 is not defined yet<br />

int a; // define #2<br />

a= 8; // assign #2, #1 is hidden<br />

{<br />

a= 7; // #2<br />

}<br />

} // end of #2’s scope<br />

a= 11; // #1, #2 is now out of scope<br />

return 0;<br />

}<br />

Defining the same variable name twice in the same scope is an error.<br />

The advantage of scopes is that you do not need to worry whether a variable (or something<br />

else) is already defined outside the scope. It is just hidden but does not create a conflict. 4<br />

Unfortunately, the hiding makes the homonymous variables in the outer scope inaccessible.<br />

The best thing you can do is to rename the variable in the inner scope (and possibly in the next-outer<br />

scope(s) to access more of those variables). Renaming the outermost variable also solves<br />

the problem of accessibility but tends to be more work because it is probably used more often<br />

due to its longer lifetime. A better solution to manage nesting and accessibility is namespaces,<br />

see the next section.<br />

Scopes also have the advantage of allowing memory to be reused, e.g.:<br />

int main ()<br />

{<br />

int x, y;<br />

float z;<br />

cin ≫x;<br />

if (x < 4) {<br />

y= x ∗ x;<br />

// something with y<br />

} else {<br />

z= 2.5 ∗ float(x);<br />

// something with z<br />

}<br />

}<br />

The example uses three variables. However, they are never used at the same time. y is only<br />

used in the first branch and z only in the second one.<br />

Thus, we rewrite the program as follows<br />

int main ()<br />

{<br />

int x;<br />

cin ≫x;<br />

4 As opposed to macros, an obsolete and reckless legacy feature from C that should be avoided at any price<br />

because it undermines all structure and reliability of the language.



if (x < 4) {<br />

int y= x ∗ x;<br />

// something with y<br />

} else {<br />

float z= 2.5 ∗ float(x);<br />

// something with z<br />

}<br />

}<br />

then y exists only in the first branch and z only in the second one. In general, it helps<br />

us save memory to let variables live only as long as necessary, especially when we have very<br />

large objects. That is, define variables as late as possible — ideally directly before their first use — so that<br />

they are implicitly in the innermost possible scope, e.g. in the branches in the previous example<br />

instead of the main function. The reduced code complexity of having fewer active variables at<br />

any point in your program also simplifies your life if the program does not do what it should (in very<br />

rare cases, of course) and you have to debug it.<br />

For all those reasons, it is also preferable to define loop indices directly in the loop:<br />

<strong>for</strong> (int i= 0; i < n; i++) { ... }<br />

If you need the loop index afterwards, you must define it outside the loop. Otherwise you run into this:<br />

cin ≫x;<br />

<strong>for</strong> (int i= 0; abs(x) > 0.001 && i < 100; i++)<br />

x= f(x);<br />

cout ≪ ”Did ” ≪ i ≪ ” iterations.\n”; // error: which i?<br />

The example is some kind of (probably useless) fixed-point calculation. It stops when |x| ≤<br />

0.001 or 100 iterations were performed (remember that the second term is not a termination but a<br />

continuation criterion). When we have finished the loop, we want to know how many iterations we<br />

performed. But our loop index has already died. Let’s try again:<br />

cin ≫x;<br />

int i;<br />

<strong>for</strong> (i= 0; abs(x) > 0.001 && i < 100; i++)<br />

x= f(x);<br />

cout ≪ ”Did ” ≪ i ≪ ” iterations.\n”;<br />

Now it works.<br />

2.3 Operators<br />

C++ is rich in built-in operators. An operator is a symbol that tells the compiler to perform<br />

a specific mathematical or logical manipulation. C++ has three general classes of operators:<br />

arithmetic, boolean, and bitwise. This section gives a short overview of the different operators<br />

and their meanings.<br />

2.3.1 Arithmetic operators<br />

The following table lists the arithmetic operators allowed in C ++:



Operator Action<br />

− subtraction, also unary minus<br />

+ addition<br />

∗ multiplication<br />

/ division<br />

% modulus<br />

−− decrement<br />

++ increment<br />

The modulus operator yields the remainder of the integer division. The ++ operator adds one<br />

to its operand and −− subtracts one. Both can precede or follow the operand. When they<br />

precede the operand, the corresponding operation will be per<strong>for</strong>med be<strong>for</strong>e using the operand’s<br />

value to evaluate the rest of the expression. If the operator follows its operand, C ++ will use<br />

the operand’s value be<strong>for</strong>e incrementing or decrementing it. Consider the following example:<br />

x = 1;<br />

y = ++x;<br />

x = 1;<br />

z = x++;<br />

As a result of executing these four lines of code, y will be set to 2, x will be set to 2 and z will<br />

be set to 1.<br />

The priority and associativity of binary arithmetic operators are the same as we know them from<br />

math: multiplication and division precede addition and subtraction. Thus, x + y ∗ z is evaluated<br />

as x + (y ∗ z). Operations of the same priority are left-associative, i.e. x / y ∗ z is<br />

equivalent to (x / y) ∗ z. Unary operators have precedence over binary ones: x ∗ y++ / −z means<br />

(x ∗ (y++)) / (−z). Nevertheless, as long as you are still learning C++ and not entirely sure<br />

about the precedences, you might want to add redundant parentheses instead of wasting hours<br />

debugging your program.<br />

With these operators we can write our first numeric program:<br />

#include &lt;iostream&gt;<br />

int main ()<br />

{<br />

float r1 = 3.5, r2 = 7.3, pi = 3.14159;<br />

float area1 = pi ∗ r1∗r1;<br />

std::cout ≪ ”A circle of radius ” ≪ r1 ≪ ” has area ”<br />

≪ area1 ≪ ”.” ≪ std::endl;<br />

std::cout ≪ ”The average of ” ≪ r1 ≪ ” and ” ≪ r2 ≪ ” is ”<br />

≪ (r1+r2)/2 ≪ ”.” ≪ std::endl;<br />

return 0 ;<br />

}<br />

2.3.2 Boolean operators<br />

Boolean operators are logical and relational operators. Both return boolean values, hence<br />

the name. The operators and their meanings are:<br />



Operator Meaning<br />

> greater than<br />

>= greater than or equal to<br />

< less than<br />

<= less than or equal to<br />

== equal to<br />

!= not equal to<br />

&& logical AND<br />

|| logical OR<br />

! logical NOT<br />

Relational operators have a lower precedence than arithmetic ones, so 4 >= 1 + 7 is evaluated as if it were written 4 >= (1 + 7).<br />

Advice<br />

Integer values can be treated as boolean in C++. For the sake of clarity, it is<br />

always better to use bool for all logical expressions.<br />

This is a legacy of C where bool does not exist. Almost all techniques from C work also in<br />

C ++— as the language name suggests — but using the new features of C ++ allows you to write<br />

programs with better structure. For instance, if you want to store the result of a comparison<br />

do not use an integer variable but a bool.<br />

bool out_of_bound = x < min || x > max;<br />

2.3.3 Bitwise operators<br />

Bitwise operators allow you to test or change the bits of integers. 5 There are the following<br />

operations:<br />

Operator Action<br />

& AND<br />

| OR<br />

ˆ exclusive OR<br />

∼ one’s complement (NOT)<br />

≫ shift right<br />

≪ shift left<br />

The shift operators bitwise shift the value on their left by the number of bits on their right:<br />

• ≪ shifts left and adds zeros at the right end.<br />

• ≫ shifts right and adds either 0s, if the value is of an unsigned type, or replicates the top bit (to<br />

preserve the sign) if it is of a signed type.<br />

5 The bitwise operators also work on bool, but it is preferable to use the logical operators from the previous<br />

section. Especially the shift operators are rather pointless for bool.



The bitwise operations can be used to characterize properties in a very compact <strong>for</strong>m as in the<br />

following example:<br />

#include &lt;iostream&gt;<br />

int main ()<br />

{<br />

int concave = 1, monotone = 2, continuous = 4;<br />

int f_is = concave | continuous;<br />

std::cout ≪ ”f is ” ≪ f_is ≪ std::endl;<br />

std::cout ≪ ”Is f concave? (0 means no, 1 means yes) ”<br />

≪ (f_is & concave) ≪ std::endl;<br />

f_is = f_is | monotone;<br />

f_is = f_is ˆ concave;<br />

std::cout ≪ ”f is now ” ≪ f_is ≪ std::endl;<br />

return 0 ;<br />

}<br />

The first line of main introduces three properties that can be combined arbitrarily. The numbers are powers<br />

of two so that their binary representations each contain a single 1-bit. The second line uses<br />

bitwise OR to combine two properties. Bitwise AND allows for masking single or multiple bits,<br />

as shown in the second output statement. Afterwards, an additional property is set with bitwise OR, and bitwise exclusive<br />

OR (XOR) toggles a property. Operating systems and hardware drivers<br />

use this style of operations extensively. But it needs some practice to get used to it.<br />

Shift operations provide an efficient way to multiply with or divide by powers of 2 as shown in<br />

the following code:<br />

int i = 78;<br />

std::cout ≪ ”i ∗ 8 is ” ≪ (i ≪ 3)<br />

≪ ”, i / 4 is ” ≪ (i ≫2) ≪ std::endl;<br />

Obviously, that needs some familiarization as well.<br />

On the performance side, today’s processors are quite fast at multiplying integers, so you<br />

will not see a big performance boost when replacing your products by left shifts. Division is<br />

still a bit slow, and a right shift can make a difference. Even then, the price of this source code<br />

obfuscation is only justified if the operation is critical for the overall performance of your entire<br />

application.<br />

2.3.4 Compound assignment<br />

The compound assignment operators apply an arithmetic or bitwise operation to the left- and right-hand<br />

sides and store the result in the left-hand side.<br />

These operators are +=, −=, ∗=, /=, %=, ≫=, ≪=, &=, ˆ=, and |=.<br />

The statement a += b is equivalent to the statement a = a + b.<br />



2.3.5 Bracket operators<br />

The operator [] is used to access elements of an array (see § 2.9), and () is used for function calls.<br />

2.3.6 All operators<br />

We haven’t introduced all operators yet. They will be shown in an appropriate context. For<br />

now, we only list the entire operator set with their precedences and associativity. The table is<br />

taken from [?] (by courtesy of Bjarne Stroustrup). For more details about specific operators<br />

see there. The operators on top have the highest priorities. 6<br />

Operator Summary<br />

scope resolution class name :: member<br />

scope resolution namespace name :: member<br />

global :: name<br />

global :: qualified-name<br />

member selection object . member<br />

member selection pointer → member<br />

subscripting expr[ expr ]<br />

subscripting (user-defined) object [ expr ] 7<br />

function call expr ( expr list )<br />

value construction type ( expr list )<br />

post increment lvalue ++<br />

post decrement lvalue −−<br />

type identification typeid ( type )<br />

run-time type identification typeid ( expr )<br />

run-time checked conversion dynamic_cast < type > ( expr )<br />

compile-time checked conversion static_cast < type > ( expr )<br />

unchecked conversion reinterpret_cast < type > ( expr )<br />

const conversion const_cast < type > ( expr )<br />

size of object sizeof expr<br />

size of type sizeof ( type )<br />

pre increment ++ lvalue<br />

pre decrement −− lvalue<br />

complement ∼ expr<br />

not ! expr<br />

unary minus − expr<br />

unary plus + expr<br />

address of & lvalue<br />

dereference ∗ lvalue<br />

create (allocate) new type<br />

create (allocate and initialize) new type( expr list )<br />

create (place) new ( expr list ) type<br />

create (place and initialize) new ( expr list ) type( expr list )<br />

destroy (deallocate) delete pointer<br />

destroy array delete [ ] pointer<br />

6 TODO: If possible references<br />

7 Not in [?].



cast (type conversion) ( type ) expr<br />

member selection object.∗ pointer to member<br />

member selection pointer → ∗ pointer to member<br />

multiply expr ∗ expr<br />

divide expr / expr<br />

modulo (remainder) expr % expr<br />

add (plus) expr + expr<br />

subtract (minus) expr − expr<br />

shift left expr ≪ expr<br />

shift right expr ≫ expr<br />

less than expr < expr<br />

less than or equal expr <= expr<br />

greater than expr > expr<br />

greater than or equal expr >= expr<br />

equal expr == expr<br />

not equal expr != expr<br />

bitwise AND expr & expr<br />

bitwise exclusive OR (XOR) expr ˆ expr<br />

bitwise inclusive OR expr | expr<br />

logical AND expr && expr<br />

logical OR expr || expr<br />

conditional expression expr ? expr: expr<br />

simple assignment lvalue = expr<br />

multiply and assignment lvalue ∗= expr<br />

divide and assignment lvalue /= expr<br />

modulo and assignment lvalue %= expr<br />

add and assignment lvalue += expr<br />

subtract and assignment lvalue −= expr<br />

shift left and assignment lvalue ≪= expr<br />

shift right and assignment lvalue ≫= expr<br />

AND and assignment lvalue &= expr<br />

inclusive OR and assignment lvalue |= expr<br />

exclusive OR and assignment lvalue ˆ= expr<br />

throw exception throw expr<br />

comma (sequencing) expr , expr<br />

To see the operator precedences at one glance, use Table 2.13 on page 64. 8<br />

2.3.7 Overloading<br />

A very powerful aspect of C++ is that the programmer can define operators for new types. This<br />

will be explained in section ??. Operators of built-in types cannot be changed. New operators<br />

cannot be added as in some other languages. If you overload operators, make sure that the<br />

expected priority of the operation corresponds to the operator precedence. For instance, you<br />

might have the idea of using the LaTeX notation for exponentiation of matrices:<br />

8 TODO: Associativity?



A= Bˆ2;<br />

A is B squared. So far so good. That the original meaning of ˆ is a bitwise XOR does not<br />

worry us because we do not plan to implement bitwise operations on matrices.<br />

Now we add C:<br />

A= Bˆ2 + C;<br />

Looks nice. But it does not work (or does something weird). — Why?<br />

Because + has a higher priority than ˆ. Thus, the compiler understands our expression as:<br />

A= B ˆ (2 + C);<br />

Oops. That looks wrong. 9 The operator gives a concise and intuitive interface but its priority<br />

would cause a lot of confusion. Thus, it is advisable to refrain from this overloading.<br />

2.4 Expressions and Statements<br />

C and C++ distinguish between expressions and statements. Very casually spoken, one could<br />

just say that every expression becomes a statement if a semicolon is appended. However, we<br />

would like to discuss this topic a bit more.<br />

Let us build this up recursively from the bottom. Any variable name (x, y, z, . . . ), constant, or<br />

literal is an expression. An operator combined with one or more expressions is an expression, e.g. x + y or<br />

x ∗ y + z. In several languages, e.g. Pascal, the assignment is a statement. In C and C++ it is an<br />

expression, e.g. x= y + z. As a consequence, it can be used in another assignment: x2= x= y + z.<br />

Assignments are evaluated from right to left. Input and output operations as<br />

std::cout ≪ ”x is ” ≪ x ≪ ”\n”;<br />

are also expressions.<br />

A function call with expressions as arguments is an expression, e.g. abs(x), abs(x ∗ y + z). Therefore,<br />

function calls can be nested: pow(abs(x), y). In languages where a function call is a statement<br />

this would not be possible. As the assignment is an expression, it can be used as an argument of a<br />

function: abs(x= y), and so can I/O operations like those above. Needless to say, this is quite bad programming<br />

style. An expression surrounded by parentheses is an expression as well, e.g. (x + y).<br />

This allows us to change the order of evaluation, e.g. x ∗ (y + z) computes the addition first<br />

although the multiplication has the higher priority.<br />

A very special operator in C ++ is the ‘comma operator’ that provides a sequential evaluation.<br />

The meaning is simply evaluating first the sub-expression left of the comma and then that right<br />

of it. The value of the whole expression is that of the right sub-expression. The sub-expressions<br />

can contain the comma operator as well so that arbitrarily long sequences can be defined. With<br />

the help of the comma operator, one can evaluate multiple expressions in program locations<br />

where only one expression is allowed. If used as a function argument, the comma expression<br />

needs surrounding parentheses; otherwise the comma is interpreted as a separation of function<br />

arguments. The comma operator can be overloaded with user-defined semantics. This can<br />

9 The precise interpretation is A.operator=(operatorˆ(B, operator+(2, C)));



complicate the understanding of the program behavior dramatically and has to be used with<br />

utter care. In general, it is advisable to use it sparingly.<br />

Any of the above expressions followed by a semicolon 10 is a statement, e.g.:<br />

x= y + z;<br />

y= f(x + z) ∗ 3.5;<br />

A statement like y + z; is allowed although it is most likely useless. During program execution,<br />

the sum of y and z would be computed and then thrown away. Decent compilers would optimize<br />

away this useless computation. However, it is not guaranteed that this statement can always be<br />

omitted. If y or z is an object of a user type, then the addition is also user-defined and might<br />

change y or z or something else. This is obviously bad programming style but legitimate in<br />

C++.<br />

A single semicolon is an empty statement. Therefore, one can put as many semicolons after an<br />

expression as desired. Some statements do not end with a semicolon, e.g. function definitions.<br />

If a semicolon is appended to such a statement, it is not an error but just an extra empty<br />

statement. 11 Any sequence of statements surrounded by curly braces is a statement — called a<br />

compound statement.<br />

The variable and constant declarations we have seen before are also statements. As the initial<br />

value of a variable or constant, one can use any of the expressions mentioned before (though<br />

involving the assignment or comma operator is probably rather confusing). Other statements — to<br />

be discussed later — are function and class definitions, as well as control statements that we<br />

will introduce in the next section.<br />

2.5 Control statements<br />

Control statements allow us to steer the program execution by means of branching and repetition.<br />

2.5.1 If-statement<br />

This is the simplest <strong>for</strong>m of control and its meaning is intuitively clear, <strong>for</strong> instance in:<br />

if (weight > 100.0)<br />

cout ≪ ”This is quite heavy.\n”;<br />

else<br />

cout ≪ ”I can carry this.\n”;<br />

Often, the else branch is not needed and can be omitted. Say we have some value in variable x<br />

and compute something on its magnitude:<br />

if (x < 0.0)<br />

x= −x;<br />

// Now we know that x >= 0.0<br />

10 The usage of the semicolon in Pascal looks similar at the first glance. However, in Pascal the semicolon has<br />

a slightly different purpose which is separating statements. Thus, the semicolon can be omitted when only one<br />

statement exist in a line. Coming from Pascal, it takes some time to get used to this difference.<br />

11 Nonetheless some compilers print a warning in pedantic mode.



The expression in the parentheses must be a logical expression or something convertible to bool.<br />

For instance, one can write:<br />

int i;<br />

// ...<br />

if (i) // bad style<br />

do_something();<br />

In the example, do_something is called if i is different from 0. Experienced C and C++ programmers<br />

know that by heart, but the intentions of the developer are better communicated if<br />

this is stated explicitly:<br />

int i;<br />

// ...<br />

if (i != 0) // much better<br />

do_something();<br />

Each branch of an if consists of one single statement. To perform multiple operations, one can<br />

use braces: 12<br />

int nr_then= 0, nr_else= 0;<br />

// ...<br />

if (...) {<br />

nr_then++;<br />

cout ≪ ”In then−branch\n”;<br />

} else {<br />

nr_else++;<br />

cout ≪ ”In else−branch\n”;<br />

}<br />

In the beginning, it is helpful to always write the braces. With more experience, most developers<br />

only write the braces where necessary. At any rate, it is highly advisable to indent the branches<br />

for better readability, whatever your degree of experience.<br />

An if statement can contain other if-statements:<br />

if (weight > 100.0) {<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

else<br />

cout ≪ ”This is quite heavy.\n”;<br />

} else {<br />

if (weight < 50.0)<br />

cout ≪ ”A child can carry this.\n”;<br />

else<br />

cout ≪ ”I can carry this.\n”;<br />

}<br />

In the above example, the braces could be omitted without changing the behavior, but it<br />

is clearer to have them. The example is more readable if we reorganize the nesting:<br />

if (weight < 50.0) {<br />

cout ≪ ”A child can carry this.\n”;<br />

} else if (weight <= 100.0) {<br />

cout ≪ ”I can carry this.\n”;<br />

} else if (weight <= 200.0) {<br />

cout ≪ ”This is quite heavy.\n”;<br />

} else {<br />

cout ≪ ”This is extremely heavy.\n”;<br />

}<br />



When if-statements are nested without braces, it can be unclear which if an else-branch belongs to. Consider:<br />

if (weight > 100.0)<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

else<br />

cout ≪ ”This is quite heavy.\n”;<br />

It looks like the last line is executed when weight is between 100 and 200, assuming the first if<br />

has no else-branch. But we could also assume the second if comes without an else-branch and the<br />

last line is executed when weight is less than or equal to 100. Fortunately, the C ++ standard specifies<br />

that an else-branch always belongs to the innermost possible if. So, we can count on our first<br />

interpretation. In case the else-branch should belong to the first if, we need braces:<br />

if (weight > 100.0) {<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

} else<br />

cout ≪ ”This is not so heavy.\n”;<br />

Maybe these examples have convinced you that it is more productive to write more braces and save<br />

the time spent guessing which if the branches belong to.<br />

Advice<br />

If you use an editor that understands C ++ (like the IDE from Visual Studio<br />

or emacs in C ++ mode) then automatic indentation is a great help with<br />

structured programming. Whenever a line is not indented as you expected,<br />

something is most likely not nested as you intended.<br />

2.5.2 Conditional Expression<br />

Although this section describes statements, we like to talk about the conditional expression<br />

here because of its proximity to the if-statement. The semantics of<br />

condition ? result_for_true : result_for_false<br />

is that if the condition in the first sub-expression evaluates to true, then the entire expression is the<br />

second sub-expression, otherwise the third one. For instance, we can compute the minimum of two<br />

values with either if-then-else or the conditional expression:<br />



if (x <= y)<br />

min= x;<br />

else<br />

min= y;<br />

min= x <= y ? x : y;<br />

2.5.3 While and Do-While Loops<br />

A while-loop repeats its body as long as the loop condition holds. The do-while-loop tests the<br />

condition after each iteration instead, for instance:<br />

double eps= 1.0;<br />

do {<br />



eps/= 2.0;<br />

} while (eps > 0.0001);<br />

The loop is performed at least once, even with an extremely small value for eps in our<br />

example. The difference between a while-loop and a do-while-loop is irrelevant to most scientific<br />

software. It matters only for loops with very few iterations and an extremely strong impact on the overall<br />

performance, because a do-while-loop performs one comparison and one jump fewer.<br />

2.5.4 For Loop<br />

The most common loop in C ++ is the for-loop. As a simple example we add two vectors 15<br />

and print the result afterward:<br />

double v[3], w[]= {2., 4., 6.}, x[]= {6., 5., 4.};<br />

<strong>for</strong> (int i= 0; i < 3; i++)<br />

v[i] = w[i] + x[i];<br />

<strong>for</strong> (int i= 0; i < 3; i++)<br />

cout ≪ ”v[” ≪ i ≪ ”] = ” ≪ v[i] ≪ ’\n’;<br />

The loop head consists of three components:<br />

• The initialization;<br />

• A continuation criterion; and<br />

• A step operation.<br />

The example above is typical <strong>for</strong> a <strong>for</strong>-loop. In the initialization, one typically declares a new<br />

variable and initializes it to 0 because this is the start index of most indexed data structures.<br />

The condition usually tests if the loop index is smaller than a certain size and the last operation<br />

typically increments the loop index.<br />

It is a very popular beginners’ mistake to write conditions like “i <= size(v)” so that the last<br />

iteration accesses an entry past the end of the array. As another example, we approximate the<br />

exponential function exp(x) by its Taylor series up to the term x^10/10!:<br />

double x= 2.0, xn= 1.0, fac= 1.0, exp_x= 1.0;<br />

for (int n= 1; n <= 10; n++) {<br />

xn∗= x; fac∗= n;<br />

exp_x+= xn / fac;<br />

}<br />



Here it was simpler to take out term 0 and start with term 1. We also used less-equal to ensure<br />

that the term x^10/10! is considered.<br />

The <strong>for</strong>-loop in C ++ is very flexible. The initialization part can be any expression, a variable<br />

declaration or empty. It is possible to introduce multiple new variables of the same type. This<br />

can be used to avoid repeating the same operation in the condition, e.g.:<br />

<strong>for</strong> (int i= xyz.begin(), end= xyz.end(); i < end; i++) ...<br />

Variables declared in the initialization are only visible within the loop and hide variables of the<br />

same names from outside the loop.<br />

The condition can be any expression that can be converted to a bool. An empty condition is<br />

always true and the loop is repeated infinitely unless from inside the body as we will discuss<br />

in the next section. We said that loop indices are typically incremented in the head’s third<br />

part. In principle, one can modify it within the loop body but programs are much clearer if it<br />

is done in the loop head. On the other hand, there is no limitation that only one variable is<br />

increased by 1. One can modify as many variables as desired using the comma operator, with<br />

any kind of modification, such as:<br />

<strong>for</strong> (int i= 0, j= 0, p= 1; ...; i++, j+= 4, p∗= 2) ...<br />

This is of course more complex than having just one loop index but still more readable than<br />

declaring/modifying indices be<strong>for</strong>e the loop or inside the loop body.<br />

In fact, the <strong>for</strong>-loop in C and C ++ is just another notation of a while-loop. Any <strong>for</strong>-loop:<br />

<strong>for</strong> (init; cond; incr) {<br />

st1; st2; ... stn;<br />

}<br />

can be written with a while-loop:<br />

{<br />

init;<br />

while (cond) {<br />

st1; st2; ... stn;<br />

incr;<br />

}<br />

}<br />

Conversely, any while-loop can evidently be written as a for-loop. We do not know if there is<br />

a design guideline from a software engineering guru on when to use while or for, but for is more<br />

concise if there is a local initialization or some incremental operation.<br />

2.5.5 Loop Control<br />

There are two statements to deviate from the regular loop evaluation:<br />

• break and<br />

• continue.<br />

A break terminates the loop entirely and continue ends only the current iteration and continues<br />

the loop with the next iteration, <strong>for</strong> instance:


38 CHAPTER 2. <strong>C++</strong> BASICS<br />

<strong>for</strong> (...; ...; ...) {<br />

...<br />

if (dx == 0.0) continue;<br />

x+= dx;<br />

...<br />

if (r < eps) break;<br />

...<br />

}<br />

In the example above we assumed that the remainder of the iteration is not needed when<br />

dx == 0.0. In some iterative computations it might be clear in the middle of an iteration (here<br />

when r < eps) that work is already done.<br />

Understanding the program behavior becomes more difficult the more breaks and continues<br />

are used. One should always aim to move as much loop control as possible into the loop<br />

head. However, avoiding breaks and continues with excessive if-then-else branches is even less<br />

comprehensible.<br />

Sometimes, one might prefer per<strong>for</strong>ming some surplus operations inside a loop (if it has no<br />

perceivable impact on the overall per<strong>for</strong>mance) and keep the program simpler. Simpler programs<br />

on the other hand have a better chance of being optimized by the compiler. There is certainly<br />

no golden rule, but as a practical approach one should implement software first for maximal<br />

clarity and simplicity (but using efficient algorithms as early as possible). Once the software is<br />

working correctly one can try variations to investigate the impact of implementation details on<br />

per<strong>for</strong>mance.<br />

2.5.6 Switch Statement<br />

A switch is like a special kind of if. It provides a concise notation when different computations<br />

<strong>for</strong> different cases of a given integral value are per<strong>for</strong>med:<br />

switch (op_code) {<br />

case 0: z= x + y; break;<br />

case 1: z= x − y; cout ≪ ”compute diff\n”; break;<br />

case 2:<br />

case 3: z= x ∗ y; break;<br />

default: z= x / y;<br />

}<br />

When people see the switch statement for the first time, they are usually surprised that one<br />

needs to say at the end of each case that the statement sequence is terminated. Otherwise the statements of<br />

the next case are executed as well. This fall-through can be used to perform the same operation for different<br />

cases, e.g. for 2 and 3 in the example above.<br />

This fall-through also allows us to implement short loops without the termination test after<br />

each iteration. Say we have vectors with dimension ≤ 5. Then we could implement a vector<br />

addition without a loop:<br />

assert(size(v) <= 5);<br />

int i= 0;<br />

switch (size(v)) {<br />

case 5: v[i] = w[i] + x[i]; i++;<br />

case 4: v[i] = w[i] + x[i]; i++;<br />



case 3: v[i] = w[i] + x[i]; i++;<br />

case 2: v[i] = w[i] + x[i]; i++;<br />

case 1: v[i] = w[i] + x[i];<br />

case 0: ;<br />

}<br />

This technique is called Duff’s device. Although this is an interesting technique to realize an<br />

iterative computation without a loop, the performance impact is probably limited in practice.<br />

Such a technique should only be considered in program parts that account for a significant fraction of the<br />

overall run time; otherwise the readability of the sources is more important.<br />

2.5.7 Goto<br />

DO NOT USE IT. NEVER! EVER!<br />

2.6 Functions<br />

Functions are important building blocks of C ++ programs. The first example we have seen is<br />

the main function in the hello-world program. main must be present in every executable and is<br />

called when the program starts. Other than that there is nothing special about main.<br />

The general <strong>for</strong>m of a C ++ function is:<br />

[inline] return_type function_name(argument_list)<br />

{<br />

body of the function<br />

}<br />

For instance, one can implement a very simple function to square a value:<br />

double square(double x)<br />

{<br />

return x ∗ x;<br />

}<br />

In C and C ++ each function has a return type. A function that does not return a value has the<br />

pseudo-return-type “void”:<br />

void print(double x)<br />

{<br />

std::cout ≪ ”x is ” ≪ x ≪ ’\n’;<br />

}<br />

void is not a real type but rather a placeholder that enables us to omit returning a value.<br />

We cannot define objects of it:<br />

void nothing; // error



2.6.1 Inline Functions<br />

Calling a function requires a fair amount of activities:<br />

• The arguments (or at least their addresses) must be copied on the stack;<br />

• The current program counter must be copied on the stack to continue the execution at<br />

this point when the function is finished;<br />

• Registers must be saved so that the function can use them;<br />

• Jump to the code of the function;<br />

• Execute the function;<br />

• Clean the arguments from the stack;<br />

• Copy the result on the stack;<br />

• Jump back to the calling code;<br />

• Restore the saved registers.<br />

What happens exactly depends on the hardware. The good news is that the function call<br />

overhead is dramatically lower than in the past. Furthermore, the compiler can optimize out<br />

those activities not needed in a specific call.<br />

Nonetheless, for small functions like square above, the effort for calling the function is still<br />

significantly higher than what the function actually does. C programmers avoid the function-call<br />

overhead with macros. Macros create so many problems in software development that they<br />

must only be used when there is absolutely no alternative whatsoever. Bjarne Stroustrup<br />

says “Almost every macro demonstrates a flaw in the programming language, in the program,<br />

or in the programmer.” We like to add a flaw “in the compiler optimization”. 16<br />

Fortunately, we have an excellent alternative to macros: inline functions. The programmer just<br />

adds the keyword inline to the function definition:<br />

inline double square(double x)<br />

{<br />

return x ∗ x;<br />

}<br />

and all the overhead of the function call vanishes into thin air.<br />

An excessive use of inline can have a negative effect on per<strong>for</strong>mance. When many large functions<br />

are inlined then the binary executable becomes very large. The consequence is that a lot of<br />

time is spent loading the binary from memory and a lot of cache memory is wasted for it as<br />

well. This decreases the memory bandwidth and the cache available for data, causing more slowdown<br />

than what is saved on function calls.<br />

16 Advanced: Compilers today are really smart in eliminating unused code. However, we experienced that<br />

arguments of inline functions might be constructed although they are not used. These are usually only a few<br />

machine instructions. But when this happens extremely frequently, as in an index range check that should<br />

disappear in release mode, it can ruin the overall performance. We hope that further compiler improvements can<br />

rescue us from this kind of macro usage.<br />



It should be mentioned here that the inline keyword is not mandatory. The compiler can decide<br />

against inlining <strong>for</strong> the reasons given in the previous paragraph. On the other hand, the compiler<br />

is free to inline functions without the inline keyword.<br />

For obvious reasons, the definition of an inline function must be visible in every compile unit<br />

where it is called. In contrast to other functions, it cannot be compiled separately. Conversely,<br />

a non-inline function cannot be defined in multiple compile units because the definitions collide when the<br />

compiled parts are ‘linked’ together. Thus, there are two ways to avoid such collisions: assuring<br />

that the function definition is only present in one compile unit or declaring the function as<br />

inline.<br />

2.6.2 Function Arguments<br />

When we pass an argument to a function, a copy is created by default. For instance, the following<br />

would not work (as expected):<br />

void increment(int x)<br />

{<br />

x++;<br />

}<br />

int main()<br />

{<br />

int i= 4;<br />

increment(i);<br />

cout ≪ ”i is ” ≪ i ≪ ’\n’;<br />

}<br />

The output would be 4. The operation x++ in the second line only increments a local copy but<br />

not the original value. This kind of argument transfer is called ‘call-by-value’ or ‘pass-by-value’.<br />

To modify the value itself we have to ‘pass-by-reference’ the variable:<br />

void increment(int& x)<br />

{<br />

x++;<br />

}<br />

Now the variable itself is incremented and the output will be 5 as expected. We will discuss<br />

references in more detail in § 2.10.2.<br />

Temporary variables — like the result of an operation — cannot be passed by reference:<br />

increment(i + 9); // error<br />

We could not compute (i + 9)++ anyway. In order to call such a function with some temporary<br />

value one needs to store it first in a variable and pass this variable to the function.<br />

Larger data structures like vectors and matrices are almost always passed by reference <strong>for</strong><br />

avoiding expensive copy operations:<br />

double two_norm(vector& v) { ... }<br />

An operation like a norm should not change its argument. But passing the vector by reference<br />

bears the risk of accidentally overwriting it.



To make sure that our vector is not changed (and not copied either), we pass it as constant<br />

reference:<br />

double two_norm(const vector& v) { ... }<br />

If we changed v in this function, the compiler would emit an error. Both call-by-value and<br />

constant references ascertain that the argument is not altered, but by different means:<br />

• Arguments that are passed by value can be changed in the function since the function<br />

works with a copy. 17<br />

• With const references one works on the passed argument directly, but all operations that<br />

might change the argument are forbidden. In particular, const-referred arguments cannot<br />

appear on the left side of an assignment or be passed as non-const references to other functions<br />

(in fact, the LHS of an assignment is also a non-const reference).<br />

In contrast to mutable references, constant ones allow <strong>for</strong> passing temporaries:<br />

alpha= two_norm(v + w);<br />

This is admittedly not entirely consistent on the language design side, but it makes the life of<br />

programmers much easier.<br />

Values that are used quite frequently as an argument can be declared as defaults. Say we implement a<br />

function that computes the nth root and mostly need the square root; then we can write:<br />

double root(double x, int degree= 2) { ... }<br />

This function can be called with one or two arguments:<br />

x= root(3.5, 3);<br />

y= root(7.0);<br />

One can declare multiple default arguments but only at the end. In other words, after an<br />

argument with a default value one cannot have one without.<br />

2.6.3 Returning Results<br />

In the examples be<strong>for</strong>e, we only returned double or int. These are the nice ones. Functions that<br />

compute new values of large data structures are more difficult.<br />

Default arguments<br />

Sometimes functions have arguments that are used very infrequently. To address this, you can<br />

give a parameter a default value that is automatically used when no argument corresponding<br />

to that parameter is specified. In this way the caller only needs to specify those arguments that<br />

are meaningful at a particular instance. Consider the following example:<br />

void foo( int a = 5, char ch =’A’ )<br />

{ std::cout ≪ a ≪ ” ” ≪ ch ≪ std::endl ;}<br />

17 This assumes that the argument is properly copied. For user-defined types one can implement its own copy<br />

operation with aliasing effect (on purpose or by accident). Then modifications of the copy also affect the original<br />

object.



foo takes one integer argument with default value 5 and one character argument with a default<br />

value of ‘A’. Now this function can be called by one of the three methods shown here:<br />

foo( 1, ’J’ );<br />

foo(24);<br />

foo();<br />

This results in the following output:<br />

1 J<br />

24 A<br />

5 A<br />

Void functions<br />

When the result type of a function is void, we do not return a result. For example<br />

void foo( int i ) {<br />

std::cout ≪ ”My value is ” ≪ i ≪ std::endl ;<br />

}<br />

Constant arguments<br />

We can use const objects as arguments in functions to protect them from being changed. For<br />

example :<br />

bool bar( int const& x, int y ) {<br />

y = y+2;<br />

return y ==x ;<br />

}<br />

Since we do not want to modify x, we can add the keyword const. Note that const can be put<br />

before or after the type, but the authors of this course recommend putting it after.<br />

2.6.4 Overloading<br />

In C ++, functions can share the same name as long as their parameter declarations are different.<br />

More precisely, the functions should differ in the number or the type of their parameters.<br />

The compiler can then use the number/type of the arguments to determine which version of<br />

the overloaded function should be used. Note that although overloaded functions may have<br />

different return types, a difference in return type alone is not sufficient to distinguish between<br />

two versions of a function.<br />

Consider the following example:<br />

#include <iostream><br />

#include <cmath><br />

int divide (int a, int b){<br />

return a / b ;<br />

}



float divide (float a, float b){<br />

return std::floor( a / b ) ;<br />

}<br />

int main (){<br />

int x=5,y=2;<br />

float n=5.0,m=2.0;<br />

std::cout ≪ divide (x,y) ≪ std::endl;<br />

std::cout ≪ divide (n,m) ≪ std::endl;<br />

return 0;<br />

}<br />

In this case we have defined two functions with the same name, divide, but one of them accepts<br />

two parameters of type int and the other one accepts them of type float. In the first call to<br />

divide the two arguments passed are of type int; therefore, the function with the first prototype<br />

is called. This function returns the result of dividing one parameter by the other. The second<br />

call passes two arguments of type float, so the function with the second prototype is called.<br />

This one executes a similar division and rounds the result down.<br />

2.6.5 Assertions<br />

The function assert is a special kind of function (technically it is a macro) and has the following interface:<br />

void assert (int expression);<br />

If the argument expression evaluates to 0, this causes an assertion failure: a message is written<br />

to the standard error device and abort is called, terminating the program execution.<br />

The specifics of the message shown depend on the specific implementation in the compiler, but<br />

it shall include: the expression whose assertion failed, the name of the source file, and the line<br />

number where it happened. A usual message format is:<br />

Assertion failed: expression, file filename, line linenumber<br />

This allows a programmer to include many assert calls in a source code while debugging<br />

the program. The many assert calls may reduce the performance of the code, so it is<br />

desirable to disable asserts for high-performance libraries. Asserts are disabled by including<br />

the following line<br />

#define NDEBUG<br />

at the beginning of the code, before the inclusion of <cassert>, or by defining the macro in the<br />

compiler invocation, e.g.<br />

g++ -DNDEBUG foo.cpp<br />

Example:<br />

#include <fstream><br />

#include <cassert><br />

int main ()<br />

{



std::ifstream datafile( ”file.dat” ) ;<br />

assert( datafile.is_open() );<br />

datafile.close();<br />

return 0;<br />

}<br />

In this example, assert is used to abort the program execution if datafile.is_open() returns false,<br />

which happens when the opening of the file was unsuccessful.<br />

2.7 Input and output<br />

C ++ uses a convenient abstraction called streams to perform input and output operations on<br />

sequential media such as the screen or the keyboard. A stream is an object into which a program<br />

can insert characters or from which it can extract them. The standard C ++ library includes the header<br />

file iostream, where the standard input and output stream objects are declared.<br />

2.7.1 Standard Output (cout)<br />

By default, the standard output of a program is the screen, and the C ++ stream object defined<br />

to access it is cout.<br />

cout is used in conjunction with the insertion operator, which is written as ≪ . It may be used<br />

more than once in a single statement. This is especially useful if we want to print a combination<br />

of variables and constants or more than one variable. Consider this example:<br />

std::cout ≪ ”Hello World, my name is ” ≪ name ≪ std::endl ;<br />

std::cout ≪ ”I am ” ≪ age ≪ ” years old.” ≪ std::endl ;<br />

If we assume the name variable to contain the value Jane and the age variable to contain 25<br />

the output of the previous statement would be:<br />

Hello World, my name is Jane<br />

I am 25 years old.<br />

The endl manipulator produces a newline character. An alternative representation of endl is the<br />

character ’\n’.<br />

2.7.2 Standard Input (cin)<br />

The standard input device is usually the keyboard. Handling the standard input in C ++ is done<br />

by applying the overloaded operator of extraction ≫ on the cin stream. The operator must be<br />

followed by the variable that will store the data that is going to be extracted from the stream.<br />

For example:<br />

int age;<br />

std::cin ≫ age;<br />



The first statement declares a variable of type int called age, and the second one waits <strong>for</strong> an<br />

input from cin (the keyboard) in order to store it in this integer variable. The input from the<br />

keyboard is processed once the RETURN key has been pressed.<br />

You can also use cin to request more than one datum input from the user:<br />

std::cin ≫ a ≫ b;<br />

is equivalent to:<br />

std::cin ≫ a;<br />

std::cin ≫ b;<br />

In both cases the user must enter two values, one for variable a and another one for variable b, which<br />

may be separated by any valid blank separator: a space, a tab character or a newline.<br />

2.7.3 Input/Output with files<br />

C ++ provides the following classes to per<strong>for</strong>m output and input of characters to/from files:<br />

• std::ofstream: used to write to files<br />

• std::ifstream: used to read from files<br />

• std::fstream: used to both read and write from/to files.<br />

We can use file streams the same way we already used cin and cout, with the only difference<br />

that we have to associate these streams with physical files. Here is an example:<br />

#include <iostream><br />

#include <fstream><br />

int main () {<br />

std::ofstream myfile;<br />

myfile.open (”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

myfile.close();<br />

return 0;<br />

}<br />

This code creates a file called example.txt (or overwrites it if it already exists) and inserts a<br />

sentence into it in a way that is similar to the use of cout. C ++ has the concept of an output<br />

stream that is satisfied by an output file as well as by std::cout. That means that everything<br />

that can be written to std::cout can also be written to a file, and vice versa. If you define<br />

the operator ≪ for a new type yourself, you do not need to program it for different output types but<br />

only once for a general output stream, see 18<br />

Alternatively, one can give the file stream object the file name as argument. This opens the file<br />

implicitly. The file is also implicitly closed when myfile goes out of scope, in this case at the end<br />

of the main function. The mechanisms that control such implicit actions will become clear in<br />

§ 2.2.3. The bottom line is that only in a few cases must you close your files explicitly. The short<br />

version of the previous listing is<br />

18 TODO: Where? New section needed.



#include <iostream><br />

#include <fstream><br />

int main () {<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

return 0;<br />

}<br />

2.8 Structuring Software Projects<br />

2.8.1 Namespaces<br />

In the last section we mentioned that equal names in different scopes hide the variables (or<br />

functions, types, . . . ) of the outer scopes while defining the same name in one scope is an error.<br />

Common function names like min, max or abs already exist, and if you write a function with<br />

the same name (and same argument types) the compiler will tell you that the name already<br />

exists. But this does not only concern common names; you must be sure that every name you<br />

use is not already used in some other library. This really can be a hassle because you might<br />

add more libraries later and there is new potential <strong>for</strong> conflicts. Then you have to rename some<br />

of your functions and in<strong>for</strong>m everybody who uses your software. Or one of your software users<br />

is including a library that you do not know and has a name conflict. This can grow to a serious<br />

problem and it happens in C all the time.<br />

One possibility to deal with this is using different names like max_, my_abs, or library_name_abs.<br />

This is in fact what is done in C. Main libraries have short function names, user libraries longer<br />

names, and OS-related internals typically start with _. This decreases the probability of conflicts<br />

but does not eliminate it entirely.<br />

Remark: Particularly annoying are macros. This is an old technique of code reuse by expanding<br />

macro names to their text definition, potentially with arguments. This gives a lot of possibilities<br />

to empower your program but many more to ruin it. Macros are resistant to namespaces<br />

because they are reckless text substitution without any notion of types, scopes or any other<br />

language feature. Unfortunately, some libraries define macros with common names like major.<br />

We uncompromisingly undefine such macros, e.g. #undef major, without mercy for people that<br />

might want to use those macros. Visual Studio defines — till today!!! — min and max as macros<br />

and we advise you to disable this by compiling with /DNOMINMAX. Almost all macros can be<br />

replaced by other techniques (constants, templates, inline functions). But if you really do not<br />

find another way of implementing it, use LONG_AND_UGLY_NAMES_IN_CAPITALS like the library<br />

collection Boost does.<br />

2.8.2 Header and implementation<br />

It is usual to split class (Chapter 3) and function definition and implementation into different<br />

files. Classes and functions are typically defined in a header file (.hpp), and implemented in a<br />

cpp file, which is then compiled and added to a library. For example, the header file foo.hpp<br />

could be:<br />

foo.hpp:



#ifndef athens_foo_hpp<br />

#define athens_foo_hpp<br />

double foo (double a, double b);<br />

#endif<br />

Note the ifndef and define C-preprocessor commands. These commands are called include guards<br />

and prevent the file from being included several times. The use of such guards in header files is<br />

quite common.<br />

The source file in the library would be contained in the file foo.cpp.<br />

#include ”foo.hpp”<br />

double foo (double a, double b)<br />

{ return a+b; }<br />

The main program file is contained in the file bar.cpp:<br />

#include <iostream><br />

#include ”foo.hpp”<br />

int main() {<br />

double a = 2.1;<br />

double b = 3.9;<br />

std::cout ≪ foo(a,b) ≪ std::endl ;<br />

}<br />

Include files usually contain the interface of software packages and are stored somewhere on<br />

disk. The compiler is told where to look <strong>for</strong> the include files. The programmer can partially<br />

control this as follows:<br />

• #include ”foo.hpp”: the compiler looks in the directory of the including file and the list of<br />

directories it is given.<br />

• #include <foo.hpp>: the compiler only looks in the list of directories it is given.<br />

Frequently used include files<br />
The types and functions defined in the following include files are in the namespace std.<br />
• <iostream>: input and output streams, e.g. std::cin and std::cout<br />
• <fstream>: file input and output<br />
• <cassert>: for assertions, see § 2.6.5.<br />
• <cmath>: headers for the C functions from math.h, among others: abs, fabs, pow, acos, asin, atan, atan2, ceil, floor, cos, cosh, sin, sinh, exp, fmod (floating-point mod), modf (split into integer and fractional part (< 1)), log, log10, sqrt, tan, tanh, and other useful functions such as isnan.<br />
• <string>: string operations



• <complex>: complex numbers<br />
• <vector>, <list>, <map>, <set>, ...: STL, see Section 4.9<br />

Inline keyword<br />

Instead of creating a library as described at the beginning of this section, we can also store the implementation in the header file. We then have to add the keyword inline, for two reasons. The code will not be stored in a library but inlined into the calling functions: this may lead to more efficient code when the functions are small. And if we do not use the inline keyword, we may end up with multiply defined functions, since the compiler creates the functions in every source file in which they are used.<br />

Consider <strong>for</strong> example the following header file sqr.hpp:<br />

#ifndef athens_sqr_hpp<br />
#define athens_sqr_hpp<br />
inline double sqr(double a)<br />
{ return a*a; }<br />
#endif<br />

2.9 Arrays<br />

C-based programming languages are not very good at working with arrays. In this section, we discuss the language concepts for arrays. In Section 4.9, we will present more practical software for arrays and other complicated mass data structures.<br />
An array is created as follows:<br />

int x[10];<br />

The variable x is a constant-size array. It allows for fast creation (it is typically stored on the stack).<br />
Arrays are accessed by square brackets: x[i] is a reference to the i-th element. The first element is x[0], the last one is x[9]. Arrays can be initialized at their definition:<br />

float v[]= {1.0, 2.0, 3.0}, w[]= {7.0, 8.0, 9.0};<br />

In this case, the array size is deduced.<br />

Operations on arrays are typically performed in loops; e.g., the vector operation x = v − 3w is realized by<br />
float x[3];<br />
for (int i= 0; i < 3; i++)<br />
    x[i]= v[i] - 3.0 * w[i];<br />

One can also define arrays of higher dimension:



float A[7][9];    // a 7 by 9 matrix<br />
int q[3][2][3];   // a 3 by 2 by 3 array<br />

The language does not provide linear algebra operations on these arrays. Therefore we will build our own linear algebra and look forward to future C++ standards coming with intrinsic support for higher mathematics.<br />

Arrays have the following two disadvantages:<br />

• Indices are not checked before accessing an array; one can end up outside the array, and the program crashes with a segmentation fault/violation. This is not even the worst case: if your program crashes, you at least see that things went wrong. An out-of-range access can also silently corrupt your own data; the program keeps running and produces entirely wrong results, with whatever consequences you can imagine.<br />

• The size of the array must be known at compile time. 19 For instance, suppose we have an array stored in a file and need to read it back into memory:<br />
ifstream ifs("some_array.dat");<br />
int size;<br />
ifs >> size;<br />
float v[size]; // error, size not known at compile time<br />
This does not work because we need the size already when the program is compiled.<br />

The first problem can only be solved with new array types and the second one with dynamic allocation. This leads us to pointers.<br />

2.10 Pointers and References<br />

2.10.1 Pointers<br />

A pointer is a variable that contains a memory address. This address can be that of another variable or of dynamically allocated memory. Let's start with the latter, as we were looking for arrays of dynamic size.<br />

int* y = new int[10];<br />

This allocates an array of 10 ints. The size can now be chosen at run time. We can also implement the vector-reading example from the previous section:<br />
ifstream ifs("some_array.dat");<br />
int size;<br />
ifs >> size;<br />
float* v= new float[size];<br />
for (int i= 0; i < size; i++)<br />
    ifs >> v[i];<br />

Pointers bear the same danger as arrays: the risk of accessing out-of-range data, with program crashes or data corruption as possible results. It is also the programmer's responsibility to keep track of the array size.<br />
19 Some compilers support run-time values as array sizes. Since this is not guaranteed to work with other compilers, one should avoid this in portable software.



Furthermore, the programmer is responsible for releasing the memory when it is not needed anymore. This is done by<br />
delete[] v;<br />

As we came from arrays, we made the second step before the first one regarding pointer usage. The simpler use of pointers is allocating a single data item:<br />
int* ip = new int;<br />
Releasing such memory is performed by<br />
delete ip;<br />
Note the duality of allocation and release: a single-object allocation requires a single-object release, and an array allocation demands an array release. 20<br />

Pointers can also refer to other variables:<br />
int i= 3;<br />
int* ip2= &i;<br />
The operator & takes an object and returns its address. The reverse operator is *, which takes an address and returns the object:<br />
int j= *ip2;<br />
This is called dereferencing. It is clear from the context whether the symbol * represents a dereference or a multiplication.<br />

A danger of pointers is memory leaks. For instance, suppose our array y became too small and we want to assign a new array:<br />
int* y = new int[15];<br />
We can now use more space in y. Nice. But what happened to the memory that we allocated before? It is still there, but we have no access to it anymore and can no longer release it. This memory is lost for the rest of our program execution; only when the program finishes can the operating system free it. In our example it is only 40 bytes out of however many gigabytes you might have. But if this happens with larger data in an iterative process, the dead memory grows, and at some point the program crashes when all memory is used up.<br />

The warnings above are not intended as fun killers, and we do not discourage the use of pointers. Many things can only be achieved with pointers: lists, queues, trees, graphs, . . . But pointers must be used with utter care to avoid all the really serious problems mentioned above. There are two strategies to minimize pointer-related errors:<br />

Use standard implementations from the standard library or other validated libraries. std::vector from the standard library provides all the functionality of dynamic arrays, including resizing and range checking, and the memory is released automatically, see § 4.9. Smart pointers from Boost provide automatic resource management: dynamically allocated memory that is no longer referred to by a smart pointer is released automatically, see § 11.2.<br />

20 TODO: Otherwise?



Encapsulate your dynamic memory management in classes. Then you have to deal with it only once per class. 21 If all memory allocated by an object is released when the object is destroyed, then it does not matter how much memory you allocate. If you have 738 objects with dynamic memory, then it will be released 738 times. If instead you have called new 738 times, partly in loops and branches, can you be sure that you have also called delete 738 times? We know that there are tools for this, but these are errors you had better prevent than fix. Even with the encapsulation there is probably something to fix inside the classes, but this is orders of magnitude less work than having pointers spread all over your program.<br />

We have shown two main purposes of pointers:<br />
• dynamic memory management; and<br />
• referring to other objects.<br />
For the former there is no alternative to pointers: dynamic memory handling needs pointers, either directly or via classes that contain pointers. For referring to other objects, there exists another kind of type, called a reference (surprise, surprise), which we will introduce in the next section.<br />

2.10.2 References<br />

The following code introduces a reference:<br />

int i= 5;<br />
int& j= i;<br />
j= 4;<br />
std::cout << "j = " << j << '\n';<br />

The variable j refers to i. Changing j will also alter i and vice versa, as in the example: i and j will always have the same value. One can think of a reference as an alias. Whenever one defines a reference, one must immediately say what it refers to (unlike with pointers); it is not possible to make it refer to another variable later.<br />
So far, that does not sound extremely useful. But references are extremely useful for function arguments (§ 2.6), for referring to parts of other objects (e.g. the seventh entry of a vector), and for building views. 22<br />

2.10.3 Comparison between pointers and references<br />

The advantage of pointers over references is the ability to perform dynamic memory management and address calculation. On the other hand, references refer to defined locations 23 , must always refer to something, do not leave memory leaks (unless you play really evil tricks), and have the same notation in usage as the referred-to object.<br />

21 It is safe to assume that there are many more objects than classes; otherwise there is something wrong with the program.<br />
22 TODO: reref to a section when it is written<br />
23 References can refer to arbitrary addresses, but one must work hard to achieve this. For your own safety we will not show you how to make references behave as badly as pointers.



Feature                          Pointers   References<br />
Referring to a defined location     -           +<br />
Mandatory initialisation            -           +<br />
Avoidance of memory leaks           -           +<br />
Object-like notation                -           +<br />
Memory management                   +           -<br />
Address calculation                 +           -<br />
Table 2.2: Comparison between pointers and references<br />

In short, references are not idiot-proof but are much less error-prone than pointers. Pointers should only be used when dealing with dynamic memory, and even then one should do so via well-tested types or encapsulate the pointer within a class.<br />

2.10.4 Do Not Refer to Outdated Data<br />

Variables in functions are only valid within that function, for instance:<br />

double& square_ref(double d) // DO NOT!<br />
{<br />
    double s= d * d;<br />
    return s;<br />
}<br />

The variable s is no longer valid after the function has finished. If you are lucky, the memory where s was stored has not been overwritten yet, but this is nothing one can count on. Good compilers will warn you that you are referring to a local variable. Sadly enough, we have seen examples in web tutorials that do this!<br />

The same applies correspondingly to pointers:<br />

double* square_ptr(double d) // DO NOT!<br />
{<br />
    double s= d * d;<br />
    return &s;<br />
}<br />

This is as wrong as it is for references.<br />

There are cases where functions, especially member functions, return references and addresses, and the destruction order of objects prevents the invalidation of the references, 24 cf. § ??.<br />

2.11 Real-world example: matrix inversion<br />

TODO: I am not sure anymore if this is very good here. I still think we should propagate<br />

abstraction and demonstrate how to develop reusable software but the section feels now a bit<br />

misplaced. At the beginning of the next chapter is not much better. Maybe a good intro<br />

paragraph saves the situation.<br />

24 Unfortunately there are ways to circumvent this and an exception to this rule.



As a practical exercise, we now go step by step through the development process of a function for matrix inversion. This is easier than it seems. 25 For this, we use the Matrix Template Library 4 — see http://www.mtl4.org. It already provides most of the functionality we need. 26<br />

In the program development, we follow some principles of Extreme Programming, especially writing tests first and implementing the functionality afterwards. This has two significant advantages:<br />
• It prevents you as a programmer (to some extent) from featurism — the obsession to add ever more features instead of finishing one thing after another. If you write down what you want to achieve, you work more directly towards this goal and usually accomplish it much earlier. When writing the function call, you specify the interface of the function you plan to implement; when testing your results against expected values, you say something about the semantics of your function. Thus, tests are compilable documentation. The tests might not tell everything about the functions and classes you are going to implement, but what they do say, they say very precisely. Documentation in text can be much more detailed and comprehensible, but also much vaguer, than tests.<br />

• If you start writing tests only after you have finally finished the implementation — say, on a late Friday afternoon — You Do Not Want To See It Failing. You will write the test with your nicest data (whatever this means for the program in question) and minimize the risk that it fails. You might decide to go home and swear to God that you will test it on Monday.<br />
For those reasons, you will be more honest if you write your tests first. Of course, you can modify your tests later if you realize that something does not work, you change the design of some item, or you want to test more details. It goes without saying that verifying partial implementations requires temporarily commenting out parts of your test.<br />

Before we start implementing our inverse function, and even the tests, we have to choose an algorithm. We can use determinants of sub-matrices, block algorithms, Gauß-Jordan, or LU decomposition with or without pivoting. Let's say we prefer LU factorization with column pivoting, so that we have<br />
LU = PA,<br />
with a unit lower triangular matrix L, an upper triangular matrix U, and a permutation matrix P. Thus,<br />
A = P⁻¹LU<br />
and<br />
A⁻¹ = U⁻¹L⁻¹P. (2.1)<br />
We use the LU factorization from MTL4, implement the inversion of the lower and upper triangular matrices, and compose them appropriately.<br />

Now we start with our test by defining an invertible matrix and printing it out.<br />

int main(int argc, char* argv[])<br />
{<br />
    const unsigned size= 3;<br />
    typedef dense2D<double> Matrix;<br />
    Matrix A(size, size);<br />
    A= 4, 1, 2,<br />

25 At least with the implementations we already have.<br />
26 It actually provides the inversion function inv already, but we want to learn now how to get there.



       1, 5, 3,<br />
       2, 6, 9;<br />
    cout << "A is:\n" << A;<br />

For later abstraction we define the type Matrix and the constant size. The LU factorization in MTL4 is performed in place. To avoid altering our original matrix, we copy it into a new one:<br />

Matrix LU(A);<br />

We also define a vector for the permutation computed in the factorization:<br />

mtl::dense_vector<unsigned> Pv(size);<br />

These are the two arguments for the LU factorization:<br />

lu(LU, Pv);<br />

For our purpose it is more convenient to represent the permutation as a matrix:<br />

Matrix P(permutation(Pv));<br />

cout << "Permutation vector is " << Pv << "\nPermutation matrix is\n" << P;<br />

For instance, we can show A in its permuted form: 27<br />

cout << "Permuted A is\n" << Matrix(P * A);<br />

We now define an identity matrix of appropriate size and extract L and U from our in-place factorization:<br />
Matrix I(matrix::identity(size, size)), L(I + strict_lower(LU)), U(upper(LU));<br />

Note that the unit diagonal of L is not stored and needs to be added. It could also be treated implicitly, but we refrain from that for the sake of simplicity. We have now finished the preliminaries and come to our first test: if we have computed the inverse of U, say UI, the product UI · U must be the identity matrix, approximately.<br />
Matrix UI(inverse_upper(U));<br />
cout << "inverse(U) [permuted] is:\n" << UI << "UI * U is:\n" << Matrix(UI * U);<br />
assert(one_norm(Matrix(UI * U - I)) < 0.1);<br />

Testing results of non-trivial numeric calculations for equality is quite certain to fail. Therefore, we use the norm of the matrix difference as criterion. Likewise, the inversion of L (with a different function) is tested:<br />
Matrix LI(inverse_lower(L));<br />
cout << "inverse(L) [permuted] is:\n" << LI << "LI * L is:\n" << Matrix(LI * L);<br />
assert(one_norm(Matrix(LI * L - I)) < 0.1);<br />

This enables us to calculate the inverse of A itself and test its correctness:<br />
Matrix AI(UI * LI * P);<br />
cout << "inverse(A) [UI * LI * P] is\n" << AI << "A * AI is\n" << Matrix(AI * A);<br />
assert(one_norm(Matrix(AI * A - I)) < 0.1);<br />

27 If you wonder why we explicitly built a matrix for P * A, you will have to wait until Chapter 5.3 to understand that some functions return special types that need special treatment. Future versions of MTL4 will minimize the need for such special treatments.



A function computing the inverse must return the same value and also pass the test against the identity:<br />
Matrix A_inverse(inverse(A));<br />
cout << "inverse(A) is \n" << A_inverse << "A * A_inverse is\n" << Matrix(A_inverse * A);<br />
assert(one_norm(Matrix(A_inverse * A - I)) < 0.1);<br />

After establishing tests for all components of our calculation, we start with their implementations. The first function we program is the inversion of an upper triangular matrix. This function takes a dense matrix as argument and returns another matrix:<br />
dense2D<double> inline inverse_upper(dense2D<double> const& A) {<br />
}<br />

Since we do not need another copy of the input matrix, we pass it as a reference. The argument shall not be changed, so we pass it as const. The constancy has several advantages:<br />
• We improve the reliability of our program. Arguments passed as const are guaranteed not to change; if we accidentally modify them, the compiler will tell us and abort the compilation. There is a way to remove the constancy, but this should only be used as a last resort, e.g. for interfacing with obsolete libraries written by others. Everything you write yourself can be realized without eliminating the constancy of arguments.<br />
• Compilers can optimize better when objects are guaranteed not to be altered.<br />
• In the case of references, the function can be called with expressions. Non-const references require storing the expression in a variable and passing the variable to the function.<br />

One more comment: people might tell you that it is too expensive to return containers as results and that it is more efficient to use references. This is true — in principle. For the moment we accept this extra cost and pay more attention to clarity and convenience. Later in this book we will introduce techniques to minimize the cost of returning containers from functions.<br />

So much for the function signature; let us now turn our attention to the function body. The first thing we do is verify that our argument is valid. Obviously the matrix must be square:<br />
const unsigned n= num_rows(A);<br />
assert(num_cols(A) == n); // Matrix must be square<br />

The number of rows is needed several times in this function and is therefore stored in a variable, well, a constant. Another prerequisite is that the matrix has no zero entries on the diagonal. We leave this test to the triangular solver.<br />

Speaking of which, we can get our inverse triangular matrix with a triangular solver for linear systems, which we find in MTL4; more precisely, the k-th column vector of U⁻¹ is the solution of<br />
U x = e_k<br />
where e_k is the k-th unit vector. First we define a temporary variable for the result:<br />
dense2D<double> Inv(n, n);<br />
Then we iterate over the columns of Inv:<br />



for (unsigned k= 0; k < n; ++k) {<br />
}<br />
In each iteration we need the k-th unit vector:<br />
dense_vector<double> e_k(n);<br />
for (unsigned i= 0; i < n; ++i)<br />
    if (i == k)<br />
        e_k[i]= 1.0;<br />
    else<br />
        e_k[i]= 0.0;<br />

The triangular solver returns a column vector. We could assign the entries of this vector directly to the entries of the target matrix:<br />
for (unsigned i= 0; i < n; ++i)<br />
    Inv[i][k]= upper_trisolve(A, e_k)[i];<br />
This is nicely short, but we would compute upper_trisolve n times! Although we said that performance is not our primary goal at this point, raising the overall complexity from order 3 to order 4 is too much waste of resources. Therefore, we had better store the vector and copy the entries from there:<br />
dense_vector<double> res_k(n);<br />
res_k= upper_trisolve(A, e_k);<br />
for (unsigned i= 0; i < n; ++i)<br />
    Inv[i][k]= res_k[i];<br />

Returning our temporary matrix finishes the function, which we now give in its complete form:<br />
dense2D<double> inverse_upper(dense2D<double> const& A)<br />
{<br />
    const unsigned n= num_rows(A);<br />
    assert(num_cols(A) == n); // Matrix must be square<br />
    dense2D<double> Inv(n, n);<br />
    for (unsigned k= 0; k < n; ++k) {<br />
        dense_vector<double> e_k(n);<br />
        for (unsigned i= 0; i < n; ++i)<br />
            if (i == k)<br />
                e_k[i]= 1.0;<br />
            else<br />
                e_k[i]= 0.0;<br />
        dense_vector<double> res_k(n);<br />
        res_k= upper_trisolve(A, e_k);<br />
        for (unsigned i= 0; i < n; ++i)<br />
            Inv[i][k]= res_k[i];<br />
    }<br />
    return Inv;<br />
}<br />



Now that the function is complete, we first run our test. Evidently, we have to comment out part of the test because we have only implemented one function so far. But it is worth knowing whether this first function already behaves as expected. It does, and we could now be happy with it and turn our attention to the next task — there are still many. But we will not.<br />

Well, at least we can be happy to have a correctly running function. Nevertheless, it is still worth spending some time to improve it. Such improvements are called refactoring. Experience from practice has shown that refactoring immediately after implementation takes much less time than modifying the code later, when bugs are discovered, the software is ported to other platforms, or it is extended for more usability. Obviously, it is much easier to simplify and structure our software now, while we still know what is going on, than in some weeks, months, or years, or when somebody else has to refactor it.<br />

The first thing we might dislike is that something as simple as the initialization of a unit vector takes five lines. This is rather verbose, and putting the if statement on one line<br />
for (unsigned i= 0; i < n; ++i)<br />
    if (i == k) e_k[i]= 1.0; else e_k[i]= 0.0;<br />
is badly structured. C++ and even good ole C have a special operator for conditions:<br />
for (unsigned i= 0; i < n; ++i)<br />
    e_k[i]= i == k ? 1.0 : 0.0;<br />

The conditional operator ‘?:’ usually takes some time to get used to, but it results in a more concise representation. There are also situations where one cannot use an if statement but can use the ?: operator. Although we have not changed anything semantically in the program, and it seems obvious that the result will still be the same, it cannot hurt to run our test again. You will see how often you are sure that your program changes could not possibly change the behavior, but they still do. And the sooner you realize it, the better. With the test we have already written, it only takes a few seconds and makes you feel more confident.<br />

If we would like to be really cool, we can exploit some insider know-how: the expression ‘i == k’ returns a bool, and we know that bool can be converted implicitly into int. In this conversion, false results in 0 and true in 1, according to the standard. These are precisely the values we want, as double:<br />
e_k[i]= double(i == k);<br />
In fact, the conversion from int to double is also performed implicitly and can be omitted:<br />
e_k[i]= i == k;<br />

As cute as this looks, it is something of a stretch to assign a logical value to a floating-point number. It is well defined by the implicit conversion chain bool → int → double, but it will confuse potential readers, and you might end up explaining what is happening on a mailing list or adding a comment to the program. In both cases you end up writing more for the explanation than you saved in the program.<br />

Another thought that might occur to us is that this is probably not the last time we need a unit vector. So why not write a function for it?<br />



dense_vector<double> inline unit_vector(unsigned k, unsigned n)<br />
{<br />
    dense_vector<double> v(n, 0.0);<br />
    v[k]= 1;<br />
    return v;<br />
}<br />

As the function returns the unit vector, we can just pass it directly as the argument of the triangular solver:<br />
res_k= upper_trisolve(A, unit_vector(k, n));<br />
For a dense matrix, MTL4 allows us to access a matrix column as a column vector (instead of as a sub-matrix). Then we can assign the result vector directly, without a loop:<br />
Inv[irange(0, n)][k]= res_k;<br />

As a short explanation: the bracket operator is implemented in such a manner that integer indices for rows and columns return the matrix entry, while ranges for rows and columns return a sub-matrix. Likewise, a range of rows and a single column gives you a column of the matrix — or part of this column. Vice versa, a row vector can be extracted from a matrix with an integer as row index and a range for the columns.<br />

This is an interesting example of how to deal with the limitations, as well as the possibilities, of C++. Other languages have ranges as part of their intrinsic notation; e.g., Python has the symbol ‘:’ for expressing ranges of indices. C++ does not have this symbol, but we can introduce a new type — like MTL4's irange — and define the behavior of operator[] for this type. This leads to an extremely powerful mechanism!<br />

Extending Operator Functionality<br />
Since we cannot introduce new operators into C++ — not now (in 2010), not in the next standard (C++0x), maybe in the one after that — we define new types and give operators the desired behavior when applied to those types. This technique allows us to provide very broad functionality with a limited number of operators.<br />
The operator semantics on user types shall be intuitive and must be consistent with the operator priority (see the example in § 2.3.7).<br />

Back to our algorithm. We store the result of the solver in a vector and then assign it to a matrix column. In fact, we can assign the triangular solver's result directly:<br />
Inv[irange(0, n)][k]= upper_trisolve(A, unit_vector(k, n));<br />
The range of all indices is predefined as iall:<br />
Inv[iall][k]= upper_trisolve(A, unit_vector(k, n));<br />

Next, we exploit some mathematical background. The inverse of an upper triangular matrix is also upper triangular. Thus, we only need to compute the upper part of the result and set the remainder to 0 — or set the whole matrix to zero before computing the upper part. Of course,



we need smaller unit vectors now and only sub-matrices of A. This can be expressed nicely with ranges:<br />
Inv= 0;<br />
for (unsigned k= 0; k < n; ++k)<br />
    Inv[irange(0, k+1)][k]= upper_trisolve(A[irange(0, k+1)][irange(0, k+1)], unit_vector(k, k+1));<br />

Admittedly, the irange makes the expression hard to read. Although it looks like a function, irange is a type, and we just created objects on the fly and passed them to operator[]. As we use the same range three times, it is shorter to create a variable (or rather a constant):<br />

for (unsigned k= 0; k < n; ++k) {<br />
    const irange r(0, k+1);<br />
    Inv[r][k]= upper_trisolve(A[r][r], unit_vector(k, k+1));<br />
}<br />

This not only makes the second line shorter, it also makes it easier to see that it is the same range everywhere.<br />

Another observation: after shortening the unit vectors, they all have their 1 in the last entry. Thus, we only need the size of the vector; the position of the 1 is implied:<br />

dense_vector<double> inline last_unit_vector(unsigned n)<br />
{<br />
    dense_vector<double> v(n, 0.0);<br />
    v[n-1]= 1;<br />
    return v;<br />
}<br />

We choose a different name to reflect the different meaning. Nonetheless, we wonder whether we really want such a function. How high is the probability that we will ever need it again? Charles H. Moore, the creator of the programming language Forth, once said that "The purpose of functions is not to hash a program into tiny pieces but to create highly reusable entities." All this said, we prefer the more general function, which is much more likely to be useful later.<br />

After all these modifications, we are now satisfied with the implementation and move on to the next function. We still might change something at a later point in time, but having made the code clearer and better structured will make later modifications much easier for us or somebody else. The more experience you gain, the fewer steps you will need to reach an implementation that makes you happy. And it goes without saying that we tested inverse_upper repeatedly while modifying it.<br />

Now that we know how to invert upper triangular matrices, we can do the same for lower triangular<br />

ones accordingly. Alternatively, we can just transpose the input and output:<br />

dense2D inline inverse_lower(dense2D const&amp; A)<br />

{<br />

dense2D T(trans(A));<br />

return dense2D(trans(inverse_upper(T)));<br />

}<br />

Ideally this implementation should look like this:<br />

dense2D inline inverse_lower(dense2D const&amp; A)<br />

{<br />

return trans(inverse_upper(trans(A)));<br />

}



This does not work yet <strong>for</strong> technical reasons but will in the future.<br />

You may argue that the transpositions and passing the matrix and the vector once more take<br />

more time. More importantly, we know that the lower matrix has a unit diagonal and we did<br />

not exploit this property, e.g. to avoid the divisions in the triangular solver. We could<br />

even ignore or omit the diagonal and treat this implicitly in the algorithms. This is all true.<br />

However, we prioritized the simplicity and clarity of the implementation and the reusability<br />

aspect higher than per<strong>for</strong>mance here. 28<br />

We now have all we need to put the matrix inversion together. As above, we start by checking<br />

that the matrix is square.<br />

dense2D inline inverse(dense2D const& A)<br />

{<br />

const unsigned n= num_rows(A);<br />

assert(num_cols(A) == n); // Matrix must be square<br />

Then we per<strong>for</strong>m the LU factorization. For per<strong>for</strong>mance reasons this function does not return<br />

the result but takes its arguments as mutable references and factorizes in place. Thus, we need<br />

a copy of the matrix to pass and a permutation vector of appropriate size.<br />

dense2D PLU(A);<br />

dense_vector Pv(n);<br />

lu(PLU, Pv);<br />

The upper triangular factor PU of the permuted A is stored in the upper triangle of PLU. The<br />

lower triangular factor PL is partly stored in the strict lower triangle of PLU while the unit<br />

diagonal is omitted. We there<strong>for</strong>e need to add it be<strong>for</strong>e inversion (or alternatively handle the<br />

unit diagonal implicitly in the inversion).<br />

dense2D PU(upper(PLU)), PL(strict_lower(PLU) + matrix::identity(n, n));<br />

The inversion of a square matrix according to Equation (2.1) can then be per<strong>for</strong>med in one<br />

single line: 29<br />

return dense2D(inverse_upper(PU) ∗ inverse_lower(PL) ∗ permutation(Pv));<br />

During this section you have seen that there are always alternative ways to implement the same<br />

behavior; most likely you have made this experience before. Although we suggested for every<br />

choice we made that it is the most appropriate, there is not always THE single best solution, and<br />

even while weighing the pros and cons of the alternatives, one might not come to a final conclusion<br />

and just pick one. We also illustrated that the choices depend on the goals, <strong>for</strong> instance the<br />

implementation would look different if per<strong>for</strong>mance were the primary goal.<br />

The section should also show that non-trivial programs are not written in a single sweep<br />

by an ingenious mind — exceptions might prove the rule — but are the result of gradually<br />

improving development. Experience will make this journey shorter and more direct, but we will not<br />

write the perfect program at the first attempt.<br />

28 People who care about performance do not use matrix inversion in the first place.<br />

29 The explicit conversion can probably be omitted in later versions of MTL4.



2.12 Exercises<br />

2.12.1 Age<br />

Write a program that asks for input from the keyboard and prints the result on the screen and to a<br />

file. The question is: What is your age?<br />

2.12.2 Exercise on include<br />

We provide you with the following files: foo.hpp, included by bar1.hpp and bar2.hpp. The main<br />

program is in main.cpp.<br />

Compile and try to link the program. It should not link. Correct errors so that it links.<br />

2.12.3 Arrays and pointers<br />

1. Write the following declarations: pointer to a character, array of 10 integers, pointer to<br />

an array of 10 integers, pointer to an array of character strings, pointer to pointer to a<br />

character, integer constant, pointer to an integer constant, constant pointer to an integer.<br />

Initialize all of the objects.<br />

2. Read a sequence of doubles from an input stream. Let the value 0 define the end of a<br />

sequence. Print the values in the input order. Remove duplicate values. Sort the values<br />

be<strong>for</strong>e printing.<br />

3. Make a small program that creates arrays on the stack (fixed size arrays) and arrays on<br />

the heap (using allocation, i.e. new). Use valgrind to check what happens when you do<br />

not use delete correctly.<br />

2.12.4 Read the header of a Matrix-Market file<br />

The Matrix Market data <strong>for</strong>mat is used to store dense and sparse matrices in ASCII <strong>for</strong>mat.<br />

The header contains some in<strong>for</strong>mation about the type and the size of the matrix. For a sparse<br />

matrix, the data are stored in three columns. The first column is the row number, the second<br />

column the column number, and the third column the numerical value. If the matrix is complex,<br />

a fourth column is added <strong>for</strong> the imaginary part.<br />

An example of a Matrix Market file is:<br />

%%MatrixMarket matrix coordinate real general<br />

%<br />

% ATHENS course matrix<br />

%<br />

2025 2025 100015<br />

1 1 .9273558001498543E-01<br />

1 2 .3545880644900583E-01<br />

...................



The first line that does not start with % contains the number of rows, the number of columns<br />

and the number of non-zero elements of the sparse matrix.<br />

Use fstream to read the header of a MatrixMarket file and print the number of rows and columns,<br />

and the number of nonzeros on the screen.<br />

2.12.5 String manipulation programs<br />

There is a type string in the standard library. This type contains a large number of string<br />

operations, such as string concatenation, string comparison, etc. Note the include of the header<br />

file string.<br />

#include &lt;iostream&gt;<br />

#include &lt;string&gt;<br />

int main()<br />

{<br />

std::string s1 = ”Hello”;<br />

std::string s2 = ”World”;<br />

std::string s3 = s1 + ”, ” + s2 ;<br />

std::cout ≪ s3 ≪ std::endl ;<br />

return 0;<br />

}<br />

In this example we have concatenated the strings s1 and s2 together with a string constant.<br />

Per<strong>for</strong>m the following exercises:<br />

1. Write a function itoa (int i, std::string& b) that constructs a string representation of i in b<br />

and returns b.<br />

2. Write a simple encryption program. It should read the input from cin and write the<br />

encrypted symbols in cout. Use the following simple encryption scheme: the code <strong>for</strong> a<br />

symbol c is c ˆ key[i], where key is a string given as a parameter to a function. The symbols<br />

from key are used in a cyclic way. (After repeated encryption with the same key you<br />

should get the source string.)



2.13 Operator Precedence<br />

The following table gives all operators on one page for quickly seeing their priorities; for their meaning<br />

see Table 2.3.6. Semicolons are only separators.<br />

Operator Precedence<br />

class name :: member; namespace name :: member; :: name; :: qualified-name<br />

object . member; pointer → member; expr[ expr ]<br />

object [ expr ]; expr ( expr list ); type ( expr list ); lvalue ++; lvalue −−<br />

typeid ( type ); typeid ( expr ); dynamic_cast < type > ( expr )<br />

static_cast < type > ( expr ); reinterpret_cast < type > ( expr )<br />

const_cast < type > ( expr )<br />

sizeof expr; sizeof ( type ); ++ lvalue; −− lvalue; ∼ expr; ! expr; − expr<br />

+expr; & lvalue; ∗ lvalue; new type; new type( expr list )<br />

new ( expr list ) type; new ( expr list ) type( expr list )<br />

delete pointer; delete [ ] pointer; ( type ) expr<br />

object.∗ pointer to member; pointer → ∗ pointer to member<br />

expr ∗ expr; expr / expr; expr % expr<br />

expr + expr; expr − expr<br />

expr ≪ expr; expr ≫ expr<br />

expr < expr; expr <= expr; expr > expr; expr >= expr<br />

expr == expr; expr != expr<br />

expr & expr<br />

expr ˆ expr<br />

expr | expr<br />

expr && expr<br />

expr || expr<br />

expr ? expr: expr<br />

lvalue = expr; lvalue ∗= expr; lvalue /= expr; lvalue %= expr; lvalue += expr<br />

lvalue −= expr; lvalue ≪= expr; lvalue ≫= expr; lvalue &= expr<br />

lvalue |= expr; lvalue ˆ= expr<br />

throw expr<br />

expr , expr


Chapter 3<br />

Classes<br />

“Computer science is no more about computers than astronomy is about telescopes.”<br />

— Edsger W. Dijkstra.<br />

“Accordingly, computer science is more than programming language details.”<br />

Good programming is more than drilling on small language details and more than cleverly<br />

manipulating specific bits on the latest and greatest computer hardware. Focusing primarily<br />

on technical details can lead to clever codes that per<strong>for</strong>m a certain task in a certain context<br />

extremely efficiently. If one is good at this, one might even create the fastest solution for this<br />

task and gain the admiration of the geeks.<br />

3.1 Program for universal meaning, not for technical details<br />

Writing leading-edge scientific software with such an attitude is very painful and likely to fail.<br />

The most important tasks in scientific programming are:<br />

• Identifying the mathematical abstractions that are important in the domain; and<br />

• Representing these abstractions comprehensively and efficiently in software.<br />

Common abstractions that appear in almost every scientific application are vector spaces and<br />

linear operators. A linear operator projects from one vector space to another one.<br />

First we should decide how to represent this abstraction in a program. Let v be an element of a<br />

vector space and L a linear operator. Then C ++ allows us to represent the application of L on<br />

v as<br />

L(v)<br />

or<br />

L ∗ v<br />

Which one is better suited is not so easy to say. What is easy to say is that both are better<br />

than<br />




apply_symm_blk2x2_rowmajor_dnsvec_multhr_athlon(L.data_addr, L.nrows, L.ncols,<br />

L.ldim, L.blksch, v.data_addr, v.size);<br />

Developing software in this fashion is far from fun. It wastes much of the programmer's<br />

energy. Getting such calls right is of course much more work than with the former notations.<br />

If one of the arguments is stored in a different <strong>for</strong>mat, the function call must be meticulously<br />

adapted. Remember the person who implements the linear projection wanted to do science,<br />

actually.<br />

The cardinal error of scientific software providing such interfaces — there are even worse than<br />

our example — is to commit to too many technical details in the user interface. The reason lies<br />

partly in the usage of simplistic programming languages such as C and Fortran 77 or in the effort to<br />

interoperate with software in these languages.<br />

Advice<br />

If you ever get <strong>for</strong>ced to write software that interoperates with C or Fortran,<br />

write your software first with a concise and intuitive interface in C ++ <strong>for</strong><br />

yourself and other C ++ programmers and add the C and Fortran interface on<br />

top of it.<br />

The elegant way of writing scientific software is to use and to provide the best abstraction. A<br />

good implementation reduces the user interface to the essential behavior and omits all surplus<br />

commitments to technical details. Applications with a concise and intuitive interface can be as<br />

efficient as their ugly and detail-obsessed counterparts.<br />

In our example, this is achieved by providing a class for every specific linear operator and implementing<br />

the projection type-dependently. 1 This way, we can apply the projection without giving<br />

all details and the user application is short and nice. This chapter will show the foundations of<br />

how to provide new abstractions in scientific software and the following chapters will elaborate on<br />

this.<br />

3.2 Class members<br />

Object types are called classes in C ++, defined by the class keyword. A class defines a new data<br />

type, which can be used to create objects. A class is a collection of:<br />

• data;<br />

• functions which are also referred to as member functions or methods;<br />

• types.<br />

Furthermore, class members can be public or private and classes can inherit from each other.<br />

Let us now give an example to illustrate the class concept. To have something tangible <strong>for</strong><br />

scientists, we refrain from foo and bar examples but gradually implement a class complex (al-<br />

1 Specializations <strong>for</strong> specific plat<strong>for</strong>ms can also be handled with the type system.



though one already exists). This class must contain variables to store the real and the imaginary<br />

part:<br />

class complex<br />

{<br />

double r, i;<br />

};<br />

Variables within a class are called ‘member variables’.<br />

3.2.1 Access attributes<br />

All items — variables, constants, functions, and types — of a class have access attributes. C ++<br />

provides the following three attributes:<br />

• public: Accessible from everywhere;<br />

• private: Accessible only within the class; and<br />

• protected: Accessible only within the class and in derived classes.<br />

The access attributes give the class designer good control over how the class users can utilize the<br />

class. Defining more public members gives more freedom in usage but less control and, vice<br />

versa, more private members establish a stricter user interface. Protected members are less<br />

restrictive than private ones and more restrictive than public ones. Since inheritance is not a<br />

major topic in this book, they are not very important in this context. All class members are<br />

by default ‘private’.<br />

3.2.2 Member functions<br />

It is common practice in object-oriented software to declare member variables as private and<br />

access them with functions. We do this here in a Java style:<br />

class complex<br />

{<br />

public:<br />

double get_r() { return r; }<br />

void set_r(double newr) { r = newr; }<br />

double get_i() { return i; }<br />

void set_i(double newi) { i = newi; }<br />

private:<br />

double r, i;<br />

};<br />

Functions in a class are called ‘member functions’. Member functions are also private by default,<br />

i.e. they can only be called by functions within the class. This is evidently not particularly<br />

useful <strong>for</strong> our getters and setters.<br />

There<strong>for</strong>e we declared them ‘public’. Public member functions and variables can be accessed<br />

outside the class. So, we can write c.get_r() but not c.r. The class above can be used in the<br />

following way:



int main()<br />

{<br />

complex c1, c2;<br />

// set c1<br />

c1.set_r(3.0);<br />

c1.set_i(2.0);<br />

// copy c1 to c2<br />

c2.set_r(c1.get_r());<br />

c2.set_i(c1.get_i());<br />

return 0;<br />

}<br />

In line 3 we created two objects of type complex. Then we set one of the objects and copied it<br />

to the other one. This works but it is a bit clumsy, isn’t it?<br />

C ++ provides another keyword <strong>for</strong> defining classes: struct. The only difference 2 is that members<br />

are by default public, there<strong>for</strong>e the example above is equivalent to:<br />

struct complex<br />

{<br />

double get_r() { return r; }<br />

void set_r(double newr) { r = newr; }<br />

double get_i() { return i; }<br />

void set_i(double newi) { i = newi; }<br />

private:<br />

double r, i;<br />

};<br />

Our member variables can only be accessed via functions. This gives the class designer the<br />

maximal control over the behavior. The setter could, for example, accept only values in a certain range. We<br />

could count how often the setters and getters are called for each complex number or for all complex<br />

numbers in the execution. The functions could have additional print-outs <strong>for</strong> debugging. 3 We<br />

could even allow the reading only at certain times of the day or writing only if the program runs<br />

on a computer with a certain IP. We will most likely not do the latter, at least not <strong>for</strong> complex<br />

numbers, but we could. If the variables are public and accessed directly, such modifications<br />

would not be possible. Nevertheless, handling the real and imaginary part of a complex number<br />

is cumbersome and we will discuss alternatives.<br />

Most C ++ programmers would not implement it this way. What would a C ++ programmer do<br />

first then? Writing constructors.<br />

3.3 Constructors<br />

What are constructors? Constructors initialize objects of classes and create a working environment<br />

<strong>for</strong> member functions. Sometimes such an environment includes resources like files,<br />

memory or locks that have to be freed after use. We come back to this later.<br />

To start with let us define a constructor <strong>for</strong> complex:<br />

2 There is really no other difference. One can define operators and virtual functions or derived classes in the<br />

same manner as with class. Per<strong>for</strong>mance of class and struct is also absolutely identical.<br />

3 A debugger is usually a better alternative to putting print-outs into programs.



class complex<br />

{<br />

public:<br />

complex(double rnew, double inew)<br />

{<br />

r= rnew; i= inew;<br />

}<br />

// ...<br />

};<br />

Thus, a constructor is a member function with the same name as the class itself. It can have<br />

an arbitrary number of arguments. In our case, two arguments are most suitable because we<br />

want to set two member variables. This constructor allows us to set c1’s values directly in the<br />

definition:<br />

complex c1(2.0, 3.0);<br />

There is a special syntax <strong>for</strong> setting member variables in constructors<br />

class complex<br />

{<br />

public:<br />

complex(double rnew, double inew) : r(rnew), i(inew) {}<br />

// ...<br />

};<br />

This is not only shorter but also has another advantage: it calls the constructors of the variables in<br />

the class’s constructor. For plain old data types (POD) this does not make a significant difference.<br />

The situation is different if the members are themselves classes.<br />

Imagine you have a class that solves linear systems with the same matrix and you store the<br />

matrix in your class<br />

class solver<br />

{<br />

public:<br />

solver(int nrows, int ncols) // : A() #1 → error<br />

{<br />

A(nrows, ncols); // this is not a constructor here #2 → error<br />

}<br />

// ...<br />

private:<br />

matrix_type A;<br />

};<br />

Suppose our matrix class has a constructor setting the dimensions. This constructor cannot<br />

be called in the function body of the constructor (#2). The call in #2 is interpreted as<br />

A.operator()(nrows, ncols), see § 4.8.<br />

All member variables of the class are constructed be<strong>for</strong>e the class constructor reaches the opening<br />

{. Those members — like A — that do not appear in the list after the colon are built by a constructor<br />

without arguments, called the default constructor. Correspondingly, classes that have<br />

such a constructor are called default-constructible. Our matrix class is not default-constructible<br />

and the compiler will tell us something like “Operator matrix_type::matrix_type() not<br />

found”. Thus, we need



class solver<br />

{<br />

public:<br />

solver(int nrows, int ncols) : A(nrows, ncols) {}<br />

// ...<br />

private:<br />

matrix_type A;<br />

};<br />

Often the matrix (or whatever other object) is already constructed and we do not want to waste<br />

memory on a copy. In this case we will use a reference to the object. A reference must<br />

be set in the constructor because this is the only place to declare what it is referring to. The<br />

solver shall not modify the matrix, so we write:<br />

class solver<br />

{<br />

public:<br />

solver(const matrix_type&amp; A) : A(A) {}<br />

// ...<br />

private:<br />

const matrix_type&amp; A;<br />

};<br />

The code also shows that we can give the constructor arguments the same names as the member<br />

variables. After the colon, which A is which? The rule is that names outside the parentheses<br />

refer to members and inside the parentheses the constructor arguments hide the member<br />

variables. Some people are confused by this rule and use different names. What does A refer to<br />

inside {}? To the constructor argument. Only names that do not exist as argument names<br />

are interpreted as member variables. In fact, this is pure scope resolution: the scope of the<br />

function — in this case the constructor — is inside the scope of the class and thus the argument<br />

names hide the class member names.<br />

Let us return to our complex example. So far, we have a constructor allowing us to set the real<br />

and the imaginary part. Often only the real part is set and the imaginary part defaults to 0.<br />

class complex<br />

{<br />

public:<br />

complex(double r, double i) : r(r), i(i) {}<br />

complex(double r) : r(r), i(0) {}<br />

// ...<br />

};<br />

We can also say that the number is 0 + 0i if no value is given, i.e. if the complex number is<br />

default-constructed:<br />

complex() : r(0), i(0) {}



Advice<br />

Define a default constructor wherever it is possible although it might not<br />

seem necessary when you implement the class.<br />

For the complex class, we might think that we do not need a default constructor because we<br />

can delay a variable's declaration until we know its value. The absence of a default constructor creates<br />

(at least) two problems:<br />

• We might need the variable outside the scope in which the values are computed. For<br />

instance, if the value depends on some condition and we would declare the (complex)<br />

variable in the two branches of if, the variable would not exist after the if.<br />

• We build containers of the type, e.g. a matrix of complex values. Then the constructor of<br />

the matrix must call constructors of complex <strong>for</strong> each entry and the default constructor<br />

is the most convenient fashion to handle this.<br />

For some classes, it might be very difficult to define a default constructor, e.g. when some of<br />

the members are references. In those cases, it can be easier to accept the aforementioned<br />

drawbacks instead of building badly designed default constructors.<br />

We can combine all three of them with default arguments:<br />

class complex<br />

{<br />

public:<br />

complex(double r= 0, double i= 0) : r(r), i(i) {}<br />

// ...<br />

};<br />

In the previous main function we defined two objects, one a copy of the other. We can write a<br />

constructor <strong>for</strong> this — called copy constructor:<br />

class complex<br />

{<br />

public:<br />

complex(const complex& c) : i(c.i), r(c.r) {}<br />

// ...<br />

};<br />

But we do not have to. C ++ is doing this itself. If we do not define a copy constructor, i.e. a<br />

constructor that has one argument which is a const reference to its type, then the compiler<br />

creates this constructor implicitly. This automatically built constructor copies each member variable by<br />

calling the variables’ copy constructors and this is exactly what we did. In cases like this where<br />

copying all members is precisely what you want <strong>for</strong> your copy constructor you should use the<br />

default <strong>for</strong> the following reasons:<br />

• It is less verbose;<br />

• It is less error-prone;<br />

• Other people know directly what your copy constructor does without reading your code;<br />

and



• Compilers might find more optimizations.<br />

There are cases where the default copy constructor does not work, especially when the class<br />

contains pointers. Say we have a simple vector class with a copy constructor:<br />

class vector<br />

{<br />

public:<br />

vector(const vector& v)<br />

: size(v.size), data(new double[size])<br />

{<br />

<strong>for</strong> (unsigned i= 0; i < size; i++)<br />

data[i]= v.data[i];<br />

}<br />

// ...<br />

private:<br />

unsigned size;<br />

double ∗data;<br />

};<br />

If we omit this copy constructor, the compiler will not complain and will voluntarily build one<br />

<strong>for</strong> us. We are glad that our program is shorter and sexier but sooner or later we find that it<br />

behaves bizarrely. Changing one vector modifies another one as well, and when we observe this<br />

strange behavior we have to find the error in our program. This is particularly difficult because<br />

there is no error in what we have written but in what we have omitted.<br />

Another problem we can observe is that the run-time library will complain that we freed the<br />

same memory twice. 4 The reason <strong>for</strong> this is the way pointers are copied. Only the address is<br />

copied and the result is that both pointers point to the same memory. This might be useful in<br />

some cases but most of the time it is not, at least in our domain. Some pointer-addicted geeks<br />

might see this differently.<br />

3.3.1 Explicit and implicit constructors<br />

In C ++ we distinguish implicit and explicit constructors. In addition to object initialization,<br />

implicit constructors enable implicit conversions and an assignment-like notation for construction.<br />

Instead of:<br />

complex c1(3.0);<br />

we can also write:<br />

complex c1= 3.0;<br />

or<br />

complex c1= pi∗pi/6.0;<br />

For many scientifically educated people, this notation is more readable. Older compilers might<br />

generate more code in initializations using ‘=’ (the object is first created with the default<br />

constructor and the value is copied afterwards) while current compilers generate the same code<br />

<strong>for</strong> both notations.<br />

4 This is an error message every programmer experiences at least once in his/her life (or he/she is not doing<br />

serious business).



The implicit conversion kicks in when one type is needed and another one is given, e.g. a double<br />

instead of a complex. Assume we have a function: 5<br />

double inline complex_abs(complex c)<br />

{<br />

return std::sqrt(real(c) ∗ real(c) + imag(c) ∗ imag(c));<br />

}<br />

and call this with a double, e.g.:<br />

cout ≪ ”|7| = ” ≪ complex_abs(7.0) ≪ ’\n’;<br />

The constant ‘7.0’ is considered a double but there is no function ‘complex_abs’ for double.<br />

There is a function <strong>for</strong> complex and complex has a constructor that accepts a double. So, the<br />

complex value is implicitly built from the double.<br />

This can be <strong>for</strong>bidden by declaring the constructor as ‘explicit’:<br />

class complex { public:<br />

explicit complex(double nr= 0.0, double i= 0.0) : r(nr), i(i) {}<br />

};<br />

Then complex_abs could not be called with a double or any other type than complex. To call this<br />

function with a double we can write an overload <strong>for</strong> double or construct a complex explicitly in<br />

the call:<br />

cout ≪ ”|7| = ” ≪ complex_abs(complex(7.0)) ≪ ’\n’;<br />

The explicit attribute is really important for the vector class. There will be a constructor taking<br />

the size of the vector as argument:<br />

class vector<br />

{<br />

public:<br />

vector(int n) : my_size(n), data(new double[my_size]) {}<br />

};<br />

A function computing a scalar product will expect two vectors as arguments:<br />

double dot(const vector& v, const vector& w) { ... }<br />

Calling this function with integer arguments<br />

double d= dot(8, 8);<br />

will compile. What happened? Two temporary vectors of size 8 are created with the implicit<br />

constructor and passed to the function dot. This nonsense can be easily avoided by declaring<br />

the constructor explicit.<br />

Discussion 3.1 Which constructor shall be explicit is in the end the class designer’s decision.<br />

It is pretty obvious in the vector example: no right-minded programmer wants the compiler<br />

converting integers automatically into vectors.<br />

Whether the constructor of the complex class should be explicit depends on the expected utilization.<br />

Since a complex number with a zero imaginary part is mathematically identical to<br />

5 The definitions of real and imag will be given soon.



a real number, the implicit conversion does not create semantic inconsistencies. An implicit<br />

constructor is more convenient because doubles and double literals can be given wherever a<br />

complex is expected. Functions that are not per<strong>for</strong>mance-critical can be implemented only once<br />

<strong>for</strong> complex and used <strong>for</strong> double. Vice versa, in per<strong>for</strong>mance-critical applications it might be<br />

preferable to use an explicit constructor because the compiler will refuse to call complex functions<br />

with double arguments. Then the programmer can implement overloads of those functions with<br />

double arguments that do not waste run time on null imaginaries.<br />

That does not mean that high-per<strong>for</strong>mance implementations necessarily have to be realized with<br />

explicit constructors. The implicit conversion might happen in rarely called functions and the<br />

impact on the overall per<strong>for</strong>mance might be negligible. The compiler cannot tell us but a profiling<br />

tool can. A function that consumes less than 1 % of the execution time is not worth spending<br />

much time on tuning it. All this considered, there are more reasons <strong>for</strong> an implicit constructor<br />

than <strong>for</strong> an explicit one and so it is implemented in std::complex.<br />

3.4 Destructors<br />

A destructor is a function that is called every time an object of this class is destroyed, <strong>for</strong><br />

example:<br />

∼complex()<br />

{<br />

std::cout ≪ ”So long and thanks <strong>for</strong> the fish.\n”;<br />

}<br />

Since the destructor is the complementary operation of the default constructor it uses the<br />

complementary notation in the signature. As opposed to the constructor, there is only one single<br />

overload and arguments are not allowed — what would they be good for anyway, as grave<br />

goods? There is no life after death in C ++.<br />

In our example, there is nothing to do when a complex number is destroyed and we can omit<br />

the destructor. A destructor is needed when the object acquired resources, e.g. memory. In<br />

this cases the memory must be freed in the destructor and the other ressource be released.<br />

class vector<br />
{<br />
  public:<br />
    // ...<br />
    ~vector()<br />
    {<br />
        if (data) // check if pointer was allocated<br />
            delete[] data;<br />
    }<br />
    // ...<br />
  private:<br />
    unsigned my_size;<br />
    double* data;<br />
};<br />

Files that are opened with std::ifstream or std::ofstream do not need to be closed explicitly;<br />
their destructors will do this if necessary. Files that are opened with old C handles require<br />
explicit closing, and this is only one reason for not using them.<br />



Care must be taken that the freed resources are not used or released elsewhere in<br />
the program afterwards. C ++ generates a default destructor in the same way as the default<br />
constructor: it calls the destructor of each member, but in reverse order. 6<br />

3.5 Assignment<br />

Assignment operators enable expressions like the following for user-defined types:<br />

x= y;<br />

u= v= w= x;<br />

As usual, we first consider the class complex. Assigning a complex to a complex requires an<br />
operator like:<br />
complex& operator=(const complex& src)<br />
{<br />
    r= src.r; i= src.i;<br />
    return *this;<br />
}<br />

Evidently, we copy the members ‘r’ and ‘i’. The operator returns a reference to the object<br />
to enable multiple assignments. ‘this’ is a pointer to the object itself, and since we need a<br />
reference for syntactic reasons, it is dereferenced. What happens if we assign a double?<br />
c= 7.5;<br />
It compiles without the definition of an assignment operator for double. Once again, we have an<br />
implicit conversion: the implicit constructor creates a complex on the fly and assigns this one.<br />
If this becomes a performance issue, we can add an assignment for double:<br />

complex& operator=(double nr)<br />
{<br />
    r= nr; i= 0;<br />
    return *this;<br />
}<br />

An assignment operator like the first one, which assigns an object of the same type, is called<br />
Copy Assignment, and this operator is synthesized by the compiler. In the case of complex<br />
numbers the generated copy assignment operator performs exactly what we need: copying all<br />
members.<br />
As for the vector, the synthesized operator is not satisfactory because it only copies the address<br />
of the data and not the data itself. The implementation is very similar to the copy constructor:<br />

vector& operator=(const vector& src)<br />
{<br />
    if (this == &src)<br />
        return *this;<br />
    assert(my_size == src.my_size);<br />
    for (int i= 0; i < my_size; i++)<br />
        data[i]= src.data[i];<br />

6 TODO: Good and short explanation why. If possible with example.



    return *this;<br />
}<br />

In fact, any class in which the copy assignment and the copy constructor differ essentially in<br />
their implementation is very confusing in its behavior and should not be used, cf. [SA05,<br />
p. 94]. The two operations differ in the respect that a constructor creates content in a new<br />
object while an assignment replaces content in an existing object. However, both the creation<br />
and the replacement are performed with copy semantics, and the two operations should<br />
therefore behave consistently.<br />
An assignment of an object to itself (source and target have the same address) can be skipped,<br />
lines 3 and 4. In line 5 it is tested whether the assignment is a legal operation by checking<br />
that the sizes are equal. Alternatively, the assignment could resize the target if the sizes are<br />
different, but that does not correspond to the authors’ understanding of vector behavior. Or<br />
can you think of a context in mathematics or physics where a vector space all of a sudden<br />
changes its dimension?<br />

3.6 Automatically Generated Operators<br />

If you define a class without operators C ++ will generate the following four:<br />

• Default constructor;<br />

• Copy constructor;<br />

• Destructor; and<br />

• Copy assignment.<br />

Assume you have a class without any function but with some member variables like this:<br />

class my_class<br />
{<br />
    type1 var1;<br />
    type2 var2;<br />
    // ...<br />
    typen varn;<br />
};<br />

Then the compiler adds the four operators and your class behaves as you would have written:<br />

class my_class<br />
{<br />
  public:<br />
    my_class()<br />
      : var1(),<br />
        var2(),<br />
        // ...<br />
        varn()<br />
    {}<br />

    my_class(const my_class& that)<br />
      : var1(that.var1),<br />
        var2(that.var2),<br />
        // ...<br />
        varn(that.varn)<br />
    {}<br />
    ~my_class()<br />
    {<br />
        varn.~typen();<br />
        // ...<br />
        var2.~type2();<br />
        var1.~type1();<br />
    }<br />

    my_class& operator=(const my_class& that)<br />
    {<br />
        var1= that.var1;<br />
        var2= that.var2;<br />
        // ...<br />
        varn= that.varn;<br />
        return *this;<br />
    }<br />
  private:<br />
    type1 var1;<br />
    type2 var2;<br />
    // ...<br />
    typen varn;<br />
};<br />

The generation is straightforward: the four operators are called on each member variable<br />
respectively. The careful reader will have noticed that the constructors and the assignment are<br />
performed in exactly the order in which the variables are defined. The destructors are called in<br />
reverse order.<br />
The generation of these operators is disabled if you define your own. The rules for this<br />
are quite simple. The simplest is for the destructor: either you define it or the compiler does.<br />
There is only one destructor (because it has no arguments). The default constructor generation<br />
is disabled when any constructor is defined by the user, even a private one.<br />

The copy constructor and copy assignment operator are generated automatically unless there<br />
is a user-defined version for the class type or a reference to it. In detail, if the user defines one<br />
or more of the following:<br />
• return_type operator=(my_class that);<br />
• return_type operator=(const my_class& that); or<br />
• return_type operator=(my_class& that);<br />
then the compiler does not generate it. Typically, one defines only the second operator<br />
because the first one causes an extra copy 7 and the last one requires mutability, which is<br />
usually not necessary for the assignment. The copy constructor can only be defined for<br />
references because it would need itself to pass the argument by value. Defining a constructor<br />
or assignment for any other type does not disable the generation of the copy operators.<br />

7 An exception is user-defined move semantics.



This mechanism applies recursively. For instance, if type1 is itself a class with an automatically<br />
generated default constructor, the default constructors of its members are called in the order<br />
of their definition. If those variables, or some of them, are also classes, then their default<br />
constructors are called in turn, and so forth. If the type of a member variable is an intrinsic<br />
type like int or float, then there are evidently no such operators because these types are not<br />
classes. However, the behavior can easily be emulated: the "default constructor" just gives it a<br />
random value (whatever bits were set at the corresponding memory position before determine<br />
its value), the "copy constructor" and the "copy assignment" copy the value, and the<br />
"destructor" does nothing.<br />

3.7 Accessing object members<br />

3.7.1 Access functions<br />

In § 3.2.2 we introduced getters and setters to access the variables of the class complex. This<br />
becomes cumbersome when we want, for instance, to increment the real part:<br />
c.set_r(c.get_r() + 5.);<br />

This does not really look like a numeric operation and is not very readable either. A better way<br />
to deal with this is to write a member function that returns a reference:<br />
class complex { public:<br />
    double& real() { return r; }<br />
};<br />

With this function we can write:<br />

c.real()+= 5.;<br />

This already looks much better but is still a little bit weird. Why not increment like this:<br />
real(c)+= 5.;<br />
To do this, we write a free function:<br />
inline double& real(complex& c) { return c.r; }<br />
But this function accesses the private member ‘r’. We can modify the free function to call the<br />
member function:<br />

Or, alternatively, declare the free function as a friend of complex:<br />

class complex { public:<br />

friend double& real(complex& c);<br />

};<br />

Functions or classes that are friends can access private and protected data. A strange issue<br />
with this free function is that the inline attribute must be written before the reference type.<br />
Usually it does not matter whether inline is written before or after the return type. 9<br />

9 TODO: Anybody a decent explanation for this?



This function works only if the complex number is not constant. So we also need a function<br />
that takes a constant reference as argument. In return, it can only provide a constant reference<br />
to the number’s real part.<br />

inline const double& real(const complex& c) { return c.r; }<br />

This function requires a friend declaration, too.<br />

The functions, in free as well as in member form, can evidently only be called after the object<br />
is created. The reference to the number’s real part that we use in the statement<br />
real(c)+= 5.;<br />
exists only until the end of the statement. The variable c lives longer. We can create a reference<br />
variable:<br />

double &rr= real(c);<br />

C ++ destroys objects in the reverse order of their construction. That means that even if rr and<br />
c are in the same function or block, c lives longer than rr.<br />
The same is true for constant references if they refer to objects from variable declarations.<br />
Temporary objects can also be passed as constant references, enabling the definition of dangling<br />
references:<br />

const double &rr= real(complex()); // Bad thing!!!<br />
cout << "The real part is " << rr << '\n';<br />

The complex variable is created temporarily and only exists until the end of the first statement.<br />
The reference to its real part lives till the end of the surrounding block.<br />
Advice<br />
Do Not Make Constant References To Temporary Expressions!<br />
They are invalid before you use them for the first time.<br />

3.7.2 Subscript operator<br />

A really stupid way to access vector entries would be to write a function for each one:<br />

class vector<br />
{<br />
  public:<br />
    double& zeroth() { return data[0]; }<br />
    double& first() { return data[1]; }<br />
    double& second() { return data[2]; }<br />
    // ...<br />
    int size() const { return my_size; }<br />
};<br />



One could not even write a loop over all elements.<br />

To enable such iteration, we need a function like:<br />

class vector<br />
{<br />
  public:<br />
    double at(int i)<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />

Summing the entries of vector v reads:<br />

double sum= 0.0;<br />
for (int i= 0; i < v.size(); i++)<br />
    sum+= v.at(i);<br />

C ++ and C access entries of (fixed-size) arrays with the subscript operator. It is thus only<br />
natural to do the same for (dynamically sized) vectors. Then we could rewrite the previous<br />
example as:<br />

double sum= 0.0;<br />
for (int i= 0; i < v.size(); i++)<br />
    sum+= v[i];<br />

This is more concise and shows more clearly what we are doing.<br />

Overloading this operator has the same syntax as the assignment operator, and we can take the<br />
implementation from the function at:<br />

class vector<br />
{<br />
  public:<br />
    double& operator[](int i)<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />

With this operator we can access vector elements with brackets, but only if the vector is<br />
mutable.<br />

3.7.3 Constant member functions<br />

This raises the more general question: how can we write operators and member functions that<br />
accept constant objects? In fact, operators are a special form of member functions and can be<br />
called like a member function:<br />
v[i]; // is syntactic sugar for:<br />
v.operator[](i);<br />



Of course, the long form is almost never called, but it illustrates that operators are regular<br />
functions that only provide an extra syntax for calling them.<br />
Free functions allow qualifying the const-ness of each argument. Member functions do not even<br />
mention the processed object in the signature. How can const-ness be specified then? There is<br />
a special notation that states the applicability of a member function to constant objects after<br />
the function header, e.g. for our subscript operator:<br />

class vector<br />
{<br />
  public:<br />
    const double& operator[](int i) const<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />

The const attribute is not just a casual gesture by the programmer that he/she does not mind<br />
this member function being called on a constant object. C ++ takes this constancy very seriously<br />
and will verify that the function does not modify the object, i.e. any of its members; that the<br />
object is only passed as const when free functions are called; and that called member functions<br />
have the const attribute as well.<br />

This constancy guarantee also forbids returning non-constant pointers or references. One can<br />
return constant pointers or references as well as objects. A returned object does not need to<br />
be constant (but it could be) because it is a copy of the object, of one of its member variables<br />
(or constants), or of a temporary variable; and because it is a copy, the original is guaranteed<br />
to remain unchanged.<br />
Constant member functions can be called on non-constant objects (because C ++ implicitly<br />
converts non-constant references into constant references when necessary). Therefore, it is<br />
often sufficient to provide only the constant member function. For instance, a function that<br />
returns the size of the vector:<br />

class vector<br />
{<br />
  public:<br />
    int size() const { return my_size; }<br />
    // int size() { return my_size; } // futile<br />
};<br />
The non-constant size function does the same as the constant one and is therefore useless.<br />

For our subscript operator we need both the constant and the mutable version. If we only<br />
had the constant member function, we could use it to read the elements of both constant and<br />
mutable vectors, but we could not modify the elements. By the way, our abandoned getters<br />
should have been const since they are only used to read values, regardless of whether the object<br />
is constant or mutable.<br />

3.7.4 Accessing multi-dimensional arrays<br />

Let us assume that we have a simple matrix class like the following:



class matrix<br />
{<br />
  public:<br />
    matrix() : nrows(0), ncols(0), data(0) {}<br />
    matrix(int nrows, int ncols)<br />
      : nrows(nrows), ncols(ncols), data( new double[nrows * ncols] ) {}<br />
    matrix(const matrix& that)<br />
      : nrows(that.nrows), ncols(that.ncols), data(new double[nrows * ncols])<br />
    {<br />
        for (int i= 0, size= nrows*ncols; i < size; ++i)<br />
            data[i]= that.data[i];<br />
    }<br />
    ~matrix() { if (data) delete [] data; }<br />
    void operator=(const matrix& that)<br />
    {<br />
        assert(nrows == that.nrows && ncols == that.ncols);<br />
        for (int i= 0, size= nrows*ncols; i < size; ++i)<br />
            data[i]= that.data[i];<br />
    }<br />
    int num_rows() const { return nrows; }<br />
    int num_cols() const { return ncols; }<br />
  private:<br />
    int nrows, ncols;<br />
    double* data;<br />
};<br />

So far, the implementation is done in the same manner as before: variables are private, the<br />
constructors establish defined values for all members, the copy constructor and the assignment<br />
are consistent, and size information is provided by constant functions.<br />
What is still missing is the access to the matrix entries.<br />

Be aware!<br />

The bracket operator accepts only one argument.<br />

That means we cannot define<br />

double& operator[](int r, int c) { ... }<br />

Approach 1: Parentheses<br />
The simplest way to handle multiple indices is to replace the square brackets with parentheses:<br />

double& operator()(int r, int c)<br />
{<br />
    return data[r*ncols + c];<br />
}<br />

Adding range checking, in a separate function for better reuse, can save us a lot of debugging<br />
time in the future. We also implement the constant access:<br />
  private:<br />
    void check(int r, int c) const { assert(0 <= r && r < nrows && 0 <= c && c < ncols); }<br />
  public:<br />
    double& operator()(int r, int c) { check(r, c); return data[r*ncols + c]; }<br />
    const double& operator()(int r, int c) const { check(r, c); return data[r*ncols + c]; }<br />



Approach 3: Returning proxies<br />
Instead of returning a pointer, we can build a specific type that keeps a reference to the matrix<br />
and the row index and that provides an operator[] for accessing matrix entries. This proxy must<br />
therefore be a friend of the matrix class to reach its private data. Alternatively, we can keep<br />
the operator with the parentheses and call this one from the proxy. In both cases, we encounter<br />
cyclic dependencies. 10<br />
If we have several matrix types, each of them would need its own proxy. We would also need<br />
different proxies for constant and mutable access respectively. In Section 6.5 we will show how<br />
to write a proxy that works for all matrix types. The same templated proxy will handle constant<br />
and mutable access. Fortunately, it even solves the problem of mutual dependencies. The only<br />
minor flaw is that errors cause lengthy compiler messages.<br />

Approach 4: Multi-index type (advanced)<br />
Preliminary note: this approach uses several new language features and discusses some<br />
subtle details. If you do not understand it the first time, don’t worry. If you would like to skip<br />
it, do so. That will not be a problem for understanding the rest of the book. But please read<br />
the comparative discussion.<br />
The fact that operator[] accepts only one argument does not necessarily mean that we cannot<br />
give it two. But we need a tricky technique to build one object out of two, without explicitly<br />
constructing the object. The implementation is based on the matrix example from an online<br />
tutorial [Sch].<br />

First, we define a type:<br />

struct double_index<br />
{<br />
    double_index(int i1, int i2) : i1(i1), i2(i2) {}<br />
    int i1, i2;<br />
};<br />

For this type we define the access operator:<br />

double& operator[](double_index i) { return data[i.i1*ncols + i.i2]; }<br />
const double& operator[](double_index i) const { return data[i.i1*ncols + i.i2]; }<br />

Now we can write:<br />
A[double_index(1, 0)];<br />
This works, but it is not the concise notation we were looking for.<br />

We introduce a second type:<br />

struct single_index<br />
{<br />
    single_index(int i1) : i1(i1) {}<br />
    double_index operator,(single_index j) const<br />
    {<br />
        return double_index(i1, j.i1);<br />
    }<br />
    operator int() const { return i1; }<br />
    single_index& operator++()<br />
    {<br />
        ++i1; return *this;<br />
    }<br />
    int i1;<br />
};<br />
10 The dependencies cannot be resolved with a forward declaration because we not only define references or<br />
pointers but also call member functions in the matrix and in the proxy. We will explain this in § ??.<br />

This new type overloads the comma operator so that a second index creates a double_index.<br />
The constructor is implicit, and the class contains a conversion operator to int. This enables<br />
the compiler to convert between single_index and int in both directions.<br />

This allows us to write code like:<br />
single_index i= 0, j= 1;<br />
std::cout << "A[0, 1] is " << A[i, j] << '\n';<br />
or<br />
for (single_index i= 0; i < A.num_rows(); ++i)<br />
    for (single_index j= 0; j < A.num_cols(); ++j)<br />
        std::cout << "A[" << i << ", " << j << "] is " << A[i, j] << '\n';<br />

In the loop, a single_index (i) is compared with an int (A.num_rows()). This comparison<br />
operator is not defined. The compiler converts i implicitly to an int and compares the values as<br />
int. Thus, the conversion operator allows us to use all operations that are defined for int without<br />
implementing them.<br />

At this opportunity we can introduce another operator. C and C ++ provide prefix and postfix<br />
increment/decrement. The difference only manifests itself if we read the incremented/decremented<br />
value, e.g., j= i++; differs from j= ++i; by having the old value of i in j (in the first statement)<br />
or the already incremented i (in the second statement). If the increment is the only expression<br />
in the statement, e.g., i++; or ++i;, there is no semantic difference. Therefore, it does not<br />
matter for loops whether we use the postfix or prefix notation.<br />

for (single_index i= 0; i < A.num_rows(); ++i)<br />
is (semantically) equivalent to:<br />
for (single_index i= 0; i < A.num_rows(); i++)<br />

For C ++’s integer types it really does not matter. For user-defined types, the compiler will tell<br />
us that this operation is not defined. The GNU compiler emits the following error message:<br />
no »operator++(int)« for suffix »++« declared, instead prefix operator tried<br />
Fortunately, it already reveals the solution.<br />

The operator++ without arguments is understood as the prefix operator. To define a postfix<br />
operator, we must define it with a dummy int argument. This argument has no effect, but we<br />
need a way to define the symbol ++ as both prefix and postfix operator. Unary operators are<br />
defined as member functions without arguments. This works for all other unary operators, but<br />
in the case of the decrement/increment we have the same symbol for two operators that are<br />
distinguished by their position.<br />

To make a long story short, if we write i++ we must define the postfix increment:<br />

single_index operator++(int)<br />
{<br />
    single_index tmp(*this);<br />
    ++i1;<br />
    return tmp;<br />
}<br />

We see that the operation requires an extra copy. The object itself must be incremented, but<br />
the returned value must still be the old one. If we returned the object itself, i.e. *this, we would<br />
have no possibility to increment it after the return. Therefore we need a copy before we modify<br />
the object. Alternatively, we could omit the copy and return a new object with the old value:<br />

single_index operator++(int)<br />
{<br />
    ++i1;<br />
    return single_index(i1 - 1);<br />
}<br />

This avoids the copy at the beginning, but we still create a new object. These implementations<br />
show that postfix operators are somewhat more expensive than prefix operators, and this is<br />
true for all user-defined types. For C ++’s built-in types the compiler can generate efficient<br />
executables for both forms.<br />
The really sad part of the story is that we put so much effort into returning the old value of our<br />
index and do not even use it. Therefore, we give the following<br />

Advice<br />
If you increment or decrement user-defined types, prefer the prefix notation,<br />
especially if the value of the changed variable is not used in the statement.<br />

In the examples, we declared both indices as single_index. It is sufficient to do this for the first<br />
one and let the implicit constructor convert the second one:<br />
A[single_index(0), 1]<br />

Unfortunately, we cannot write<br />
A[0, 1]<br />
The compiler will give an error message 11 like:<br />
no match for »operator[]« in »A[(0, 0)]«<br />

To call operator[], the compiler would need to perform multiple steps that depend on each other:<br />
first the literals, which are considered int, would need to be converted to single_index, and then the<br />
11 This is the message from the GNU compiler.



comma operator would have to be applied to them. A language that allowed such dependent<br />
conversions would end up with extremely long compile times to consider all possibilities 12, and<br />
the probability of ambiguities would increase tremendously.<br />

Instead, the compiler considers ‘0, 0’ as a sequence of two expressions, where each expression is<br />
an integer constant. The result of a sequence is the result of the last expression, i.e. the integer<br />
constant zero in our case. This cannot be converted into a double_index.<br />

To throw in a really bad idea, we could give the second constructor argument of double_index a<br />
default value:<br />
struct double_index<br />
{<br />
    double_index(int i1, int i2= 0) // Very bad<br />
      : i1(i1), i2(i2) {}<br />
    int i1, i2;<br />
};<br />

Then the expression A[0, 1] compiles, as does A[0, 1, 2, 3, 4]. The integer sequence is evaluated<br />
and the result is the last expression. A single integer can be implicitly converted into a<br />
double_index. As a result, the last integer is considered the row index and the column is zero.<br />

Comparing the approaches<br />

The previous implementations show that C ++ allows us to provide different notations for<br />
user-defined types, and we can implement them in the manner that seems most appropriate to<br />
us. The first approach was replacing square brackets by round parentheses to enable multiple<br />
arguments. This was the simplest solution, and if one is willing to accept this syntax, one can<br />
spare oneself the lengths we went to to come up with a fancier notation. The technique of<br />
returning a pointer was not complicated either, but it relies too strongly on the internal<br />
representation. If we use some internal blocking or some other specialized internal storage<br />
scheme, we will need an entirely different technique. Another drawback was that we cannot<br />
test the range of the column index.<br />

The last approach introduced special types, and the fact that we must always specify the type<br />
of the index explicitly makes the notation for constant indices clumsier rather than clearer. It<br />
also introduced a lot of implicit conversions, and in a large code base we might have enormous<br />
trouble avoiding ambiguities. Another unfortunate aspect is the overloading of the comma<br />
operator. It makes programs more difficult to understand (one has to pay close attention to<br />
the types of expressions to distinguish it from non-overloaded sequences) and can cause weird<br />
effects. Thus, our first recommendation is to keep reading, since the proxy solution in § ?? is<br />
in our opinion preferable to the previous approaches (although not perfect either).<br />
To summarize, C ++ gives us the opportunity to handle programming tasks in different ways.<br />
Often, none of the solutions will be perfect. Even if one is satisfied with a solution, there will<br />
most certainly be some (allegedly) experienced C ++ programmer who finds a disadvantage.<br />

12 It might even become undecidable.



There are two lessons we can learn from this, firstly:<br />

Advice<br />
Don’t push C ++ too far! Avoid fragile features and minimize implicit conversions.<br />
C ++ enables many techniques, but that doesn’t mean one has to use them all. Especially the<br />
comma operator bears so much danger that its use must be limited to very rare cases, or better<br />
avoided entirely. It is important to have an appropriate notation, and time spent on syntactic<br />
sugar is really worthwhile for the sake of better usability of new classes. But some tricks provide<br />
only a little improvement in syntax while creating large problems in the interplay with other<br />
techniques.<br />

Secondly:<br />

Advice<br />

If you can’t find a perfect solution, pick what serves you best and accept it.<br />

We dare the hypothesis that there is not a single C ++ program that everybody is happy with.<br />
The attempt to come up with the world’s first perfect C ++ program will end in failure and<br />
bitterness. Of course, that does not mean willingly accepting the first working implementation<br />
one comes up with. Software can always be improved, and should be. As mentioned in § 2.11,<br />
experience has shown that it is more efficient to refactor software as early as possible than to<br />
fix issues retroactively when important applications crash, users are angry, and the program<br />
author(s) have forgotten the details or are already gone. On the other hand, by the time one<br />
reaches a really good implementation, one has certainly already spent much more time than<br />
initially planned.<br />

3.8 Other Operators


Chapter 4<br />
Generic Programming<br />

In this chapter we will explain the use of templates in C ++ to create generic functions and<br />

classes. We will also discuss metaprogramming and the Standard Template Library.<br />

4.1 Templates<br />

Templates are a feature of the C ++ programming language that allows functions and classes<br />
to operate with generic types, also called parametric types. As a result, a function or class<br />
can work with many different data types without being manually rewritten for each one.<br />
A template parameter is a special kind of parameter that can be used to pass a type as an<br />
argument: just like regular function parameters can be used to pass values to a function,<br />
template parameters allow types to be passed to a function or a class. Generic functions and<br />
classes can use these parameters as if they were any regular type.<br />

4.2 Generic functions<br />

Generic functions, also called function templates, are in some sense generalizations of<br />
overloaded functions.<br />
Suppose we want to write the function max(x, y) where x and y are variables or expressions of<br />
some type. Using overloading, we can easily do this as follows:<br />

int inline max (int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

double inline max (double a, double b)
{
    if (a > b)
        return a;
    else
        return b;
}

Note that the function body is exactly the same for both int and double.

With the template mechanism we can write just one generic implementation:

template <typename T>
T inline max (T a, T b)
{
    if (a > b)
        return a;
    else
        return b;
}

The function can be used in the same way as the overloaded functions:

std::cout << "The maximum of 3 and 5 is " << max(3, 5) << '\n';
std::cout << "The maximum of 3l and 5l is " << max(3l, 5l) << '\n';
std::cout << "The maximum of 3.0 and 5.0 is " << max(3.0, 5.0) << '\n';

In the first case, '3' and '5' are literals of type int and the max function is instantiated to

int inline max (int, int);

Likewise, the second and third calls of max instantiate

long inline max (long, long);
double inline max (double, double);

as the literals are interpreted as long and double.

In the same way, the template function can be called with variables and expressions:

unsigned u1= 2, u2= 8;
std::cout << "The maximum of u1 and u2 is " << max(u1, u2) << '\n';
std::cout << "The maximum of u1*u2 and u1+u2 is " << max(u1*u2, u1+u2) << '\n';

Here the function is instantiated for unsigned.

Instead of typename one can also write class in this context, but we do not recommend this because typename expresses the intention of a generic function better.

What does instantiation mean? When you write a non-generic function, the compiler reads its definition, checks for errors, and generates executable code. When the compiler processes a generic function's definition, it checks only for certain errors (parsing errors) and generates no executable code. For instance:

template <typename T>
T inline max (T a, T b)
{
    if a > b    // Error!
        return a;
    else
        return b;
}



would not compile because an if statement without parentheses is not legal in the C++ grammar. Meanwhile, the following foolish implementation:

template <typename T>
T inline max (T a, T b)
{
    if (a > b)
        return max(a, b);   // Infinite recursion!
    else
        return max(b, a);   // Infinite recursion!
}

compiles because it does not violate any grammar rule. It obviously results in infinite recursion, but this is beyond the compiler's responsibility.

So far, the compiler has only checked the grammatical correctness of the definition but has not generated code. If we never call the template function, the binary will contain no trace of our max function. What happens when we call the generic function and thus cause its instantiation? The compiler first checks whether the function can be compiled with the given argument types. It can do so for int or double, as we have seen before. What about types that have no '>', for instance std::complex<double>? Let us try to compile:

std::complex<double> z(3, 2), c(4, 8);
std::cout << "The maximum of c and z is " << ::max(c, z) << '\n';

The double colons in front of max avoid ambiguities with the standard library's max, which some compilers (apparently g++ among them) include implicitly. Our compilation attempt will end in an error like:

Error: no match for »operator>« in »a > b«

Obviously, we cannot call the max function with types that have no "greater than" operator. In fact, there is no maximum function for complex numbers.

What happens when our template function calls another template function, which in turn . . . ? Likewise, these functions are only completely checked at instantiation time. Let us look at the following program:

#include <iostream>
#include <complex>
#include <vector>
#include <algorithm>

int main ()
{
    using namespace std;
    vector<complex<double> > v;
    sort(v.begin(), v.end());
    return 0;
}

Without going into detail, the problem is the same as before: we cannot compare complex numbers and thus cannot sort arrays of them. This time the missing comparison is discovered in an indirectly called function, and the compiler provides the entire call stack so that you can trace the error back. Please try to compile this example on the different compilers at your disposal and see whether you can make any sense out of the error messages.

If you run into such a lengthy error message,¹ DON'T PANIC! First, look at the error itself and extract what is useful for you, e.g. a missing "operator>", or something not assignable, i.e. a missing "operator=", or something const that should not be. Then find in the call stack the innermost code that is part of your program, i.e. where you call somebody else's template function. Stare for a while at this line and its preceding lines, because this is the most likely place of the error. Is a type passed to the template function missing an operator or function mentioned in the error? Do not get scared away; often the problem is much simpler than it seems from the never-ending error message. In our experience, most errors in template functions can be found faster than run-time errors.

Another question we have not answered so far is what happens if we use two different types:

unsigned u1= 2;
int i= 3;
std::cout << "The maximum of u1 and i is " << max(u1, i) << '\n';

The compiler tells us — briefly this time — something like

Error: no match for function call »max(unsigned int&, int)«

Indeed, we assumed that both types are the same. Can we write a template function with two template parameters? Of course we can. But that does not help us much here, because we would not know which return type the function should have.

There are different options. First, we could add a non-templated overload like:

int inline max (int a, int b) { return a > b ? a : b; }

This can be called with mixed types, and the unsigned argument would be implicitly converted into an int. But what would happen if we also added an overload for unsigned?

unsigned inline max (unsigned a, unsigned b) { return a > b ? a : b; }

Should the int be converted into an unsigned or vice versa? The compiler does not know and will complain about this ambiguity.

At any rate, adding non-templated overloads to the templated implementation is neither elegant nor productive. So, we remove all non-templated overloads and look at what we can do in the function call. We can explicitly convert one argument to the type of the other:

unsigned u1= 2;
int i= 3;
std::cout << "The maximum of u1 and i is " << max(int(u1), i) << '\n';

Now max is called with two ints. Another option is specifying the template type explicitly in the function call:

unsigned u1= 2;
int i= 3;
std::cout << "The maximum of u1 and i is " << max<int>(u1, i) << '\n';

¹ The longest we have heard of was 18 MB, which corresponds to about 9000 pages of text.


Then the arguments are converted to int.²

After these less pleasant details on templates, here is one piece of really good news: template functions perform as efficiently as their non-templated counterparts! The reason is that C++ generates new code for every type or type combination that the function is called with. Java, in contrast, compiles templates only once and executes them for different types by casting them to the corresponding types. This results in faster compilation and shorter executables, but it is less efficient than non-templated implementations (which are already less efficient than C++ programs).

Another price we have to pay for fast templates is longer executables, because of the multiple instantiations for each type (combination). However, in practice the number of instances of a function will not be that large, and it only really matters for non-inline functions with long implementations (including called template functions). The binary code of inline functions is at any rate inserted directly into the executable at the location of the function call, so that the impact on the executable length is the same for template and non-template functions.

4.2.1 The function accumulate

TODO: An example on containers is much better than with ugly pointer arithmetic.

Consider an array double a[n], which is described by its begin and end pointers a and a + n, respectively.³

We create a function for the sum of an array of doubles. The loop over the array uses pointers, as explained in Section 2.9. Figure 4.1 shows the positions of the begin pointer a and the end pointer a + n, which points directly past the end of the array.

Figure 4.1: An array of length n with begin and end pointers

Thus, we specify the range of entries by a right-open interval of addresses.

² For complicated reasons of compiler internals, the explicit type parameter turns off argument-dependent name lookup (ADL).

³ An array and a pointer are treated in much the same way in C/C++: one can pass an array where a pointer is expected, and it decays to the address of the first entry, &a[0]. For a pointer or array a and an integer n, a + n is equivalent to &a[n].



Advice

Unless you have strong reasons against it, use right-open intervals, because:

• It is easy to represent empty sets by two equal locations (pointers, iterators, . . . ).

• It works on types without an ordering: the loop condition can be written with an inequality test like p != end, whereas specifying the end by the location of the last element would require an ordering operator.


The function accumulate applies the += operator to variables of type T. This operator is defined for the int and double types. This implies that the following main program will compile without the need for another definition of the accumulate function:

int main()
{
    const int n = 10;
    float a[n];
    int b[n];
    for (int i= 0; i < n; ++i) {
        a[i]= float(i) + 1.0f;
        b[i]= i + 1;
    }
    float s= accumulate(a, a + n);
    int r= accumulate(b, b + n);
    return 0;
}

As in the previous example, we do not need to state explicitly that T is float or int: the compiler deduces this for us from the function arguments. We can, however, fill in the type explicitly as follows:

int r= accumulate<int>(b, b + n);

If you fill in the wrong type, the compiler will give you a type error saying that no matching function exists.

4.3 Generic classes

In the previous section, we described the use of templates to create generic functions. Templates can also be used to create generic classes, which define a certain behavior independently of the types they operate on. Good candidates are, for example, container classes like vectors, matrices, and lists. We could also extend the complex class with a parametric value type, but we have already spent so much time on it that we will now look at something else.

Let us write a generic vector class.⁴ First we implement a class with only the most fundamental operations:

⁴ In the sense of linear algebra, not like the STL vector.

template <typename T>
class vector
{
    void check_size(int that_size) const { assert(my_size == that_size); }
    void check_index(int i) const { assert(i >= 0 && i < my_size); }
  public:
    explicit vector(int size)
      : my_size(size), data( new T[my_size] )
    {}

    vector()
      : my_size(0), data(0)
    {}



    vector( const vector& that )
      : my_size(that.my_size), data( new T[my_size] )
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
    }

    ~vector() { if (data) delete [] data; }

    vector& operator=( const vector& that )
    {
        check_size(that.my_size);
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
        return *this;
    }

    int size() const { return my_size; }

    const T& operator[]( int i ) const
    {
        check_index(i);
        return data[i];
    }

    T& operator[]( int i )
    {
        check_index(i);
        return data[i];
    }

    vector operator+( const vector& that ) const
    {
        check_size(that.my_size);
        vector sum(my_size);
        for (int i= 0; i < my_size; ++i)
            sum[i]= data[i] + that[i];
        return sum;
    }

  private:
    int my_size;
    T* data;
};

Listing 4.1: Template vector class

The template class is not essentially different from a non-template class. There is only the extra parameter T as a placeholder for the type the class is used with. We have member variables like my_size and member functions like size() that are not affected by the template parameter. Other functions, like the access operator or the first constructor, are parametrized. However, the difference is minimal: wherever we had double (or another type) before, we now put the type parameter T, e.g. for return types or within new. Likewise, our member variables and constants can be parametrized by T, as for data. Even program parts that use generic functions or data can often be implemented without explicitly stating the type parameters. For instance, the destructor uses the pointer data with a template type, but delete can deduce the type automatically, and for the null pointer test it does not matter either.

Template arguments can have default values. Assume our vector class has, in addition to the value type, two parameters for the orientation and location:

template <typename T, typename Orientation= row_major, typename Where= heap>
class vector;

The arguments of a vector can be fully declared:

vector<float, row_major, heap> v;

The last argument is equal to the default value and can be omitted:

vector<float, row_major> v;

As for functions, only the trailing arguments can be omitted. For instance, if the second argument is the default and the last one is not, we must write them all:

vector<float, row_major, on_stack> w;

If all template arguments have their default values, we can of course omit them all. However, the type is still a template class, and the compiler gets confused if we skip the brackets:

vector x;    // wrong, it is considered a non-template class
vector<> y;  // looks a bit strange but is correct

Unlike the defaults of function arguments, template defaults can refer to previous template arguments:

template <typename T, typename U= T>
class pair;

This is a class for two values that might have different types. If they do not, we need not declare the type twice:

pair<int, float> p1; // object with an int and a float value
pair<int> p2;        // object with two int values

The dependency on previous arguments can be more complex than mere equality when using the meta-functions that we will introduce in Chapter ??.

TODO: transition to next section

4.4 Concepts and Modeling

In the previous sections, one could get the impression that template parameters can be replaced by any type. This is in fact not entirely true. The programmer of templated classes and functions makes assumptions about the operations that can be performed on the templated variables. So it is very important to know which types may correctly be substituted for the formal template parameters — in C++ lingo, which types the template function or class can be instantiated with. Clearly, accumulate can be instantiated with int or double. Types without addition, like a solver class (on page 70), cannot be used for accumulate. What should be accumulated from a set of solvers? All the requirements for the template parameter T of the function accumulate can be summarized as follows:

• T is CopyConstructible:
  – a copy constructor T::T(const T&) exists, so that 'T a(b);' compiles if b is of type T.

• T is PlusAssignable:
  – a plus-assign operator T::operator+=(const T&) exists, so that 'a+= b;' compiles if b is of type T.

• T is Constructible from int:
  – a constructor T::T(int) exists, so that 'T a(0);' compiles.

Such a set of type requirements is called a 'Concept'. A concept CR that contains all requirements of a concept C plus additional ones is called a 'Refinement' of C. A type t that fulfills all requirements of a concept C is called a 'Model' of C.

A complete definition of a template function or type should contain the list of required concepts, as is done for functions from the Standard Template Library, see http://www.sgi.com/tech/stl/.

Today such requirements are mere documentation. There exists a prototype of a C++ concept compiler [?] that checks:

• whether a function can be called with a certain type (or type combination⁵);

• whether a class can be instantiated with certain types (or combinations thereof); and

• whether a function's requirement list covers all used expressions, including those in sub-functions.

The compiler generates short and comprehensible messages when template functions or classes are used erroneously. People interested in generic programming should try this compiler; it helps toward a better understanding. However, the compiler really is a prototype and must not be used for production code. This functionality was even planned for the next language standard, but the committee could not achieve a consensus on its details, to make a (very) long story short.

Discussion 4.1 The most vulnerable aspect of generic programming is semantic conformance, that is, which Semantic Concepts are modeled. For instance, an algorithm might require that a binary operation be associative in order to compute correctly. One can express this requirement in the function's documentation, but if someone calls the function with an operation that is not associative, the compiler has no idea about this. If one violates a syntactic requirement, the compiler will complain about the missing function or operator — often in a hardly readable form — but it will be caught no matter what. If one violates a semantic requirement, the compiler generates erroneous executables, and the compilation does not give any warning, because the compiler is entirely unaware of the user types' semantics. The only way to find such semantic errors in templates with today's compilers is careful documentation (and reading it, of course). Recent research gives hope that future C++ standards and compilers will provide more reliable and elegant possibilities to ensure the semantic correctness of template programs.

⁵ If you have multiple template arguments.



For illustration purposes, we show the conceptualized declaration of a generic sorting function as used in the library of the concept compiler:

template <typename Iter>
  requires LessThanComparable<typename Iter::value_type>
        && CopyAssignable<typename Iter::value_type>
        && Swappable<typename Iter::value_type>
        && CopyConstructible<typename Iter::value_type>
inline void sort(Iter first, Iter last);

If the function is called erroneously, the compiler detects this directly in the function call, not deep inside the implementation.

4.5 Inheritance or Generics?

In this section, we discuss the commonalities and differences between object-oriented programming (OOP) and generic programming. Readers who do not know OOP will not learn it in this section; its purpose is to motivate why we pay more attention to generic than to object-oriented programming in this book. The short answer is performance and applicability. If this answer is good enough for you, you can skip this section and continue with the next one. Programmers who are used to OOP and think they can implement the functionality with inheritance instead of templates should take the time to read this section.

Inheritance and generic programming are similar in the sense that most programming problems that can be solved by inheritance have a generic alternative solution, and vice versa. The following table summarizes the basic components of inheritance and the corresponding building blocks of generic programming:

    Inheritance      Generic Programming
    base class       concept
    derived class    model

In the remainder of this section, we discuss the differences between generic programming and inheritance. We focus on functions, but similar arguments hold for templated classes. The advantage of using a base class reference or pointer as the argument type of a function is that all derived classes can be used as arguments too, see § ??.⁶ Inheritance in C++ and other OOP languages is designed such that a function in a derived class can substitute (hide) the one in the base class with the identical signature. Thus, calling the function for a base class argument will use either the base class's implementation or that of the derived class (if the function is virtual). In both cases we can rely on its existence. We will explain OOP in more detail in Section ??. Here, we only name advantages and disadvantages of the two approaches regarding different aspects of programming.

Compile time: With the OOP approach, the function is compiled only once; the distinction between the different calculations is realized at run time. The generic implementation requires a new compilation for each combination of types. As a consequence, the sources must reside in header files and cannot be stored in libraries.⁷

⁶ TODO: the OOP section is not written yet.

Executable size: As mentioned before, generic functions need multiple compilations, and as a result the generated executable contains code for each instantiation. A function programmed against an abstract interface exists only once. On the other hand, virtual functions introduce some additional memory to store the virtual function tables. Except for some pathological examples, one can expect this additional space to be less than the extra space needed for separate machine code for every instantiation of a generic function. In extreme cases, a very large executable can even impact performance negatively by wasting cache memory.

Performance: The higher compilation effort for generic programming yields a double performance benefit. Functions within generic computations do not need to be called indirectly via expensive function pointers but can be called directly. Whenever appropriate, they can even be inlined, saving the function call overhead entirely. We once measured the impact of the two approaches on the performance of an accumulate function (a more general one than in § 4.2.1) [?]. The generic version was in our case about 40 times faster than the inheritance-based implementation. This value varies from platform to platform, but for small functions one can expect an inlined template function to be 10–100 times faster than a virtual function. Conversely, for long calculations, like solving a large linear system, the performance difference is imperceptible.

Concept refinement: Adding (syntactic) requirements is feasible with the inheritance approach, but it is very tedious and obfuscates the program sources; see the details in [?].

Intrusiveness: The emulation of genericity by inheritance can induce a deep class hierarchy [?]. More critical for universal applicability is that the technique is intrusive: a type cannot be used as the argument of an OOP implementation if it is not derived from the according base class, even if it provides the correct interface! Thus, we have to add additional base class(es) to the type. This is particularly problematic if we use types from third-party libraries or intrinsic types, because we cannot add base classes there. Generic functions have no such rigid constraints: we can even adapt a third-party or intrinsic type to meet a generic function's syntactic requirements without modifying third-party programs.

Time of selection: At least one advantage of OOP-style polymorphism we should mention at the end. The argument types of a generic function call must be known at compile time so that the compiler can instantiate the template function. The type of an OOP function argument can be chosen during the execution of the program and can therefore depend on preceding calculations or input data. For instance, one can define in a file which linear solver is used in an application.

Résumé: It is not our goal to compare object-oriented and generic programming in general. The two approaches complement each other in many respects, and this is beyond the scope of this discussion. However, when considering only the aspect of maximal applicability with optimal performance, the generic approach is undoubtedly superior. Especially if functions of a library are used with types defined outside this library, any necessary interface adaption is quite easy without modifying the type definition, whereas the addition of extra base classes forces a change of the type definition, which is not always possible (or desirable). In contexts where functions are used with a limited number of types that are defined in the same library, derivation can be an appropriate technique to achieve polymorphism.

⁷ Libraries in the classical sense, which are linked with separately compiled sources, as opposed to template libraries.

4.6 Template Specialization

Although one of the advantages of a generic implementation is that the same code can be used for all objects that satisfy the corresponding concept, this is not always the best approach. Sometimes the same behavior can be implemented more efficiently for a specific type. In principle, one can even implement a different behavior for a specific type, but this is generally not advisable, because the program becomes much more complicated to understand, and using the specialized classes can require a whole chain of further specializations (bearing the danger of errors when incompletely realized). C++ provides enormous flexibility, and the programmer is in charge of using this flexibility responsibly and of remaining self-consistent.

4.6.1 Specializing a Class for One Type

In the following, we want to specialize our vector example from page 96 for bool. Our goal is to save memory by packing 8 bools into one byte. Let us start with the class definition:

template <>
class vector<bool>
{
    // ..
};

Although our specialized class is not type-parametric, we still need the template keyword and the empty angle brackets. After the class name, the complete type list must be given. This syntax looks a bit cumbersome in this context but makes more sense for multiple template arguments where only some are specialized. For instance, if we had some container with 3 type arguments and specialized the second one to bool:

template <typename T1, typename T3>
class some_container<T1, bool, T3>
{
    // ..
};

Back to our boolean vector class. In the class, we define a default constructor for empty vectors, a constructor for vectors of size n, and a destructor. For the size of the internal array, we have to pay some attention when the vector size is not divisible by 8, because the integer division simply cuts off the remainder.

template <>
class vector<bool>
{
  public:
    explicit vector(int size)
      : my_size(size), data( new unsigned char[(my_size + 7) / 8] )
    {}

    vector() : my_size(0), data(0) {}

    ~vector() { if (data) delete [] data; }

  private:
    int my_size;
    unsigned char* data;
};

One thing we realize is that the default constructor and the destructor are identical to those of the non-specialized version (in the following also referred to as the general version). Unfortunately, this is not 'inherited' by the specialization: if we write a specialization, we have to define everything from scratch. We are free to omit member functions or variables of the general version, but for the sake of consistency we should do so only for very good reasons. For instance, we might omit operator+ because we have no addition for bool. The constant access operator is implemented with shifting and bit masking:
is implemented with shifting and bit masking:<br />

template <> class vector<bool>
{
    bool operator[](int i) const { return (data[i/8] >> i%8) & 1; }
};

The mutable access is trickier, because we cannot refer to single bits. The trick is to return some helper object — called a 'Proxy' — that can perform the assignment and the conversion to bool; it is constructed from a reference to the containing byte and the position within that byte.

template <> class vector<bool>
{
    vector_bool_proxy operator[](int i)
    {
        return vector_bool_proxy(data[i/8], i%8);
    }
};

Let us now implement our proxy:<br />

class vector_bool_proxy
{
  public:
    vector_bool_proxy(unsigned char& byte, int p) : byte(byte), mask(1 << p) {}

  private:
    unsigned char& byte;
    unsigned char  mask;
};

To simplify further operations, we create a mask that has a 1 at the position in question and 0 at all other positions.

The reading access is implemented by simply masking in the conversion operator:

class vector_bool_proxy
{
    operator bool() const { return byte & mask; }
};

Setting a bit is realized by an assignment operator for bool:

class vector_bool_proxy
{
    vector_bool_proxy& operator=(bool b)
    {
        if (b)
            byte|= mask;
        else
            byte&= ~mask;
        return *this;
    }
};

If our argument is true, we 'or' it with the mask: at the considered position, the one-bit in the mask turns on the bit in the byte reference, while at all other positions the zero-bits in the mask leave the corresponding bits unchanged. Conversely, with a false argument, we first invert the mask and 'and' it with the byte reference, so that the mask's zero-bit at the active position turns the bit off and, at all other positions, the 'and' with one-bits conserves the old bit values.

4.6.2 Specializing a Function to a Specific Type

Functions can be specialized in the same manner as classes. Assume we have a generic function that computes the power x^y and want to specialize it:

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y);

template <>
double inline power(const double& x, const double& y); // Do not use this

Unfortunately, many of such specializations are ignored. Therefore, we give the following advice:

    Do not use function template specialization!

To specialize a function for one specific type or type tuple, as above, we can simply use overloading.

This works better and is even simpler. Back to our example: assume we have an entirely generic power function. In the case that both arguments are double, we nevertheless want to use the standard implementation, hoping that some caffeine-drugged geeks figured out an incredibly fast assembler hack for our platform and put it into our Linux distribution. Excited by the incredible performance (even if it is only the hope for it), we overload our power function as follows:

#include <cmath>

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y)
{
    ...
}

double inline power(double x, double y)
{
    return std::pow(x, y);
}

Speaking of platform-specific assembler hacks: maybe we are eager to contribute code that exploits SSE units by performing two computations in parallel:

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y) { ... }

#ifdef SSE_FOR_TRYPTICHON_WQ_OMICRON_LXXXVI_SUPPORTED
std::pair<double, double> inline power(const std::pair<double, double>& x, double y)
{
    asm {
        # Yo, I'm the greatestest geek under the sun!
    }
    return whatever;
}
#endif

#ifdef ... more hacks ...

What is there to say about this snippet? If you do not like to write such specializations, we will not blame you. If you do, always put such hacks in conditional compilation. You also have to make sure that your build system only enables the macro when it is definitely a platform that supports the hack. For the case that it does not, we must guarantee that the generic implementation or another overload can deal with pairs of double. Last but not least, you have to rewrite your applications to use this function. Convincing others to use such a special implementation could be even more work than getting the assembler hack to produce plausible numbers. More importantly, such special signatures undermine the ideal of clear and intuitive programming. However, if power functions are computed on entire vectors and matrices, one could perform the calculation pairwise internally without affecting the interface or the user application.

You might also think that SSEs were yesterday and today we have GPUs and GPGPUs, but programming them generically still takes a lot of tricks (at least in the beginning of 2010). But this is another story and we digress. Resuming: programming for highest performance can be tricky, but at least there are often ways to exploit unportable features (where available) without sacrificing portability at the application level.

In the previous examples, we specialized all arguments of the function. It is also possible to specialize some arguments and leave the remaining ones as templates:

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y);

template <typename Base>
Base inline power(const Base& x, int y);

template <typename Exponent>
double inline power(double x, const Exponent& y);

The compiler will find all overloads that match the argument combination and select the most specific one. For instance, power(3.0, 2u) matches the first and the third overload, of which the latter is more specific. To put it in terms of higher mathematics (for those who like it, and only for those): type specificity is a partial order that forms a lattice, and the compiler picks the maximum of the available overloads. However, you do not need to dive deeply into algebra to see which type or type combination is more specific.

If we call power(3.0, 2) with the previous overloads, all three match. However, this time we cannot determine the most specific overload: the compiler will tell us that the call is ambiguous and show us overloads 2 and 3 as candidates. As we implemented the overloads consistently and with optimal performance, we might be happy with either choice, but the compiler will not choose. To disambiguate the overloads we must add:

double inline power(double x, int y);

The lattice people from the previous paragraph will think: "Of course, we were missing the join in the specificity order." Again, one can understand C++ without studying lattices.

4.6.3 Partial Specialization

If you implement template classes, you will sooner or later run into the situation where you would like to specialize a template class for another template class. Suppose we have a templated complex class:

template <typename Real>
class complex;

Assume further that we had some really boosting algorithmic specialization for complex vectors that saves tremendous compute time. Then we start specializing our vector class:

template <>
class vector<complex<float> >;

template <>
class vector<complex<double> >; // again ??? :-/

template <>
class vector<complex<long double> >; // how many more ??? :-P

Apparently, it lacks elegance to reimplement the specialization for all possible and impossible instantiations of complex. Much worse, it destroys our ideal of universal applicability: the complex class is intended to support user-defined types as Real, but the specialization of the vector class will be ignored for those types.

The solution to both the implementation redundancy and the ignorance of new types is 'Partial Specialization'. We specialize our vector class for all complex instantiations:


template <typename Real>
class vector<complex<Real> >
{
    ...
};

That will do the trick. Pay attention to put a space between the two closing '>'; otherwise the compiler will take the two subsequent '>' as the shift operator '>>' and become pretty confused. (In the next C++ standard, closing '>' brackets may be written without intermediate spaces; some compilers, e.g. VS 2008, already support this conglutinated notation today.)

This also works for classes with multiple parameters, for instance:

template <typename Value, typename Parameters>
class vector<complex<Value>, Parameters>
{
    ...
};

We can also specialize for all pointers:

template <typename T>
class vector<T*>
{
    ...
};

Whenever the set of types is expressible by a Type Pattern, we can apply partial specialization to it.

Partial template specialization can be combined with regular template specialization from § 4.6.1 (let us call it 'Complete Specialization' for distinction). In this case, the complete specialization is prioritized over the partial one. Between different partial specializations, the most specific is selected. In the following example:

template <typename Value, typename Parameters>
class vector<complex<Value>, Parameters>
{
    ...
};

template <typename Parameters>
class vector<complex<float>, Parameters>
{
    ...
};

the second specialization is more specific than the first one and is picked when it matches. In this sense, a complete specialization is always more specific than a partial one.

4.6.4 Partially Specializing Functions

The C++ standard committee distinguishes between explicit specialization, as in the first paragraph of § 4.6.2, and implicit specialization. An example of implicit specialization is the following computation of a value's magnitude:

template <typename T>
T inline abs(const T& x)
{
    return x < T(0) ? -x : x;
}

template <typename T> // Do not specialize functions like this either
T inline abs(const std::complex<T>& x)
{
    return sqrt(real(x)*real(x) + imag(x)*imag(x));
}

This works significantly better than the explicit specialization, but even this form of specialization sometimes fails, in the sense that a template function is selected which is not the most specific. A mean aspect of this implicit specialization is that it seems to work properly with few specializations, but as a software project grows it eventually goes wrong. Since the developers have seen the specialization work before, they might not expect it, and the unintended function selection might remain unobserved while corrupting results or at least wasting resources. It is also possible that the specialization behavior varies from compiler to compiler.

The only conclusion from this is: do not specialize function templates! It introduces an unnecessary fragility into our software. Instead, we introduce an additional class (called a functor, § 4.8) with an operator(). Template classes are properly specialized on all compilers, both partially and completely. (Several years ago many compilers failed at partial specialization, e.g. VS 2003, but today all major compilers handle it properly. If you nevertheless experience problems with this feature in some compiler, take your hands off it; most likely you will encounter further problems. Even the CUDA compiler, which is far from being standard-compliant, supports partial specialization.)

In our abs example, we start with the function itself and a forward declaration of the template class:

template <typename T> struct abs_functor;

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    abs_functor<T> functor_object;
    return functor_object(x);
}

Alternatively to the forward declaration, we could have defined the class directly. The return type of our function refers to a typedef or (as the correct term in generic programming) to an 'Associated Type' of abs_functor. Already for complex numbers we do not return the argument type itself but its associated type value_type. Using an associated type here gives us all possible flexibility for further specialization. For instance, the magnitude of a vector could be the sum or the maximum of the elements' magnitudes, or a vector with the magnitudes of each element. Evidently, the functor classes must define a result_type in order to be callable.

Inside the function, we instantiate the functor class with the argument type, abs_functor<T>, and create an object of this type. Then we call the object's application operator. As we do not really need the object itself but only use it for the calculation, we can as well create an anonymous object and perform the construction and calculation in one expression:

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    return abs_functor<T>()(x);
}

In this expression we have two pairs of parentheses: the first contains the arguments of the constructor, which are empty, and the second the arguments of the application operator, which are the arguments of the function. If we wrote:

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    return abs_functor<T>(x); // error
}

then x would be interpreted as an argument of the constructor, and an object of the functor class would be returned. (Many years and versions ago, g++ (sometimes) tolerated this expression although it is not standard-compliant.)

Now we have to implement our functor classes:

template <typename T>
struct abs_functor
{
    typedef T result_type;

    T operator()(const T& x)
    {
        return x < T(0) ? -x : x;
    }
};

template <typename T>
struct abs_functor<std::complex<T> >
{
    typedef T result_type;

    T operator()(const std::complex<T>& x)
    {
        return sqrt(real(x)*real(x) + imag(x)*imag(x));
    }
};

We wrote a general implementation that works for all fixed-point and floating-point types.

4.7 Non-Type Parameters for Templates

So far, we have used template arguments only for types. Values can be template arguments as well; not all values, though, but only integral types, i.e. fixed-point numbers and bool.

Very popular is the definition of short vectors and small matrices with size arguments as template parameters, for instance:

template <typename T, int Size>
class fsize_vector
{
    typedef fsize_vector self;
    void check_index(int i) const { assert(i >= 0 && i < my_size); }

  public:
    typedef T value_type;
    const static int my_size= Size;

    fsize_vector() {}

    fsize_vector( const self& that )
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
    }

    self& operator=( const self& that )
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
        return *this;
    }

    int size() const { return my_size; }

    const T& operator[]( int i ) const
    {
        check_index(i);
        return data[i];
    }

    T& operator[]( int i )
    {
        check_index(i);
        return data[i];
    }

    self operator+( const self& that ) const
    {
        self sum;
        for (int i= 0; i < my_size; ++i)
            sum[i]= data[i] + that[i];
        return sum;
    }

  private:
    T data[Size];
};

If you compare this implementation with the implementation in Section 4.3 on page 95, you will realize that there are not many differences.

The essential difference is that the size is now part of the type and that the compiler knows it. Let us start with the latter. The compiler can use its knowledge for optimization. For instance, if we create a variable

fsize_vector<float, 3> v(w);

the compiler can decide that the generated code for the copy constructor is not performed as a loop but as a sequence of independent operations:

fsize_vector( const self& that )
{
    data[0]= that.data[0];
    data[1]= that.data[1];
    data[2]= that.data[2];
}

This saves the incrementation of the counter and the test for the loop end. In some sense, this test is already performed at compile time. As a rule of thumb: the more is known during compilation, the more potential for optimization exists. We will come back to this in more detail in Section 8.2 and Chapter ??.

Which optimization is induced by additional compile-time information is of course compiler-dependent. One can only find out which transformation is actually done by reading the generated assembler code (which is not that easy, especially with high optimization, while with low optimization the effect will probably not be there) or indirectly, by observing performance and comparing it with other implementations. In the example above, the compiler will probably unroll the loop as shown for small sizes like 3 and keep the loop for larger sizes, say 100. You see why these compile-time sizes are particularly interesting for small matrices and vectors, e.g. three-dimensional coordinates or rotations.

Another benefit of knowing the size at compile time is that we can store the values in an array and even inside the class. Then the values of temporary objects are stored on the stack and not on the heap. Creation and destruction are much less expensive because only the change of the program counter at function begin and end needs to be adapted to the object's size, compared to dynamic memory allocation on the heap, which involves the management of lists to keep track of allocated and free memory blocks. To make a long story short: keeping the data in small arrays is much less expensive than dynamic allocation.

We said that the size becomes part of the type. The careful reader might have realized that we omitted the checks whether the vectors have the same size. We do not need them anymore: if an argument has the class type, it implicitly has the same size. Consider the following program snippet:

fsize_vector<float, 3> v;
fsize_vector<float, 4> w;
vector<float>          x(3), y(4);

v= w;
x= y;

The last two lines are incompatible vector assignments. The difference is that the incompatibility in the second assignment, x= y;, is discovered at run time by our assertion. The assignment v= w; does not even compile, because fixed-size vectors of dimension 3 only accept vectors of the same dimension as argument.

Like type arguments, non-type template arguments can have defaults. Say the most frequent dimension of our vectors is three because we live in a three-dimensional world, relativity and string theory aside. Then we can save some typing with a default:

template <typename T, int Size= 3>
class fsize_vector
{ /* ... */ };

fsize_vector<float>     v, w, x, y;
fsize_vector<float, 4>  space_time;
fsize_vector<float, 10> string;

4.8 Functors

Let us develop a mathematical algorithm for computing the finite difference of a differentiable function f. The finite difference is an approximation of the first derivative by

    f'(x) ≈ (f(x + h) − f(x)) / h

where h is a small value also called spacing.

A general function for computing the finite difference is presented here:

#include <iostream>
#include <cmath>

// Function taking a function argument
double finite_difference( double f( double ), double x, double h ) {
    return ( f(x+h) - f(x) ) / h;
}

double sin_plus_cos( double x ) {
    return sin(x) + cos(x);
}

int main() {
    std::cout << finite_difference( sin_plus_cos, 1., 0.001 ) << std::endl;
    std::cout << finite_difference( sin_plus_cos, 0., 0.001 ) << std::endl;
}

Note that the function finite_difference takes an arbitrary function (from double to double) as argument.

Now suppose we want to compute the second-order derivative. It would make sense to call finite_difference with finite_difference as argument. Unfortunately, this is not possible, since this function has three arguments and the first argument of finite_difference only accepts a function with a single argument.

For this reason, we can use 'functors'. Functors (not to be confused with functors from category theory) are either functions or objects of classes providing an operator(). This means that functors are things which can be called like functions but are not necessarily functions. Using objects of a class providing operator() has the additional advantage that they can use an internal state in terms of member variables.

For our example, the functor could be implemented as follows:

struct sin_plus_cos
{
    double operator() (double x) const
    {
        return sin(x) + cos(x);
    }
};

but we could also consider a functor with a parameter, like this:

class para_sin_plus_cos
{
  public:
    para_sin_plus_cos(double parameter) : parameter(parameter) {}

    double operator() (double x) const
    {
        return sin(parameter * x) + cos(x);
    }
  private:
    double parameter;
};

How can we use the functor in a function? We want to be able to pass objects of both sin_plus_cos and para_sin_plus_cos to our finite_difference function. There are two possible solutions, inheritance and generic programming, which we now discuss.

4.8.1 Functors via inheritance

Let us first rewrite our function finite_difference using an abstract base class.

struct functor_base
{
    virtual double operator() (double x) const= 0;
};

double finite_difference( functor_base const& f, double x, double h )
{
    return ( f(x+h) - f(x) ) / h;
}

The functor base class has a pure virtual function operator() and thus cannot be used by itself. We can, however, alter the functor para_sin_plus_cos such that it inherits from the abstract base class and overrides operator().

class para_sin_plus_cos
  : public functor_base
{
  public:
    para_sin_plus_cos(double p) : parameter(p) {}

    double operator() (double x) const // Is virtual function in base
    {
        return sin( parameter * x ) + cos(x);
    }
  private:
    double parameter;
};

Now we can use an object of this class as the first argument of finite_difference. The whole program looks as follows:

#include <iostream>
#include <cmath>

struct functor_base {
    virtual double operator() ( double x ) const= 0;
};

double finite_difference( functor_base const& f, double x, double h ) {
    return ( f(x+h) - f(x) ) / h;
}

class para_sin_plus_cos
  : public functor_base
{
  public:
    para_sin_plus_cos( double const& p )
      : parameter( p )
    {}

    double operator() ( double x ) const { // Virtual function
        return sin( parameter * x ) + cos(x);
    }
  private:
    double parameter;
};

int main() {
    para_sin_plus_cos sin_1( 1.0 );
    std::cout << finite_difference( sin_1, 1., 0.001 ) << std::endl;
    std::cout << finite_difference( para_sin_plus_cos(2.0), 1., 0.001 ) << std::endl;
    std::cout << finite_difference( para_sin_plus_cos(2.0), 0., 0.001 ) << std::endl;
}

4.8.2 Functors via generic programming

If we make the functor argument of finite_difference generic, we no longer need a functor_base. There is also no need to alter our previously defined functors sin_plus_cos and para_sin_plus_cos. This is a perfect example of the fact that generic programming makes extending software easier. The program now looks like:

#include <iostream>
#include <cmath>

template <typename F, typename T>
T inline finite_difference(F const& f, const T& x, const T& h)
{
    return ( f(x+h) - f(x) ) / h;
}

class para_sin_plus_cos
{
  public:
    para_sin_plus_cos(double p) : parameter(p) {}

    double operator() ( double x ) const
    {
        return sin( parameter * x ) + cos(x);
    }
  private:
    double parameter;
};

int main()
{
    para_sin_plus_cos sin_1( 1.0 );
    std::cout << finite_difference( sin_1, 1., 0.001 ) << std::endl;
    std::cout << finite_difference( para_sin_plus_cos(2.0), 1., 0.001 ) << std::endl;
    std::cout << finite_difference( para_sin_plus_cos(2.0), 0., 0.001 ) << std::endl;
    return 0;
}

Since we are using a template argument F, we need to define the constraints that it has to satisfy. For this function, we need F to be a functor with one argument; this is called a UnaryFunctor. Formally, we can write this as follows:

• Let f be of type F.
• Let x be of type X, where X is the argument type of F.
• f(x) calls f with one argument and returns an object of the result type.

In this example we also require that the argument type and the result type of F are identical. We can remove this restriction if we establish a unique way to deduce the return type. This can be achieved by meta-programming or with the type deduction in the next C++ standard.

So far so good. We complained before that we cannot apply the finite difference to itself to compute higher-order derivatives. Actually, we still cannot: the problem is that finite_difference expects (amongst others) a unary functor but is itself a ternary function, so it cannot use itself as argument. The solution is to realize its functionality as a unary functor that we call derivative:

template <typename F, typename T>
class derivative
{
  public:
    derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return ( f(x+h) - f(x) ) / h;
    }
  private:
    const F& f;
    T        h;
};

Now we can create an object that approximates the derivative of f(x) = sin(1 · x) + cos x:

typedef derivative<para_sin_plus_cos, double> spc_der_1;
spc_der_1 spc(sin_1, 0.001);

The object spc can be used like a function, and it approximates f'(x). In addition, it is a unary functor. That means we can compute its derivative:

typedef derivative<spc_der_1, double> spc_der_2;
spc_der_2 spc_scd(spc, 0.001);

std::cout << "Second derivative of sin(0) + cos(0) is " << spc_scd(0.0) << '\n';

The object spc_scd is again a unary functor and approximates f''(x). We could again construct a functor for its derivative and continue this game eternally.

Assume that we need second derivatives of different functions. Then it becomes annoying to first define the type of the first derivative, construct a functor from it, and finally create a functor for the second one. According to Greg Wilson's [?] maxim "Whatever you use twice, automate!" (this online course contains a gigantic collection of tips on how to develop software successfully and avoid frustrating unproductivity; we highly recommend reading this material), we write a class that provides us the second derivative directly:

template <typename F, typename T>
class second_derivative
{
  public:
    second_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return ( fp(x+h) - fp(x) ) / h;
    }
  private:
    T                h;
    derivative<F, T> fp;
};

Now we can build the f'' functor from f:

second_derivative<para_sin_plus_cos, double> spc_scd2(para_sin_plus_cos(1.0), 0.001);

When we think about how we would implement the third, fourth, or in general the n-th derivative, we realize that they would look much like the second one: calling the (n−1)-th derivative on x+h and x. We can exploit this with a recursive implementation:

template <typename F, typename T, int N>
class nth_derivative
{
    typedef nth_derivative<F, T, N-1> prec_derivative;
  public:
    nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return ( fp(x+h) - fp(x) ) / h;
    }
  private:
    T               h;
    prec_derivative fp;
};

To save the compiler from infinite recursion, we must stop this mutual referencing when we reach the first derivative. Note that we cannot use 'if' or '?:' to stop the recursion, because both of their respective branches are evaluated and one of them still contains the infinite recursion. Recursive template definitions are terminated with a specialization like this:

template <typename F, typename T>
class nth_derivative<F, T, 1>
{
  public:
    nth_derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return ( f(x+h) - f(x) ) / h;
    }
  private:
    const F& f;
    T        h;
};

This specialization is identical to the class derivative, which we could now throw away. If we keep it, we can at least reuse its functionality and variables to reduce redundancy. This is achieved by derivation (more in Chapter 6):

template <typename F, typename T>
class nth_derivative<F, T, 1>
  : public derivative<F, T>
{
  public:
    nth_derivative(const F& f, const T& h) : derivative<F, T>(f, h) {}
};

With our recursive definition we can easily define the twenty-second derivative:<br />

nth_derivative<para_sin_plus_cos, double, 22> spc_22(para_sin_plus_cos(1.0), 0.00001);<br />

The new object spc_22 is again a unary functor. Unfortunately, it approximates so badly that we are too ashamed to present the results here. From Taylor series we know that the error of the f′′ approximation is reduced from O(h) to O(h²) when a backward difference is applied on top of the forward difference. That said, maybe we can improve our approximation if we alternate between forward and backward differences:<br />

template <typename F, typename T, int N><br />
class nth_derivative<br />
{<br />
    typedef nth_derivative<F, T, N-1> prec_derivative;<br />
  public:<br />
    nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}<br />
    T operator()(const T& x) const<br />
    {<br />
        return N & 1 ? ( fp(x+h) - fp(x) ) / h<br />
                     : ( fp(x) - fp(x-h) ) / h;<br />
    }<br />
  private:<br />
    T h;<br />
    prec_derivative fp;<br />
};<br />

Sadly, our 22nd derivative is still as wrong as before; in fact, slightly worse. This is particularly frustrating when we become aware that we evaluate f over four million times. 24 Decreasing h does not help either: the tangent approximates the derivative better, but on the other hand the values of f(x) and f(x ± h) become quite close and their difference retains only a few meaningful bits. At least the second derivative is improved by our alternating difference scheme, as the Taylor series teaches us. Another consoling fact is that we probably did not pay for the alternation: the template argument N is known at compile time, and the condition N&1, testing whether the last bit is set, can also be evaluated during compilation. When N is odd, the operator effectively reduces to:<br />

24 TODO: Is there an efficient and well-approximating recursive scheme to compute higher order derivatives?


118 CHAPTER 4. GENERIC PROGRAMMING<br />

T operator()(const T& x) const<br />
{<br />
    return ( fp(x+h) - fp(x) ) / h;<br />
}<br />

Likewise <strong>for</strong> even N, only the backward difference is computed without testing.<br />

If nothing else we learned something about C ++ and we are confirmed in the<br />

Truism<br />

Not even the coolest programming can substitute <strong>for</strong> solid mathematics.<br />

In the end, this script is primarily about programming. To improve the expressiveness of our<br />

software, functors are an extremely powerful approach. We have seen how to take an arbitrary<br />

unary function and construct a unary function that approximates its derivative or a higher-order<br />

derivative.<br />

If we do not know the type of a function or we do not like to bother with it we can write a<br />

convenience function that detects the type automatically:<br />

template <int N, typename F, typename T><br />
nth_derivative<F, T, N><br />
inline make_nth_derivative(const F& f, const T& h)<br />
{<br />
    return nth_derivative<F, T, N>(f, h);<br />
}<br />

Here F and T are the types of the function arguments and can be deduced by the compiler. The only template argument that the compiler cannot deduce is N. Note that such arguments must be at the beginning of the template argument list and the compiler-deduced ones at the end. Therefore, the following template function is wrong:<br />

template <typename F, typename T, int N> // error<br />
nth_derivative<F, T, N><br />
inline make_nth_derivative(const F& f, const T& h)<br />
{<br />
    return nth_derivative<F, T, N>(f, h);<br />
}<br />

If you call this one, the compiler will complain that it cannot deduce N. This leads us to the question of how to call this function. Of course, we can explicitly declare all template arguments:<br />
<br />
make_nth_derivative<7, para_sin_plus_cos, double>(sin_1, 0.00001);<br />
<br />
But this is exactly what we wanted to avoid by implementing this function. As said, F and T can be deduced by the compiler and we only need to provide N:<br />
<br />
make_nth_derivative<7>(sin_1, 0.00001);<br />

What is this expression good <strong>for</strong>? Written like this, not much. It creates a function that will<br />

be immediately destroyed. If it is a function we should be able to call it with an argument:



std::cout << "Seventh derivative of sin_1 at x=3 is "<br />
          << make_nth_derivative<7>(sin_1, 0.00001)(3.0) << '\n';<br />

In the cases above, the type of the functor was obvious because we wrote the class ourselves. The type is less obvious if it is constructed from an expression, for instance by a λ-function. Support for λ-functions will be introduced with C++0x. 25 An emulation has been available for some years with Boost.Lambda [?]. For instance, we can generate a functor object that computes<br />
<br />
p(x) = 3.5x³ + 4x² = (3.5x + 4)x²<br />
<br />
with the following short expression:<br />
<br />
(3.5 * _1 + 4.0) * _1 * _1;<br />

This expression can be used with our derivative function:<br />

make_nth_derivative<2>((3.5 * _1 + 4.0) * _1 * _1, 0.0001)<br />

to generate a functor computing (approximating) 21x + 8.<br />

With lambda expressions, we do not even know the type of our functor, yet we can compute its derivative. The type is in fact so long 26 that it would be much easier to implement our own functor if we were obliged to spell the type out.<br />

The following listing illustrates how to approximate p ′′ (2):<br />

#include <boost/lambda/lambda.hpp><br />
<br />
// .. our definitions of derivatives<br />
<br />
int main()<br />
{<br />
    using boost::lambda::_1;<br />
    std::cout << "Second derivative of 3.5*x^3+4*x^2 at x=2 is "<br />
              << make_nth_derivative<2>((3.5 * _1 + 4.0) * _1 * _1, 0.0001)(2) << '\n';<br />
    return 0;<br />
}<br />

Un<strong>for</strong>tunately, we cannot keep the results of our computations if we do not know their types<br />

with current standard C ++. In C ++0x, we will be able to let the compiler deduce the type:<br />

auto p= (3.5 * _1 + 4.0) * _1 * _1;  // With C++0x<br />
auto p2= make_nth_derivative<2>(p, 0.0001);<br />

Once defined, we can reuse p and p2 as often as we want. Of course, calculating the derivatives<br />

of polynomials can be done better than with differential quotients. We will discuss this in<br />

Section 8.2.<br />

25 TODO: Try in g++ 4.3 and 4.4?<br />

26 boost::lambda::lambda_functor<...>



4.8.3 The function accumulate with a functor argument<br />

TODO: Again, I don’t like the use of pointers here — Peter<br />

Recall the function accumulate from Section 4.2.1 that we used to introduce generic programming. In this section, we will generalize this function. We introduce a binary functor (concept BinaryFunctor) that implements an operation on two arguments as a function or a callable class object. 27 Then we can accumulate values with respect to this binary operation:<br />

template <typename T, typename BinaryFunctor><br />
T accumulate( T* a, T* a_end, T init, BinaryFunctor op ) {<br />
    T sum( init ) ;<br />
    for ( ; a != a_end; ++a ) {<br />
        sum = op( sum, *a ) ;<br />
    }<br />
    return sum ;<br />
}<br />

The concept BinaryFunctor is defined as follows: 28<br />
• Let op be of type BinaryFunctor.<br />
– op provides the call op( first_argument_type, second_argument_type ) with a result type convertible to T. T should be convertible to the first and second argument types.<br />

From this generic example, it is quite clear that the conceptual conditions become complicated when we mix types. Usually, we make sure that the first argument type, the second argument type, and the result type are the same; strictly speaking, however, this is not required, since the compiler is allowed to perform conversions.<br />

The main program could be as follows:<br />

struct sum_functor<br />
{<br />
    double operator() ( double a, double b ) const {<br />
        return a + b ;<br />
    }<br />
} ;<br />
<br />
struct product_functor<br />
{<br />
    double operator() ( double a, double b ) const {<br />
        return a * b ;<br />
    }<br />
} ;<br />
<br />
int main()<br />
{<br />
    const int n= 10;<br />
    double a[n] ;<br />
    double s = accumulate( a, a+n, 0.0, sum_functor() ) ;<br />
    s = accumulate( a, a+n, 1.0, product_functor() ) ;<br />
}<br />

27 TODO: Introduce term.<br />

28 TODO: revisit



4.9 STL — The Mother of All Generic Libraries<br />

The Standard Template Library (STL) is an example of a generic C++ library. It defines generic container classes, generic algorithms, and iterators. Online documentation is provided under www.sgi.com/tech/stl. There are also entire books about the usage of the STL, so we keep this section short and refer to those books [?].<br />

4.9.1 Introducing Example<br />

Containers are classes whose purpose is to contain other objects. The classes vector and list are<br />

examples of STL container classes. Each of these classes is templated, and can be instantiated<br />

to contain any type of object (that is a model of the appropriate concept). For example, the<br />

following lines create a vector containing doubles and another one containing integers:<br />

std::vector<double> vec_d ;<br />
std::vector<int> vec_i ;<br />

The STL also includes a large collection of algorithms that manipulate the data stored in<br />

containers. The accumulate algorithm, <strong>for</strong> example, can be used to compute any reduction —<br />

such as sum, product, or minimum — on a list or vector in the following way:<br />

std::vector<double> vec ; // fill the vector...<br />
std::list<double> lst ;   // fill the list...<br />
double vec_sum = std::accumulate( vec.begin(), vec.end(), 0.0 ) ;<br />
double lst_sum = std::accumulate( lst.begin(), lst.end(), 0.0 ) ;<br />

Notice the use of the functions begin() and end(), which denote the beginning and the end of the vector and the list by means of 'iterators'. Iterators are the central concept of the STL, and we will now have a closer look at them.<br />

4.9.2 Iterators<br />

Irreverently speaking, an iterator is a generalized pointer: one can dereference it and change the referred location. This over-simplified view does not do justice to its importance. Iterators are a Fundamental Methodology to Decouple the Implementation of Data Structures and Algorithms.<br />

Figure 4.2 29 depicts this central role of iterators. Every data structure provides an iterator <strong>for</strong><br />

traversing it and all algorithms are implemented in terms of iterators.<br />

To program m algorithms on n data structures, one needs in classical C and Fortran programming<br />

m · n implementations.<br />

Expressing algorithms in terms of iterators decreases this to only<br />

m + n implementations!<br />

29 TODO: Flatter boxes and more containers and algos, maybe.


[Figure 4.2: Central role of iterators in STL. Data structures (vector, set, map, queue, ...) on one side and algorithms (copy, search, replace, sort, ...) on the other are connected solely through iterators.]<br />

Evidently, not all algorithms can be implemented on every data structure. Which algorithm<br />

works on a given data structure depends on the kind of iterator provided by the container.<br />

Iterators can be distinguished by the <strong>for</strong>m of access:<br />

InputIterator: an iterator concept <strong>for</strong> reading the referred entries.<br />

OutputIterator: an iterator concept <strong>for</strong> writing to the referred entries.<br />

Note that the ability to write does not imply readability; e.g., an ostream_iterator is an STL interface used to write to output streams such as files opened in write mode. Another way to differentiate iterators is the form of traversal:<br />

ForwardIterator: a concept for iterators that can pass from one element to the next, i.e. types that provide an operator++. It is a refinement of InputIterator and OutputIterator. In contrast to those, a ForwardIterator allows for traversing the sequence multiple times.<br />

BidirectionalIterator: a concept <strong>for</strong> iterators with step-wise <strong>for</strong>ward and backward traversal,<br />

i.e. types with operator++ and operator−−. It refines ForwardIterator.<br />

RandomAccessIterator: a concept <strong>for</strong> iterators that can increment their position by an arbitrary<br />

integer, i.e. types that also provide operator[]. It refines BidirectionalIterator.<br />

Data structures that provide more refined iterators (e.g. modeling RandomAccessIterator) can be<br />

used in more algorithms. Dually, algorithm implementations that require less refined iterators<br />

(like InputIterator) can be applied to more data structures. The interfaces are designed with<br />

backward compatibility in mind and old-style pointers can be used as iterators.<br />

All standard container templates provide a rich and consistent set of iterator types. The<br />

following very simple example shows a typical use of iterators:<br />

std::list<int> l ;<br />
for (std::list<int>::const_iterator it = l.begin(); it != l.end(); ++it) {<br />
    std::cout << *it << std::endl;<br />
}<br />




As illustrated above, iterators are usually used in pairs, where one is used <strong>for</strong> the actual iteration<br />

and the second serves to mark the end of the collection. The iterators are created by the<br />

corresponding container class using standard methods such as begin() and end(). The iterator returned by begin() points to the first element, and the iterator returned by end() points past the last element. All algorithms are implemented on right-open intervals [b, e), operating on the value referred to by b until b = e. Therefore, intervals of the form [x, x) are regarded as empty.<br />

A more general (and more useful) algorithm is the linear search on an arbitrary sequence. This<br />

is provided by the STL function find in the following fashion:<br />

template <typename InputIterator, typename T><br />
InputIterator find(InputIterator first, InputIterator last, const T& value) {<br />
    while (first != last && *first != value)<br />
        ++first;<br />
    return first;<br />
}<br />

find takes three arguments: two iterators that define the right-open interval of the search space,<br />

and a value to search <strong>for</strong> in that range. Each entry referred by ‘first’ is compared with ‘value’.<br />

When a match is found, the iterator pointing to it is returned. If the value is not contained<br />

in the sequence, an iterator equal to ‘last’ is returned. Thus, the caller can test whether the<br />

search was successful by comparing its result with 'last'. In fact, one must perform this test because after a failed search the returned iterator cannot be dereferenced safely (it points outside the given range and might cause segmentation violations or corrupt data).<br />

This section only scratched the surface of STL and was primarily intended to introduce the<br />

iterator concept that we will generalize in the following section.<br />

4.10 Cursors and Property Maps<br />

The essential idea of iterators is to represent a position and a referred value. A further generalization of this idea is to decouple the notions of position and value. Dietmar Kühl proposed this mechanism in his master's thesis (Diplomarbeit) [?] for the generic treatment of graphs. The Boost Graph Library [?] provides the notion of property maps in the form that properties are available for vertices and edges, and all properties can be accessed independently of each other and of the traversal of the graph.<br />

As a case study we implement a simple sparse matrix class with cursors and property maps. The<br />

minimalistic implementation of the sparse matrix is:<br />

#include <cassert><br />
#include <iostream><br />
#include <algorithm><br />
#include <vector><br />
<br />
template <typename Value><br />
class coo_matrix<br />
{<br />
    typedef Value value_type; // better in trait<br />
  public:<br />
    coo_matrix(int nr, int nc) : nr(nr), nc(nc) {}<br />



    void insert(int r, int c, Value v)<br />
    {<br />
        assert(r < nr && c < nc);<br />
        row_index.push_back(r);<br />
        col_index.push_back(c);<br />
        data.push_back(v);<br />
    }<br />
    void sort() {}<br />
    int nnz() const { return row_index.size(); }<br />
    int num_rows() const { return nr; }<br />
    int num_cols() const { return nc; }<br />
    int begin_row(int r) const<br />
    {<br />
        unsigned i= 0;<br />
        while (i < row_index.size() && row_index[i] < r) ++i;<br />
        return i;<br />
    }<br />
    template <typename> friend struct coo_col;<br />
    template <typename> friend struct coo_row;<br />
    template <typename> friend struct coo_const_value;<br />
    template <typename> friend struct coo_value;<br />
  private:<br />
    int nr, nc;<br />
    std::vector<int> row_index, col_index;<br />
    std::vector<Value> data;<br />
};<br />

The matrix is supposed to be sorted lexicographically (although we omitted the implementation of the sort function for the sake of brevity). For any offset i, the i-th entries of the vectors row_index, col_index, and data represent the row, column, and value of one non-zero entry of the matrix. The traversal over all non-zeros of the matrix can be realized with a cursor that contains just this offset.<br />

struct nz_cursor<br />
{<br />
    typedef int key_type;<br />
    nz_cursor(int offset) : offset(offset) {}<br />
    nz_cursor& operator++() { offset++; return *this; }<br />
    nz_cursor operator++(int) { nz_cursor tmp(*this); offset++; return tmp; }<br />
    key_type operator*() const { return offset; }<br />
    bool operator!=(const nz_cursor& other) { return offset != other.offset; }<br />
  protected:<br />
    int offset;<br />
};<br />



The cursor is initialized with an offset. Many cursor classes keep a reference to the traversed matrix object, but we do not need this here. The cursor can be incremented, compared, and dereferenced. The result of dereferencing is a 'key'. For simplicity we use an int as key type.<br />

Like the begin and end functions in STL we define:<br />

template <typename Matrix><br />
nz_cursor nz_begin(const Matrix& A)<br />
{<br />
    return nz_cursor(0);<br />
}<br />
<br />
template <typename Matrix><br />
nz_cursor nz_end(const Matrix& A)<br />
{<br />
    return nz_cursor(A.nnz());<br />
}<br />

The function nz_begin returns a cursor on the first non-zero entry, and nz_end gives a past-the-end cursor to terminate the traversal.<br />

A key can be used as argument <strong>for</strong> a property map that we will define now:<br />

template <typename Matrix><br />
struct coo_col<br />
{<br />
    typedef int key_type;<br />
    coo_col(const Matrix& ref) : ref(ref) {}<br />
    int operator()(key_type k) const { return ref.col_index[k]; }<br />
  private:<br />
    const Matrix& ref;<br />
};<br />

Property maps typically hold a reference to the matrix in order to read internal data from it. They are often declared as friends because they are an important tool to access the object's internal data; it might even be the only way to access the data, as in the Boost Graph Library. The property maps that read the row index or the value for the offset key are analogous and therefore omitted here.<br />

A property map <strong>for</strong> mutable entries is implemented as follows:<br />

template <typename Matrix><br />
struct coo_value<br />
{<br />
    typedef int key_type;<br />
    typedef typename Matrix::value_type value_type;<br />
    coo_value(Matrix& ref) : ref(ref) {}<br />
    value_type operator()(key_type k) const { return ref.data[k]; }<br />
    void operator()(key_type k, const value_type& v) { ref.data[k]= v; }<br />
  private:<br />
    Matrix& ref;<br />
};<br />

In contrast to the previous maps, it contains a mutable reference and an additional operator for setting a value.<br />

To test our implementation we create matrix A:<br />

coo_matrix<double> A(3, 5);<br />
A.insert(0, 0, 2.3);<br />
A.insert(0, 3, 3.4);<br />
A.insert(1, 2, 4.5);<br />

and define the three property maps:<br />

coo_col< coo_matrix<double> > col(A);<br />
coo_row< coo_matrix<double> > row(A);<br />
coo_value< coo_matrix<double> > value(A);<br />

A read-only traversal of all non-zero entries reads:<br />

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)<br />
    std::cout << "A[" << row(*c) << "][" << col(*c) << "] = " << value(*c) << "\n";<br />

Scaling all non-zero elements can be achieved similarly:<br />

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)<br />
    value(*c, 2.0 * value(*c));<br />

Note that we did not use all property maps in the last algorithm. In fact, this is one of the motivations for property maps: only the data really needed by the algorithm must be provided. In today's computer landscape, this can make a significant difference in performance, since reading and writing data is much more time-consuming than most numeric computations, or the data may only be available implicitly and need recomputation.<br />

Another advantage of this approach is the easier realization of nested traversals. Say we have an algorithm that iterates over rows and, within each row, over the non-zero entries. In this case, we need other cursor types but can reuse the property maps, provided our new cursors dereference to the same key type. First we need a cursor to iterate over all rows of a matrix:<br />

template <typename Matrix><br />
struct row_cursor<br />
{<br />
    row_cursor(int r, const Matrix& ref) : r(r), ref(ref) {}<br />
    row_cursor& operator++() { r++; return *this; }<br />
    row_cursor operator++(int) { row_cursor tmp(*this); r++; return tmp; }<br />
    bool operator!=(const row_cursor& other) { return r != other.r; }<br />
    nz_cursor begin() const { return nz_cursor(ref.begin_row(r)); }<br />
    nz_cursor end() const { return nz_cursor(ref.begin_row(r+1)); }<br />
  protected:<br />
    int r;<br />
    const Matrix& ref;<br />
};<br />



Its implementation is almost the same as nz_cursor, and with some refactoring one could certainly introduce a common base class serving both cursors. For the sake of simplicity we refrain from that here. The two main differences to nz_cursor are<br />
• the lack of operator* because the cursor is not intended to be dereferenced; and<br />
• the functions begin and end that provide the inner loop traversal.<br />

The corresponding functions providing a right-open interval of row cursors are straightforward:<br />

template <typename Matrix><br />
row_cursor<Matrix> row_begin(const Matrix& A)<br />
{<br />
    return row_cursor<Matrix>(0, A);<br />
}<br />
<br />
template <typename Matrix><br />
row_cursor<Matrix> row_end(const Matrix& A)<br />
{<br />
    return row_cursor<Matrix>(A.num_rows(), A);<br />
}<br />

We can now write begin and end functions that take a row cursor (instead of a matrix) as argument and give the right-open interval of the row's non-zeros:<br />

template <typename Matrix><br />
nz_cursor nz_begin(const row_cursor<Matrix>& c)<br />
{<br />
    return c.begin();<br />
}<br />
<br />
template <typename Matrix><br />
nz_cursor nz_end(const row_cursor<Matrix>& c)<br />
{<br />
    return c.end();<br />
}<br />

For the inner loop we can reuse nz_cursor and only need to determine the right interval within each row. This is performed with the begin and end functions from row_cursor, which in turn use begin_row from the matrix. That is why the row_cursor needs a matrix reference.<br />

A two-dimensional traversal is realized as follows:<br />

for (row_cursor< coo_matrix<double> > c= row_begin(A), end= row_end(A); c != end; ++c) {<br />
    std::cout << "-----\n";<br />
    for (nz_cursor ic= nz_begin(c), iend= nz_end(c); ic != iend; ++ic)<br />
        std::cout << "A[" << row(*ic) << "][" << col(*ic) << "] = " << value(*ic) << "\n";<br />
}<br />
std::cout << "-----\n";<br />

The outer loop iterates over all rows of the matrix and the inner loop over all non-zeros in this<br />

row.<br />

Résumé The technique is more complicated and less readable than accessing entries with<br />

operator[] and needs some familiarization. However, it allows <strong>for</strong>



• High Code Reuse with very Diverse Data Structures;<br />

• While still enabling High Per<strong>for</strong>mance.<br />

4.11 Exercises<br />

TODO: Move exercises to next chapter<br />

4.11.1 Unroll a loop<br />

Look at the loop from Subsection ??:<br />

int sum = 0;<br />

for (int i = 1 ; i



function gcd(a, b):<br />
    if b = 0 return a<br />
    else return gcd(b, a mod b)<br />

Then write an integral metafunction that executes the same algorithm but at compile time.<br />

Your metafunction should be of the following <strong>for</strong>m:<br />

template <int a, int b><br />
struct gcd_meta {<br />
    static int const value = ... ;<br />
} ;<br />

i.e. gcd_meta<a, b>::value is the GCD of a and b. Verify that the results correspond with your C++ function gcd().<br />

4.11.6 Overloading of functions<br />

Overloading of functions is possible <strong>for</strong> different types, e.g.<br />

void foo( int i ) { ... }<br />

void foo( double d ) { ... }<br />

This is an exercise on another form of overloading: based on a boolean meta-expression. We will use the Boost functions enable_if and disable_if for this exercise.<br />

#include <boost/utility/enable_if.hpp><br />
#include <boost/type_traits/is_integral.hpp><br />
<br />
template <typename T><br />
typename boost::enable_if< boost::is_integral<T>, T >::type foo( T const& v ) {<br />
    return v ;<br />
}<br />
<br />
template <typename T><br />
typename boost::disable_if< boost::is_integral<T>, T >::type foo( T const& v ) {<br />
    return std::floor( v ) ;<br />
}<br />

If we call e.g. foo(5);, the compiler uses the special version <strong>for</strong> integers:<br />

template <typename T><br />
T foo( T const& v ) {<br />
    return v ;<br />
}<br />

If we call e.g. foo(5.0);, the compiler uses the special version <strong>for</strong> types that are not integral:<br />

template <typename T><br />
T foo( T const& v ) {<br />
    return std::floor( v ) ;<br />
}<br />



Create a meta-function to check whether a type is a pointer. Write a function evaluate that returns the same value as its argument, except when the argument is a pointer, in which case you return the value pointed to by the pointer. Hint: look at http://www.boost.org/libs/utility/enable_if.html for enable_if_c.<br />

4.11.7 Meta-list<br />

Revisit exercise ??.<br />

Make a list of types. Make meta functions insert, append, delete and size.<br />

4.11.8 Iterator of a vector<br />

Revisit exercise ??. Add methods begin() and end() <strong>for</strong> returning a begin and end iterator. Add<br />

the types iterator and const iterator to the class. Note that pointers are iterators.<br />

Use the STL functions sort and lower bound.<br />

4.11.9 Iterator of a list<br />

Revisit exercise ??.<br />

Make a generic list type.<br />

Add methods begin() and end() <strong>for</strong> returning a begin and end const iterator. Add the type<br />

const iterator to the class. Note that pointers cannot be used as iterators.<br />

4.11.10 Trapezoid rule<br />

A simple method <strong>for</strong> computing the integral of a function is the trapezoid rule. Suppose we<br />

want to integrate the function f over the interval [a, b]. We split the interval in n small intervals<br />

[xi, xi+1] of the same length h = (b − a)/n and approximate f by a piecewise linear function.<br />

The integral is then approximated by the sum of the integrals of the piecewise linear function.<br />

This gives us the formula:<br />
<br />
I = (h/2) f(a) + (h/2) f(b) + h ∑_{j=1}^{n−1} f(a + jh)    (4.1)<br />

In this exercise, we develop a function <strong>for</strong> the trapezoid rule, with a functor argument. We<br />

develop software using inheritance and using generic programming. Then we use the function<br />

<strong>for</strong> integrating the following functions:<br />

• f = exp(−3x) <strong>for</strong> x ∈ [0, 4]. Try the following arguments of trapezoid:<br />

double exp3( double x ) {<br />
    return std::exp( -3.0 * x ) ;<br />
}<br />

struct exp3 {<br />
    double operator() ( double x ) const {<br />
        return std::exp( -3.0 * x ) ;<br />
    }<br />
} ;<br />

• f = sin(x) if x < 1 and f = cos(x) if x ≥ 1 <strong>for</strong> x ∈ [0, 4].<br />

• Can we use trapezoid( std::sin, 0.0, 2.0 ); ?<br />

As a second exercise, develop a functor <strong>for</strong> computing the finite difference. Then integrate the<br />

finite difference to verify that you get the function value back.<br />

4.11.11 STL and functor<br />

Write a generic function that copies the values of a container to another container after trans<strong>for</strong>mation<br />

using a functor:<br />

struct double_functor {<br />
    int operator() ( int v ) const {<br />
        if (v ...<br />
    }<br />
} ;<br />
<br />
std::vector< int > my_input_vec ; // ...<br />
std::vector< int > my_output_vec ;<br />
<br />
transform( my_input_vec.begin(), my_input_vec.end(), my_output_vec.begin(), double_functor() ) ;<br />

Write code <strong>for</strong> the function trans<strong>for</strong>m and test it.




Chapter 5<br />
<br />
Meta-programming<br />

'Meta-programming' was actually discovered by accident: in the early 90s, Erwin Unruh wrote a program that printed prime numbers as error messages. This showed that C++ compilers can compute. Because the language has changed since Unruh wrote the example, here is a version adapted to today's standard C++:<br />

// Prime number computation by Erwin Unruh<br />
template <int i> struct D { D(void*); operator int(); };<br />
<br />
template <int p, int i> struct is_prime {<br />
    enum { prim = (p==2) || (p%i) && is_prime<(i>2?p:0), i-1>::prim };<br />
};<br />
<br />
template <int i> struct Prime_print {<br />
    Prime_print<i-1> a;<br />
    enum { prim = is_prime<i, i-1>::prim };<br />
    void f() { D<i> d = prim ? 1 : 0; a.f(); }<br />
};<br />
<br />
template <> struct is_prime<0, 0> { enum { prim= 1 }; };<br />
template <> struct is_prime<0, 1> { enum { prim= 1 }; };<br />
<br />
template <> struct Prime_print<1> {<br />
    enum { prim= 0 };<br />
    void f() { D<1> d = prim ? 1 : 0; };<br />
};<br />
<br />
main() {<br />
    Prime_print<18> a; // upper bound of the prime search, e.g. 18<br />
    a.f();<br />
}<br />

When one tries to compile this with g++ 4.1.2, one observes the following error message: TODO: Need English error message.<br />

TODO: Ask Erwin Unruh if we can use his example.<br />

After people realized the computational power of the C++ compiler, it was used to realize very powerful performance optimization techniques. In fact, one can perform entire applications during compile time. Jeremiah Wilcock once wrote a Lisp interpreter that evaluated Lisp expressions during a C++ compilation [?]. Todd Veldhuizen showed that the template type system of C++ is Turing complete [?].<br />

On the other hand, excessive usage of meta-programming techniques can end in quite long compile times. Entire research projects were cancelled after many millions of dollars of funding because even short applications of fewer than 20 lines took weeks to compile on parallel computers. We know people who managed to produce an 18 MB error message (stemming mainly from one single error). Nevertheless, the authors used a fair amount of meta-programming in their scientific projects and could still avoid excessive compile times. 1 Compilers have also improved significantly in the last decade: while compile time grew quadratically with the template instantiation depth in old compilers, today it grows only linearly [?].<br />

5.1 Let the Compiler Compute<br />

Typical introductory examples for meta-programming are factorials and Fibonacci numbers.<br />
The latter is computed recursively:<br />

template <long N><br />
struct fibonacci<br />
{<br />
    static const long value= fibonacci<N-1>::value + fibonacci<N-2>::value;<br />
};<br />
<br />
template <><br />
struct fibonacci<1><br />
{<br />
    static const long value= 1;<br />
};<br />
<br />
template <><br />
struct fibonacci<2><br />
{<br />
    static const long value= 1;<br />
};<br />

Note that we need the specializations for 1 and 2 to terminate the recursion. The following<br />
definition:<br />

template <long N><br />
struct fibonacci<br />
{<br />
    static const long value= N < 3 ? 1 : fibonacci<N-1>::value + fibonacci<N-2>::value; // error<br />
};<br />

ends in an infinite compile loop. For N = 2, the compiler would evaluate the expression:<br />

template <><br />
struct fibonacci<2><br />
{<br />
    static const long value= 2 < 3 ? 1 : fibonacci<1>::value + fibonacci<0>::value; // error<br />
};<br />


This requires the evaluation of fibonacci<0>::value as<br />

template <><br />
struct fibonacci<0><br />
{<br />
    static const long value= 0 < 3 ? 1 : fibonacci<-1>::value + fibonacci<-2>::value; // error<br />
};<br />

which needs fibonacci<-1>::value, and so on. Although the values for N < 3 are not used in the end,<br />
the compiler will nevertheless generate these terms infinitely and die at some point.<br />
We said before that we implement the computation recursively. In fact, all repetitive calculations<br />
must be realized recursively, as there is no iteration for meta-functions. 2<br />

If we write for instance<br />
std::cout << fibonacci<45>::value << "\n";<br />
the value is already calculated during the compilation and the program just prints<br />
it. If you do not believe us, you can read the assembler code (e.g. compile with ‘g++ -S<br />
fibonacci.cpp -o fibonacci.asm’).<br />

We mentioned long compilations with meta-programming at the beginning of the chapter. The<br />
compilation for Fibonacci number 45 took less than a second. Compared to it, a naïve run-time<br />
implementation:<br />
long fibonacci2(long x)<br />
{<br />
    return x < 3 ? 1 : fibonacci2(x-1) + fibonacci2(x-2);<br />
}<br />
took 14s on the same computer. The reason is that the compiler remembers intermediate results<br />
while the run-time version recomputes everything. We are, however, convinced that every reader<br />
of this book can rewrite fibonacci2 without the exponential overhead of recomputations.<br />

5.2 Providing Type Information<br />

5.2.1 Type Traits<br />

When we write template functions, we can easily define temporary values because they usually<br />
have the same type as one of the template arguments. But not always. Imagine a function<br />
that returns, of two values, the one with the minimal magnitude:<br />

template <typename T><br />
T inline min_magnitude(const T& x, const T& y)<br />
{<br />
    using std::abs;<br />
    T ax= abs(x), ay= abs(y);<br />
    return ax < ay ? x : y;<br />
}<br />

We can call this for int, unsigned, or double values:<br />

2 The Meta Programming Library provides compile-time iterators but even those are recursive internally.



double d1= 3., d2= 4.;<br />
std::cout << "min_magnitude(d1, d2) = " << min_magnitude(d1, d2) << '\n';<br />

If we call this function with two complex values:<br />
std::complex<double> c1(3.), c2(4.);<br />
std::cout << "min_magnitude(c1, c2) = " << min_magnitude(c1, c2) << '\n';<br />

we will see the error message:<br />
no match for »operator<« in »ax < ay«<br />
The problem is that abs returns double values in this case, which provide the comparison operator,<br />
but we store them in temporaries of type complex, which does not.<br />

The careful reader might ask why we store them at all: if we compared the magnitudes<br />
directly, we would save memory and could compare them as they are. This is absolutely true,<br />
and this is how we would implement the function normally. However, there are situations where<br />
one needs a temporary, e.g., when computing the value with the minimal magnitude in a vector.<br />
For the sake of simplicity we just look at two values. With the new standard we can also handle<br />
the issue easily with auto types:<br />

template <typename T><br />
T inline min_magnitude(const T& x, const T& y)<br />
{<br />
    using std::abs;<br />
    auto ax= abs(x), ay= abs(y);<br />
    return ax < ay ? x : y;<br />
}<br />

To make a long story short, sometimes we need to know explicitly the result type of an expression,<br />
or some type information in general. Just think of a member variable of a template class: we must<br />
know the type of the member in the definition of the class.<br />
This leads us to ‘type traits’. Type traits are meta-functions that provide information about a<br />
type.<br />
In the example here we search, for a given type, an appropriate type for its magnitude. We can<br />
provide such type information by template specialization:<br />

template <typename T><br />
struct Magnitude {};<br />
<br />
template <><br />
struct Magnitude<int><br />
{<br />
    typedef int type;<br />
};<br />
<br />
template <><br />
struct Magnitude<float><br />
{<br />
    typedef float type;<br />
};<br />
<br />
template <><br />
struct Magnitude<double><br />
{<br />
    typedef double type;<br />
};<br />
<br />
template <><br />
struct Magnitude<std::complex<float> ><br />
{<br />
    typedef float type;<br />
};<br />
<br />
template <><br />
struct Magnitude<std::complex<double> ><br />
{<br />
    typedef double type;<br />
};<br />

Admittedly, this is rather cumbersome.<br />

We can abbreviate the first definitions by postulating “if we do not know better, we assume<br />

that T’s Magnitude type is T itself.”<br />

template <typename T><br />
struct Magnitude<br />
{<br />
    typedef T type;<br />
};<br />

This is true for all intrinsic types and we handle them all correctly with one definition. A slight<br />
disadvantage of this definition is that it incorrectly applies to all types whose type trait is not<br />
specialized. A set of classes where we know that the above definition is not correct are all<br />
instantiations of the template class std::complex. So we define specializations like:<br />

template <><br />
struct Magnitude<std::complex<double> ><br />
{<br />
    typedef double type;<br />
};<br />

Instead of defining them individually for complex<float>, complex<double>, . . . we use a templated<br />
form to treat them all at once:<br />
template <typename T><br />
struct Magnitude<std::complex<T> ><br />
{<br />
    typedef T type;<br />
};<br />

Now that the type trait is defined, we can refactor our function to use it:<br />
template <typename T><br />
T inline min_magnitude(const T& x, const T& y)<br />
{<br />
    using std::abs;<br />
    typename Magnitude<T>::type ax= abs(x), ay= abs(y);<br />
    return ax < ay ? x : y;<br />
}<br />



We can now consider extending this definition to vectors and matrices, e.g., to determine the<br />
return type of a norm. The specialization reads:<br />
template <typename T><br />
struct Magnitude<std::vector<T> ><br />
{<br />
    typedef T type; // not really perfect<br />
};<br />
However, if the value type of the vector is complex, its norm will not be. Instead, we need the<br />
magnitude type of the values:<br />
template <typename T><br />
struct Magnitude<std::vector<T> ><br />
{<br />
    typedef typename Magnitude<T>::type type;<br />
};<br />

5.2.2 A const-clean View Example<br />

In this section, we look at an efficient and expressive implementation of a transposed matrix. If<br />
you compute the transpose of a matrix, many software packages return a new matrix object<br />
with the interchanged values. This is a quite expensive operation: it requires memory allocation<br />
and deallocation and often copying a lot of data.<br />

Writing a Simple View Class<br />
A much more efficient approach is implementing a ‘view’ of the existing object. We refer<br />
internally to the viewed object and just adapt its interface. This can be done very nicely for<br />
the transpose of a matrix:<br />

1 template <typename Matrix><br />
2 class transposed_view<br />
3 {<br />
4   public:<br />
5     typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
6     typedef typename mtl::Collection<Matrix>::size_type size_type;<br />
7<br />
8     transposed_view(Matrix& A) : ref(A) {}<br />
9<br />
10    value_type& operator()(size_type r, size_type c) { return ref(c, r); }<br />
11    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
12<br />
13  private:<br />
14    Matrix& ref;<br />
15 };<br />
Listing 5.1: Simple view implementation<br />

We assume that the matrix class has an operator() taking two arguments for the row and column<br />
index respectively. We further suppose that type traits are defined for value_type and size_type.<br />
This is all we need to know about the referred matrix, at least in this mini example.<br />


The reader will imagine that implementations in libraries like MTL or GLAS provide<br />
a larger interface in such classes. Still, this short example is expressive enough to demonstrate the<br />
approach, and it is large enough to demonstrate the need of meta-programming in certain views.<br />
An object of this class can be handled like a matrix, so that a template function can use it as<br />
argument wherever a matrix is expected. The transposition is achieved by calling operator() in<br />
the referred object with switched indices. For every matrix object we can define a transposed<br />
view that behaves like a matrix:<br />

mtl::dense2D<float> A(3, 3);<br />
A= 2, 3, 4,<br />
   5, 6, 7,<br />
   8, 9, 10;<br />
tst::transposed_view<mtl::dense2D<float> > At(A);<br />

When we access At(i, j) we will get A(j, i). We also defined non-const access so that we can<br />
even change entries:<br />
At(2, 0)= 4.5;<br />
This operation sets A(0, 2) to 4.5.<br />

The definition of a transposed view object does not lead to particularly concise programs. For<br />
convenience we define a function that returns the transposed view:<br />
template <typename Matrix><br />
transposed_view<Matrix> inline trans(Matrix& A)<br />
{<br />
    return transposed_view<Matrix>(A);<br />
}<br />

Now we can use the transpose elegantly in our scientific software, for instance in a matrix<br />
vector product:<br />
v= trans(A) * q;<br />
In this case, a temporary view is created and used in the product. Since operator() from the<br />
view is inlined, the transposed product will be as fast as with A itself.<br />

Dealing with Const-ness<br />
So far, so good. Problems arise if we build the transposed view of a constant matrix:<br />
const mtl::dense2D<float> B(A);<br />
We can still create the transposed view of B, but we cannot access its elements:<br />
std::cout << "tst::trans(B)(2, 0) = " << tst::trans(B)(2, 0) << '\n'; // error<br />

The compiler will tell us that it cannot initialize a ‘float&’ from a ‘const float’. If we look at the<br />
location of the error, we will realize that it is line 10 in Listing 5.1. But why did the compiler<br />
use the non-constant version of the operator? In line 11 we defined an operator for constant<br />
objects which returns a constant reference and fits perfectly in this situation.<br />
First of all, is the ref member really constant? We never used const in the class definition or<br />
in the function trans. Help is provided by ‘Run-Time Type Identification (RTTI)’. We add<br />
the header ‘typeinfo’ and print the type information:<br />
#include <typeinfo><br />
...<br />
std::cout << "typeid of trans(A) = " << typeid(tst::trans(A)).name() << '\n';<br />
std::cout << "typeid of trans(B) = " << typeid(tst::trans(B)).name() << '\n';<br />

This will produce the following output: 4<br />

typeid of trans(A) = N3tst15transposed_viewIN3mtl6matrix7dense2DIfNS2_10<br />

parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10<br />

dimensionsELb0EEEEEEE<br />

typeid of trans(B) = N3tst15transposed_viewIKN3mtl6matrix7dense2DIfNS2_10<br />

parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10<br />

dimensionsELb0EEEEEEE<br />

The output is apparently not very clear. However, if we look very carefully, we see the extra<br />
‘K’ in the second line that tells us that the view is instantiated with a constant matrix type.<br />
Another disadvantage of RTTI is that we only see the const attribute of template parameters.<br />
That is, printing the type information of trans(B).ref would not tell whether or not this type is<br />
constant.<br />

An alternative that solves both problems is inspecting the type by provoking an error message.<br />
We can for instance write:<br />
int ta= trans(A);<br />
int tb= trans(B);<br />

Then the compiler gives us messages like:<br />
trans_const.cpp:120: Error: »mtl::matrix::transposed_view<mtl::matrix::dense2D<float> >«<br />
cannot be converted to »int« in initialization<br />
trans_const.cpp:121: Error: »const mtl::matrix::transposed_view<const mtl::matrix::dense2D<float> >«<br />
cannot be converted to »int« in initialization<br />

Here the types are much more readable. We can see clearly that trans(B) returns a view with<br />
a constant template parameter. The same trick can be used for the reference in the view:<br />
int tar= trans(A).ref;<br />
int tbr= trans(B).ref;<br />

The error message reads accordingly:<br />

4 With g++; on other compilers it might be different but the essential information will be the same. The lines<br />
are broken manually.<br />


trans_const.cpp:121: Error: »const mtl::matrix::dense2D<float>« cannot be converted to »int«<br />
in initialization<br />

Obviously, with this trick we will not get an executable binary. But we know more about the<br />
types in our program and can solve our problems better. In the rare case that the type you<br />
examine is convertible to int, you can take any other type, like std::set<int>, to which the examined<br />
class is not convertible. To exclude convertibility entirely you can introduce a new type.<br />

After this short excursion into type introspection we know for certain that the member ref is a<br />
constant reference. The following happens:<br />
• When we call trans(B), the function’s template argument is instantiated with const dense2D<float>.<br />
• Thus, the return type is transposed_view<const dense2D<float> >.<br />
• The constructor argument has type const dense2D<float>&.<br />
• Likewise the member has type const dense2D<float>&.<br />

It remains the question why the non-const version of the operator (line 10) is called although we<br />
refer to a constant matrix. The answer is that the constancy of ref does not matter for the choice,<br />
but whether or not the view object is constant. Thus, we can write:<br />
const tst::transposed_view<const mtl::dense2D<float> > Bt(B);<br />
std::cout << "Bt(2, 0) = " << Bt(2, 0) << '\n';<br />

This works but it is not very elegant.<br />
A brutal possibility to get the view compiled for constant matrices is to cast away the constancy.<br />
The undesired result would be that mutable views on constant matrices enable the modification<br />
of the allegedly constant matrix. This violates our principles so heavily that we do not even<br />
show how the code would read.<br />

Rule<br />

Never cast away const.<br />

In the following we will empower you with very strong methodologies for handling constancy<br />
correctly. Every const_cast is an indicator of a severe design error. As Sutter and Alexandrescu<br />
phrased it: “If you go const you never go back.” The only situation where a const_cast<br />
is needed is using const-incorrect third-party software, i.e. read-only arguments are passed as<br />
mutable pointers or references. That is not our fault and we have no choice. Unfortunately,<br />
there are still a lot of const-incorrect packages around, and some of them would take too many<br />
resources to reimplement, so that we have to live with them. The best we can do is to add an<br />
appropriate API on top of them and avoid working with the original API. This saves us<br />
from spoiling our applications with const_casts and restricts the unspeakable const_cast to the<br />
interface. A good example of such a layer is ‘Boost::Bindings’ [?] that provides a const-correct,<br />
high-quality interface to BLAS, LAPACK and other libraries with similarly old-fashioned 6 interfaces.<br />
Conversely, as long as we only use our own functions and classes, we can avoid every<br />
const_cast. 7<br />

We could implement a second view class for constant matrices and overload the trans function<br />
to return this view:<br />
template <typename Matrix><br />
class const_transposed_view<br />
{<br />
  public:<br />
    typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
    typedef typename mtl::Collection<Matrix>::size_type size_type;<br />
<br />
    const_transposed_view(const Matrix& A) : ref(A) {}<br />
<br />
    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
<br />
  //private:<br />
    const Matrix& ref;<br />
};<br />
<br />
template <typename Matrix><br />
const_transposed_view<Matrix> inline trans(const Matrix& A)<br />
{<br />
    return const_transposed_view<Matrix>(A);<br />
}<br />

This works fine and the user can use the trans function for both constant and mutable matrices.<br />
However, a completely new class definition is a fair amount of work although just one piece of the<br />
class definition needs to be altered. For this purpose we introduce two meta-functions.<br />

Check for Constancy<br />
Our problem with the view in Listing 5.1 is that it cannot handle constant types as template<br />
argument. To modify the behavior for constant arguments we first need to find out whether<br />
an argument is constant. The meta-function that provides this information is very simple to<br />
implement by partial template specialization:<br />
template <typename T><br />
struct is_const<br />
{<br />
    static const bool value= false;<br />
};<br />
<br />
template <typename T><br />
struct is_const<const T><br />
{<br />
    static const bool value= true;<br />
};<br />

6 To phrase it diplomatically.<br />

7 We disagree with Sutter and Alexandrescu on the other exception for using const_cast [SA05, page 179];<br />
this can be handled easily with an extra function.<br />



Constant types match both definitions, but the second one is more specific and therefore picked<br />
by the compiler. Non-constant types match only the first one. Note that the constancy of<br />
template parameters is not considered, e.g., view<const matrix> is not regarded as constant.<br />

Compile-time Branching<br />
The other tool we need for our view is a type selection depending on a logical condition. This<br />
technique was introduced by Krzysztof Czarnecki 9 and Ulrich W. Eisenecker [CE00].<br />
It can be achieved by a rather simple implementation:<br />

1 template <bool Condition, typename ThenType, typename ElseType><br />
2 struct if_c<br />
3 {<br />
4     typedef ThenType type;<br />
5 };<br />
6<br />
7 template <typename ThenType, typename ElseType><br />
8 struct if_c<false, ThenType, ElseType><br />
9 {<br />
10    typedef ElseType type;<br />
11 };<br />
Listing 5.2: Compile-time if<br />

When this template is instantiated with a logical expression and two types, only the general<br />
definition in line 1 matches when the first argument evaluates to true, and the ‘ThenType’ is used<br />
in the type definition. If the first argument evaluates to false, then the specialization in line 7<br />
is more specific, so that the ‘ElseType’ is used. Like many ingenious inventions, it is very simple<br />
once it is found.<br />
This allows us to define funny things, like using double for temporaries when our maximal<br />
iteration number is larger than 100 and float otherwise:<br />
typedef tst::if_c<(max_iter > 100), double, float>::type tmp_type;<br />
std::cout << "typeid = " << typeid(tmp_type).name() << '\n';<br />
Needless to say that ‘max_iter’ must be known at compile time. Admittedly, the example does<br />
not look extremely useful, and the meta-if is not so important in small isolated code snippets.<br />
On the other hand, for the development of large generic software packages it becomes extremely<br />
important.<br />

A convenience function as defined in the Meta-Programming Library [GA04] is ‘if_’:<br />
template <typename Condition, typename ThenType, typename ElseType><br />
struct if_<br />
    : if_c<Condition::value, ThenType, ElseType><br />
{};<br />
It expects as first argument a type with a static constant member named value that is convertible<br />
to bool. In other words, it selects the type based on the value of the condition (and saves typing 8<br />
characters).<br />

9 At that time he was a doctoral student at TU Ilmenau.<br />



The Solution<br />
Now we have all we need to revise the view from Listing 5.1. The problem was that we returned<br />
an entry of a constant matrix as mutable reference. To avoid this, we can try to make the<br />
mutable access operator disappear in the view when the referred matrix is constant. This is<br />
possible but too complicated for the moment. We will come back to this in Section 5.2.4.<br />
An easier solution is to keep both the mutable and the constant access operator but to choose the<br />
return type of the former depending on the type of the template argument:<br />

1 template <typename Matrix><br />
2 class transposed_view<br />
3 {<br />
4   public:<br />
5     typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
6     typedef typename mtl::Collection<Matrix>::size_type size_type;<br />
7   private:<br />
8     typedef typename if_<is_const<Matrix>,<br />
9                          const value_type&,<br />
10                         value_type&<br />
11                        >::type vref_type;<br />
12  public:<br />
13    transposed_view(Matrix& A) : ref(A) {}<br />
14<br />
15    vref_type operator()(size_type r, size_type c) { return ref(c, r); }<br />
16    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
17<br />
18  private:<br />
19    Matrix& ref;<br />
20 };<br />
Listing 5.3: Const-safe view implementation<br />

This implementation returns a constant reference in line 15 when the referred matrix is constant<br />
and a mutable reference when the referred matrix is mutable. Let us see if this is what we need. For<br />
mutable matrix references, the return type of operator() depends on the constancy of the view<br />
object:<br />
• If the view object is mutable, then operator() in line 15 returns a mutable reference (line 10);<br />
and<br />
• If the view object is constant, then operator() in line 16 returns a constant reference.<br />
This is the same behavior as in Listing 5.1.<br />
If the matrix reference is constant, then a constant reference is always returned:<br />
• If the view object is mutable, then operator() in line 15 returns a constant reference (line 9);<br />
and<br />
• If the view object is constant, then operator() in line 16 returns a constant reference.<br />
Altogether, we implemented a view that provides read and write access wherever appropriate<br />
and disables write access where inappropriate.<br />
and disables it where inappropriate.


5.2. PROVIDING TYPE INFORMATION 145<br />

5.2.3 More Useful Meta-functions<br />
The Boost Type Traits library [?] provides a large spectrum of meta-functions to test or manipulate<br />
attributes of types. Some of them are rather easy to implement — like the previously<br />
introduced is_const — and others — like has_trivial_constructor or is_base_of — require deep insight<br />
into C ++ subtleties and often into compiler internals as well. Unless one only uses very simple<br />
type traits and wants to absolutely avoid the dependency on an external library, it is advisable to<br />
favor the extensively tested implementations from the Type Traits library over rewriting them.<br />

With the boost::is_xyz meta-functions we can implement special behavior for certain sets of types. One can easily<br />
add tests for domain-specific type sets:<br />
template <typename T><br />
struct is_matrix<br />
    : boost::mpl::false_<br />
{};<br />
<br />
template <typename Value, typename Para><br />
struct is_matrix<mtl::dense2D<Value, Para> ><br />
    : boost::mpl::true_<br />
{};<br />
<br />
// more matrix classes ...<br />
<br />
template <typename Matrix><br />
struct is_matrix<transposed_view<Matrix> ><br />
    : is_matrix<Matrix><br />
{};<br />
<br />
// more views ...<br />

Our program snippet is in line with the implementations in Boost. Instead of defining a static<br />
constant as in Section 5.2.2, we derive the meta-function from boost::mpl::false_ and boost::mpl::true_,<br />
where static constants are defined together with some additional typedefs. This is not only shorter but<br />
also requires a bit less compile time, see [?].<br />
The code is quite self-explanatory. Types we do not know are considered not to be matrices.<br />
Then we specialize for known matrix classes. For views we can further refer to the matrix-ness<br />
of the template argument.<br />

Alternatively, we can say in the type trait that every transposed view is a matrix and instead<br />
require for the template argument of transposed_view that it is a matrix:<br />
#include <boost/static_assert.hpp><br />
<br />
template <typename Matrix><br />
class transposed_view<br />
{<br />
    BOOST_STATIC_ASSERT((is_matrix<Matrix>::value)); // Make sure that the argument is a matrix type<br />
    // ...<br />
};<br />

This additional assertion guarantees that the view class can only be instantiated with known<br />
matrix types. For other argument types the compilation will terminate at this line. Unfortunately,<br />
the error message is not very informative, not to say confusing:<br />


trans_const.cpp:96: Error: Invalid application of »sizeof« on incomplete type<br />
»boost::STATIC_ASSERTION_FAILURE<false>«<br />

If you see an error message with “STATIC_ASSERTION” in it, do not think about the message<br />
itself (it is meaningless) but look at the source code line that caused this error and hope that<br />
the author of the assertion provided more information in a comment.<br />
When we try to compile our test with the assertion, we will see that trans(A) compiles but<br />
trans(B) does not. The reason is that ‘const dense2D<float>’ is considered different from ‘dense2D<float>’<br />
in template specialization, so that it is still considered a non-matrix. The good news is that we do<br />
not need to double our specializations for mutable and constant types; we can write a partial<br />
specialization for all constant arguments:<br />
specialization <strong>for</strong> all constant arguments:<br />

template <typename T><br />
struct is_matrix<const T><br />
    : is_matrix<T> {};<br />

Note that BOOST_STATIC_ASSERT is a macro and does not understand C ++. This manifests in<br />
particular if the argument contains one or more commas. Then the preprocessor will interpret<br />
them as multiple arguments for the macro and get confused. This confusion can be avoided<br />
by enclosing the argument of BOOST_STATIC_ASSERT in a second pair of parentheses, as we did<br />
in the example (although it was not necessary there). Despite the double parentheses and the<br />
rather arbitrary error message, static assertions are very useful to increase reliability. The next<br />
C ++ standard will provide static assertions in the language itself:<br />
template <typename Matrix><br />
class transposed_view<br />
{<br />
    static_assert(is_matrix<Matrix>::value, "transposed_view requires a matrix as argument");<br />
    // ...<br />
};<br />

As the reader can see, the integration into the language overcomes the afore-mentioned deficiencies<br />
of the macro implementation.<br />
Also useful are meta-functions that remove something from a type if it exists, e.g. remove_const<br />
transforms const T into T while non-constant types remain unchanged. Note that this only removes the<br />
constancy of entire types, not that of template arguments; e.g., in vector<const T> the constancy<br />
of the argument is not removed.<br />

Dually, meta-functions can add something to a type:<br />
typedef typename boost::add_reference<T>::type ref_type;<br />
It would be shorter to just add an ‘&’, but this is easily overlooked in longer type definitions. More<br />
importantly, if some trait already returns a reference then it is an error to add another one. The<br />
meta-function adds the reference only to types that are not references yet. For adding const to a<br />
type we find it more concise without a meta-function:<br />
typedef typename some_trait<T>::type const const_type;<br />
If the type trait already returns a constant type, the second const is simply ignored.<br />

The widest functionality in the area of meta-programming is provided by the Boost Meta-Programming<br />
Library (MPL) [GA04]. The library implements most of the STL algorithms (§ 4.9) and also<br />
provides similar data types, e.g., vector or map. Another interesting library is Boost Fusion [?]<br />
that helps with mixing execution at compile and run time. Both libraries are well documented<br />
and therefore not further discussed here.<br />

5.2.4 Enable-If<br />
A very powerful mechanism for meta-programming is “enable-if”, discovered by Jaakko Järvi<br />
and Jeremiah Wilcock. It is based on the paradigm SFINAE — Substitution Failure Is Not An<br />
Error. Imagine a function call with a given argument type — say dense_vector<float>. One of the<br />
overloads has a return type that is determined by a meta-function depending on the function<br />
argument. The compiler then substitutes the meta-function argument with dense_vector<float><br />
to find out the return type. If this meta-function is not defined for dense_vector<float>, then the<br />
template function (overload) has no return type. Instead of generating an error message, the C ++ compiler<br />
diligently ignores this overload. Of course, an error might occur later if all overloads are ignored<br />
for the given type, or if the compiler cannot determine the most specific overload among those that<br />
are not ignored.<br />
This compiler behavior can be exploited to select an implementation based on meta-functions. As<br />
an example, think of the L1 norm. It is defined for vector spaces and linear operators. Although<br />
these definitions are related, the practical real-world implementation for finite-dimensional vectors<br />
and matrices is different. Of course, we could implement the L1 norm for every matrix and<br />
vector type, so that the call one_norm(x) would select the appropriate implementation for this<br />
type.<br />

More productively, we like have one single implementation <strong>for</strong> all matrix types (including views)<br />

and one single implementation <strong>for</strong> all vector types. We use meta-function is matrix and implement<br />

accordingly is vector:<br />

template <typename T>
struct is_vector
  : boost::mpl::false_
{};

template <typename Value>
struct is_vector<dense_vector<Value> >
  : boost::mpl::true_
{};

// ... more vector types

We also need the meta-function Magnitude to handle the magnitude of complex matrices and<br />

vectors.<br />

The implementation of enable_if is very simple. It defines a type if the condition holds and none if it does not. The version in Boost adds a second level to access the static value member of types:

template <bool Cond, typename T= void>
struct enable_if_c {
    typedef T type;
};

template <typename T>
struct enable_if_c<false, T> {};


148 CHAPTER 5. META-PROGRAMMING<br />

template <typename Cond, typename T= void>
struct enable_if
  : public enable_if_c<Cond::value, T>
{};

The real enabling behavior is realized in enable_if_c, whereas enable_if is merely a convenience class that saves us from typing '::value'.

Now we have all we need to implement the L1 norm in the generic fashion we aimed for:

1  template <typename T>
2  typename boost::enable_if<is_matrix<T>, typename Magnitude<T>::type>::type
3  inline one_norm(const T& A)
4  {
5      using std::abs;
6      typedef typename Magnitude<T>::type mag_type;
7      mag_type max(0);
8      for (unsigned c= 0; c < num_cols(A); c++) {
9          mag_type sum(0);
10         for (unsigned r= 0; r < num_rows(A); r++)
11             sum+= abs(A[r][c]);
12         max= max < sum ? sum : max;
13     }
14     return max;
15 }
16
17 template <typename T>
18 typename boost::enable_if<is_vector<T>, typename Magnitude<T>::type>::type
19 inline one_norm(const T& v)
20 {
21     using std::abs;
22     typedef typename Magnitude<T>::type mag_type;
23     mag_type sum(0);
24     for (unsigned r= 0; r < size(v); r++)
25         sum+= abs(v[r]);
26     return sum;
27 }

The selection is now driven by enable_if in lines 2 and 18. Let us look at line 2 in detail for a matrix argument:

1. is_matrix<T> evaluates to (i.e. is inherited from) true_;
2. enable_if passes true_::value, i.e. true, to enable_if_c;
3. enable_if_c<true, ... >::type is set to typename Magnitude<T>::type;
4. This is the return type of the function overload.

What happens in this line when the argument is not a matrix type:

1. is_matrix<T> evaluates to (i.e. is inherited from) false_;
2. enable_if passes false_::value, i.e. false, to enable_if_c;
3. enable_if_c<false, ... >::type is not set in this case;
4. The function overload has no return type;
5. It is therefore ignored.

In short, the overload is only enabled if the argument is a matrix, as the names of the meta-functions say. Likewise, the second overload is only available for vectors. A short test demonstrates this:

mtl::dense2D<double> A(3, 3);
A= 2, 3, 4,
   5, 6, 7,
   8, 9, 10;

mtl::dense_vector<double> v(3);
v= 3, 4, 5;

std::cout << "one_norm(A) is " << tst::one_norm(A) << "\n";
std::cout << "one_norm(v) is " << tst::one_norm(v) << "\n";

For types that are neither matrix nor vector, it looks as if there were no function one_norm at all. Types that are considered both matrix and vector would cause an ambiguity.

Drawbacks: The mechanism of enable_if is very powerful but not particularly pleasant to debug. Error messages caused by enable_if are usually rather long but not very meaningful. If a function match is missing for a given argument type, it is hard to determine why, because no helpful information is provided to the programmer; he or she is only told that no match was found, period. Furthermore, the enabling mechanism cannot select the most specific condition. For instance, we cannot specialize the implementation for, say, is_sparse_matrix. This can only be achieved by avoiding ambiguities in the conditions:

template <typename T>
typename boost::enable_if_c<is_matrix<T>::value && !is_sparse_matrix<T>::value,
                            typename Magnitude<T>::type>::type
inline one_norm(const T& A);

template <typename T>
typename boost::enable_if<is_sparse_matrix<T>, typename Magnitude<T>::type>::type
inline one_norm(const T& A);

Evidently, this becomes quite confusing when too many hierarchical conditions are involved.

The SFINAE paradigm only applies to template arguments of the function itself. Therefore, member functions cannot be enabled depending on the class's template arguments. For instance, the mutable access operator in line 9 of Listing 5.1 cannot be hidden with enable_if for views on constant matrices, because the operator itself is not a template function. There are ways to introduce an artificial template argument for a member function so that enable_if applies, but this really does not contribute to the clarity of the program.

Concepts can handle hierarchies of conditions and non-template member functions, and they also provide more helpful error messages. Unfortunately, they will not be available in C++0x, and it is not yet clear when they will be usable for mainstream programming.



5.3 Expression Templates<br />

Scientific software usually has strong performance requirements, especially the problems we tackle with C++. Many large-scale simulations of physical, chemical, or biological processes run for weeks or months, and everybody is glad if at least part of this very long execution time can be saved. Such savings often come at the price of readable and maintainable program sources. In Section 5.3.1 we show a simple implementation of an operator and discuss why it is not efficient, and in the remainder of Section 5.3 we demonstrate how to improve the performance without sacrificing the natural notation.

5.3.1 Simple Operator Implementation<br />

Assume we have an application with vector addition. We want, for instance, to write an expression of the following form for vectors w, x, y, and z:

w = x + y + z;

Say we have a vector class as in Section 4.3:

template <typename T>
class vector
{
  public:
    explicit vector(int size) : my_size(size), data(new T[my_size]) {}
    vector() : my_size(0), data(0) {}

    friend int size(const vector& x) { return x.my_size; }

    const T& operator[](int i) const { check_index(i); return data[i]; }
    T& operator[](int i) { check_index(i); return data[i]; }
    // ...
};

We can of course provide an operator for adding such vectors:

template <typename T>
vector<T> inline operator+(const vector<T>& x, const vector<T>& y)
{
    x.check_size(size(y));
    vector<T> sum(size(x));
    for (int i= 0; i < size(x); ++i)
        sum[i] = x[i] + y[i];
    return sum;
}

A short test program checks that everything works:

int main()
{
    vector<double> x(4), y(4), z(4), w(4);
    x[0]= x[1]= 1.0; x[2]= 2.0; x[3]= -3.0;
    y[0]= y[1]= 1.7; y[2]= 4.0; y[3]= -6.0;
    z[0]= z[1]= 4.1; z[2]= 2.6; z[3]= 11.0;

    std::cout << "x = " << x << std::endl;
    std::cout << "y = " << y << std::endl;
    std::cout << "z = " << z << std::endl;

    w= x + y + z;
    std::cout << "w= x + y + z = " << w << std::endl;
    return 0;
}

If this works properly, what is wrong with it? From the software-engineering perspective: nothing. From the performance perspective: a lot.

How is the statement executed?

1. Create a temporary variable sum for the addition of x and y;
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum;
3. Copy sum to a temporary variable, say t_xy, in the return statement;
4. Delete sum;
5. Create a temporary variable sum for the addition of t_xy and z;
6. Perform a loop reading t_xy and z, adding them element-wise, and writing the result to sum;
7. Copy sum to a temporary variable, say t_xyz, in the return statement;
8. Delete sum;
9. Delete t_xy;
10. Perform a loop reading t_xyz and writing to w;
11. Delete t_xyz.

This is admittedly the worst-case scenario, but it was the code that old compilers generated. Modern compilers perform more optimizations by static code analysis and can avoid copying the return value into the temporaries t_xy and t_xyz. Instead of being created, t_xy and t_xyz become aliases for the respective sum temporaries.

The optimized version performs:

1. Create a temporary variable sum (for distinction: sum_xy) for the addition of x and y;
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum_xy;
3. Create a temporary variable sum (for distinction: sum_xyz) for the addition of sum_xy and z;
4. Perform a loop reading sum_xy and z, adding them, and writing the result to sum_xyz;
5. Delete sum_xy;
6. Perform a loop reading sum_xyz and writing to w;
7. Delete sum_xyz.

How many operations did we perform? Say our vectors have length n; then we have in total:
• 2n additions;



• 3n assignments;<br />

• 5n reads;<br />

• 3n writes;<br />

• 2 memory allocations; and<br />

• 2 memory deallocations.<br />

For comparison, suppose we could write a single loop or an inline function:

template <typename T>
void inline add3(const vector<T>& x, const vector<T>& y, const vector<T>& z, vector<T>& sum)
{
    x.check_size(size(y));
    x.check_size(size(z));
    x.check_size(size(sum));
    for (int i= 0; i < size(x); ++i)
        sum[i] = x[i] + y[i] + z[i];
}

This function performs:

• 2n additions;<br />

• n assignments;<br />

• 3n reads;<br />

• n writes;<br />

The call of this function:

add3(x, y, z, w);

is of course less elegant than the operator notation. Often, one needs another look at the documentation to see whether the first or the last argument contains the result. With operators this is evident.

In high-performance software, programmers therefore tend to implement a hard-coded version of every important operation instead of composing it freely from smaller expressions. The reason is obvious: our operator implementation additionally performed:

• 2n assignments;<br />

• 2n reads;<br />

• 2n writes;<br />

• 2 memory allocations; and<br />

• 2 memory deallocations.<br />

The good news is that we have not performed any additional arithmetic. The bad news is that the operations above are more expensive. On modern computers, it takes much more time to read data from or write data to memory than to execute fixed- or floating-point operations. 11 Unfortunately, vectors in scientific applications tend to be rather long, often larger than the

11 TODO: Maybe quantify this for some machine.



caches of the platform, so the vectors must really be transferred to and from main memory. In the case of shorter vectors, the data might reside in the L1 or L2 cache and the data transfer is less critical. But in this case, allocation and deallocation become a serious slow-down factor.

The purpose of expression templates is to keep the original operator notation without introducing the overhead induced by temporaries.

5.3.2 An Expression Template Class<br />

The solution is to introduce a special class that keeps references to the vectors and allows us to perform all computations later in one sweep. The addition now does not return a vector but an object holding the references:

template <typename T>
class vector_sum
{
  public:
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2) {}
  private:
    const vector<T> &v1, &v2;
};

template <typename T>
vector_sum<T> inline operator+(const vector<T>& x, const vector<T>& y)
{
    return vector_sum<T>(x, y);
}

Now we can already write x + y, but not yet w= x + y. Not only is the assignment undefined; we have also not yet provided vector_sum with enough functionality to perform something useful in the assignment. Thus, we first extend vector_sum so that it looks like a vector itself:

template <typename T>
class vector_sum
{
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2)
    {
        assert(size(v1) == size(v2));
    }

    friend int size(const vector_sum& x) { return size(x.v1); }

    T operator[](int i) const { check_index(i); return v1[i] + v2[i]; }
  private:
    const vector<T> &v1, &v2;
};

For the sake of defensive programming, we added a test that the two vectors have the same size and can thus be added consistently. We then consider the size of the first vector as the size of our vector_sum. The most important function is the bracket operator: when the i-th entry is accessed, we compute the sum of the operands' i-th entries.



Discussion 5.1 The drawback is that the sum is recomputed whenever an entry is accessed multiple times. On the other hand, most expressions are only used once, so this is usually not a problem. An example where vector entries are accessed several times is A * (x + y). Here, it is preferable to first compute a true vector instead of computing the matrix-vector product on the expression template. 12

To evaluate w= x + y we also need an assignment operator for vector_sum:

template <typename T> class vector_sum; // forward declaration

template <typename T>
class vector
{ // ...
    vector& operator=(const vector_sum<T>& that)
    {
        check_size(size(that));
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};

The assignment runs a loop over w and that. As that is an object of type vector_sum, the expression that[i] computes x[i] + y[i]. In contrast to the implementation in Section 5.3.1, we now have:

• Only one loop;
• No temporary vector;
• No additional memory allocation and deallocation; and
• No additional data reads and writes.

In fact, the same operations are performed as in the loop:

for (int i= 0; i < size(w); ++i)
    w[i] = x[i] + y[i];

The cost of creating a vector_sum object is negligible. The object is kept on the stack and does not require memory allocation. Even the little effort for creating the object will be optimized away by most compilers with static code analysis.

What happens when we want to add three vectors? The naïve implementation from § 5.3.1 returns a vector, and this vector can be added to another vector. Our approach returns a vector_sum, and we have no addition for vector_sum and vector. Thus, we would need another ET class and a corresponding operation:

template <typename T>
class vector_sum3
{
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum3(const vector<T>& v1, const vector<T>& v2, const vector<T>& v3)
      : v1(v1), v2(v2), v3(v3)
    {
        assert(size(v1) == size(v2)); assert(size(v1) == size(v3));
    }

    friend int size(const vector_sum3& x) { return size(x.v1); }

    T operator[](int i) const { check_index(i); return v1[i] + v2[i] + v3[i]; }
  private:
    const vector<T> &v1, &v2, &v3;
};

template <typename T>
vector_sum3<T> inline operator+(const vector_sum<T>& x, const vector<T>& y)
{
    return vector_sum3<T>(x.v1, x.v2, y);
}

12 TODO: Shall we provide a solution for this as well? This is something that is overdue in MTL4 anyway.

Furthermore, vector_sum must declare our new plus operator as friend to grant access to its private members, and vector needs an assignment operator for vector_sum3. This becomes increasingly annoying. Also, what happens if we perform the second addition first, w= x + (y + z)? Then we need yet another plus operator. What if some of the vectors are multiplied by a scalar, e.g., w= x + dot(x, y) * y + 4.3 * z, and this scalar product is also implemented by an ET? Our implementation effort runs into a combinatorial explosion, and we need a more flexible solution, which we introduce in the next section.

5.3.3 Generic Expression Templates<br />

So far, we started from a specific class (vector) and generalized the implementation gradually. Although this helps to understand the mechanism, we would now like to go to the general version that takes arbitrary vector types:

template <typename V1, typename V2>
vector_sum<V1, V2> inline operator+(const V1& x, const V2& y)
{
    return vector_sum<V1, V2>(x, y);
}

We now need an expression class with arbitrary arguments:

template <typename V1, typename V2>
class vector_sum
{
    typedef vector_sum self;
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum(const V1& v1, const V2& v2) : v1(v1), v2(v2)
    {
        assert(size(v1) == size(v2));
    }

    ???? operator[](int i) const { check_index(i); return v1[i] + v2[i]; }

    friend int size(const self& x) { return size(x.v1); }
  private:
    const V1& v1;
    const V2& v2;
};

This is rather straightforward. The only open issue is which type to return in operator[]. For this we must define value_type in each class (more flexible would be an external type trait). In vector_sum we take the value_type of the first argument, which can itself be taken from another class:

template <typename V1, typename V2>
class vector_sum
{
    // ...
    typedef typename V1::value_type value_type;

    value_type operator[](int i) const { check_index(i); return v1[i] + v2[i]; }
};

To assign such an expression to a vector, we can also generalize the assignment operator:

template <typename T>
class vector
{
  public:
    typedef T value_type;

    template <typename Src>
    vector& operator=(const Src& that)
    {
        check_size(size(that));
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};

This assignment can also handle vector as argument, so we can omit the standard assignment operator.

Advantages of expression templates: Although the availability of operator overloading in C++ allowed notationally nicer code, the scientific community refused to give up programming in Fortran or to implement the loops directly in C/C++. The reason was that traditional operator implementations were too expensive. Due to the overhead of creating temporary variables and copying vector and matrix objects, C++ could not compete with the performance of programs written in Fortran. This problem has now been resolved by the introduction of generics and expression templates: it is possible to write efficient scientific programs in a notationally convenient manner.

5.4 Meta-Tuning: Write Your Own Compiler Optimization<br />

Compiler technology keeps progressing and provides us with an increasing number of optimization techniques. Ideally, everyone would write software in whatever way is easiest for them, and the compiler would transform the operations into the form that is best for execution time. We would only need a new compiler and our programs would become faster. 13 But life, especially that of an advanced C++ programmer, is no walk in the park. Of course, the compiler helps us a lot to speed up our programs. But there are limitations: many optimizations need knowledge of the semantic behavior and can therefore only be applied to types and operations whose semantics are known at the time the compiler is written; see also the discussion in [?]. Research is ongoing to overcome these limitations by providing concept-based optimization [?]. Unfortunately, it will take time until this becomes mainstream, especially now that concepts have been taken out of the C++0x standard. An alternative is source-to-source code transformation with external tools like ROSE [?].

Even for types and operations that the compiler can handle, it has its limitations. Most compilers (gcc, ... 14) only deal with the inner loop of nested ones (see the solution in Section 5.4.2) and do not dare to introduce extra temporaries (see the solution in Section ??). Some compilers are particularly tuned for benchmarks. 15 For instance, they use pattern matching to recognize a 3-nested loop that computes a dense matrix product and transform it into BLAS-like code with 7 or 9 platform-dependent loops. 16 All this said, writing high-performance software is no walk in the park. That does not mean that such software must be unreadable and unmaintainable hackery. The route to success is, again, to provide appropriate abstractions. Those can be empowered with compile-time optimizations so that the applications are still written in natural mathematical notation while the generated binaries exploit all known techniques for fast execution.

5.4.1 Classical Fixed-Size Unrolling<br />

The easiest form of compile-time optimization can be realized for fixed-size data types, in particular vectors as in Section 4.7. Similar to the default assignment, we can write a generic vector assignment:

template <typename T, int Size>
class fsize_vector
{
  public:
    const static int my_size= Size;

    self& operator=(const self& that)
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};

13 In some sense, this is the programming equivalent of communism: everybody contributes as much as he pleases and how he pleases, and in the end the right thing happens anyway thanks to a self-improving society. Likewise, some people write software in a very naïve fashion and blame the compiler for not transforming their programs into high-performance code.
14 TODO: we should run some benchmarks on MSVC and icc.
15 TODO: search for paper on kcc.
16 One could sometimes get the impression that the HPC community believes that multiplying dense matrices at near-peak performance solves all performance issues of the world, or at least demonstrates that everything can be computed at near-peak performance if only one tries hard enough. Fortunately, more and more people in the supercomputer centers realize that their machines are not only running BLAS3 and LAPACK operations and that real-world applications are more often than not limited by memory bandwidth and latency.



A state-of-the-art compiler will recognize that all iterations are independent of each other; e.g., data[2]= that[2]; is independent of data[1]= that[1];. The compiler will also determine the size of the loop during compilation. As a consequence, the generated binary for a type of size 3 will be equivalent to:

template <typename T, int Size>
class fsize_vector
{
    self& operator=(const self& that)
    {
        data[0]= that[0];
        data[1]= that[1];
        data[2]= that[2];
    }
};

The right-hand-side vector that might be an expression template (§ 5.3) for, say, alpha * x + y, and its evaluation will also be inlined:

template <typename T, int Size>
class fsize_vector
{
    template <typename Expr>
    self& operator=(const Expr& that)
    {
        data[0]= alpha * x[0] + y[0];
        data[1]= alpha * x[1] + y[1];
        data[2]= alpha * x[2] + y[2];
    }
};

To make the unrolling more explicit, and for the sake of introducing meta-tuning step by step, we develop a functor that computes the assignment:

template <typename Target, typename Source, int N>
struct fsize_assign
{
    void operator()(Target& tar, const Source& src)
    {
        fsize_assign<Target, Source, N-1>()(tar, src);
        std::cout << "assign entry " << N << '\n';
        tar[N]= src[N];
    }
};

template <typename Target, typename Source>
struct fsize_assign<Target, Source, 0>
{
    void operator()(Target& tar, const Source& src)
    {
        std::cout << "assign entry " << 0 << '\n';
        tar[0]= src[0];
    }
};



The print-outs show us the order of execution. For convenience, one can templatize the operator on the argument types:

template <int N>
struct fsize_assign
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        fsize_assign<N-1>()(tar, src);
        std::cout << "assign entry " << N << '\n';
        tar[N]= src[N];
    }
};

template <>
struct fsize_assign<0>
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        std::cout << "assign entry " << 0 << '\n';
        tar[0]= src[0];
    }
};

Then the vector types can be deduced by the compiler when the operator is called. Instead of the previous loop, we call the assignment functor in the operator:

template <typename T, int Size>
class fsize_vector
{
    BOOST_STATIC_ASSERT((my_size > 0));

    self& operator=(const self& that)
    {
        fsize_assign<my_size-1>()(*this, that);
        return *this;
    }

    template <typename Vector>
    self& operator=(const Vector& that)
    {
        fsize_assign<my_size-1>()(*this, that);
        return *this;
    }
};

The execution of the following code fragment:

fsize_vector<float, 4> v, w;
v[0]= v[1]= 1.0; v[2]= 2.0; v[3]= -3.0;
w= v;

yields:

assign entry 0
assign entry 1
assign entry 2
assign entry 3

In this implementation, we replaced the loop by a recursion, counting on the compiler to inline the operations (otherwise it would be even slower than the loop), and made sure that no loop index is incremented and tested for termination. This is only beneficial for small loops that run in the L1 cache. Larger loops are dominated by loading the data from memory, and the loop overhead is irrelevant. On the contrary, entirely unrolling operations on very large vectors will probably decrease the performance, because many instructions need to be loaded, which in turn decreases the available bandwidth for the data. As mentioned before, compilers can unroll such operations by themselves (and hopefully know when it is better not to), and sometimes this automatic unrolling is even slightly faster than the explicit implementation.

5.4.2 Nested Unrolling<br />

From our experience, compilers usually do not unroll nested loops. Even a good compiler that can handle certain nested loops will not be able to optimize every program kernel, in particular those in heavily templatized programs instantiated with user-defined types. We will demonstrate here how to unroll nested loops at compile time, using the example of matrix-vector multiplication. For this purpose, we introduce a simplistic fixed-size matrix type:

template <typename T, int Rows, int Cols>
class fsize_matrix
{
    typedef fsize_matrix self;
  public:
    typedef T value_type;
    BOOST_STATIC_ASSERT((Rows * Cols > 0));
    const static int my_rows= Rows, my_cols= Cols;

    fsize_matrix()
    {
        for (int i= 0; i < my_rows; ++i)
            for (int j= 0; j < my_cols; ++j)
                data[i][j]= T(0);
    }

    fsize_matrix(const self& that) { ... }

    // cannot check column index
    const T* operator[](int r) const { return data[r]; }
    T* operator[](int r) { return data[r]; }

    mat_vec_et<self, fsize_vector<T, Cols> > operator*(const fsize_vector<T, Cols>& v) const
    {
        return mat_vec_et<self, fsize_vector<T, Cols> >(*this, v);
    }
  private:
    T data[Rows][Cols];
};

The bracket operator returns a pointer for the sake of simplicity; a good implementation should return a proxy that allows for checking the column index. The multiplication with a vector is realized by means of an expression template to avoid copying the result vector. The vector assignment then needs a specialization for the expression template: 17

template <typename T, int Size>
class fsize_vector
{
    template <typename Matrix, typename Vector>
    self& operator=(const mat_vec_et<Matrix, Vector>& that)
    {
        typedef mat_vec_et<Matrix, Vector> et;
        fsize_mat_vec_mult<Matrix::my_rows-1, Matrix::my_cols-1>()(that.A, that.v, *this);
        return *this;
    }
};

The functor fsize_mat_vec_mult must now compute the matrix-vector product on the three arguments. The general implementation of the functor reads:

template <int Rows, int Cols>
struct fsize_mat_vec_mult
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows, Cols-1>()(A, v_in, v_out);
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
    }
};

Again, the functor is only templatized on the sizes, and the container types are deduced. The operator assumes that all smaller column indices are already handled and that we can increment v_out[Rows] by A[Rows][Cols] * v_in[Cols]. In particular, we assume that the first operation on v_out[Rows] initializes it. Thus, we need a (partial) specialization for Cols = 0:

template <int Rows>
struct fsize_mat_vec_mult<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows-1, Matrix::my_cols-1>()(A, v_in, v_out);
        v_out[Rows]= A[Rows][0] * v_in[0];
    }
};

The careful reader noticed the substitution of += by =. We also notice that we have to call the<br />
computation for the preceding row with all columns, and inductively for all smaller rows. The<br />
number of columns in the matrix is taken from an internal definition in the matrix type for the<br />
sake of simplicity. Passing it as an extra template argument or taking a type trait would have<br />
been more general, because we are now limited to types where my_cols is defined in the class.<br />
17 A better solution would be implementing all assignments with a functor and specializing the functor, because<br />
partial template specialization of functions does not always work as expected.<br />
162 CHAPTER 5. META-PROGRAMMING<br />

We still need a (full) specialization to terminate the recursion:<br />

template <><br />
struct fsize_mat_vec_mult<0, 0><br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        v_out[0]= A[0][0] * v_in[0];<br />
    }<br />
};<br />

With the inlining, our program will execute the operation w= A * v for vectors of size 4 as:<br />

w[0]= A[0][0] * v[0];<br />
w[0]+= A[0][1] * v[1];<br />
w[0]+= A[0][2] * v[2];<br />
w[0]+= A[0][3] * v[3];<br />
w[1]= A[1][0] * v[0];<br />
w[1]+= A[1][1] * v[1];<br />
w[1]+= A[1][2] * v[2];<br />
w[1]+= A[1][3] * v[3];<br />
w[2]= A[2][0] * v[0];<br />
w[2]+= A[2][1] * v[1];<br />
w[2]+= A[2][2] * v[2];<br />
w[2]+= A[2][3] * v[3];<br />
w[3]= A[3][0] * v[0];<br />
w[3]+= A[3][1] * v[1];<br />
w[3]+= A[3][2] * v[2];<br />
w[3]+= A[3][3] * v[3];<br />

Our tests have shown that such an implementation is really faster than the compiler optimization<br />

on loops. 18<br />

Increasing Concurrency<br />

A disadvantage of the preceding implementation is that all operations on an entry of the target<br />
vector are performed in one sweep. Therefore, the second operation must wait for the first, the<br />
third for the second, and so on. The fifth operation can be done in parallel with the fourth and<br />
the ninth with the eighth, but this is not satisfying. We would like more concurrency in our<br />
program, to enable the parallel pipelines of superscalar processors. Again, we can either twiddle<br />
our thumbs and hope that the compiler will reorder the statements, or take it into our own<br />
hands. More concurrency is provided by the following operation sequence:<br />

w[0]= A[0][0] * v[0];<br />
w[1]= A[1][0] * v[0];<br />
w[2]= A[2][0] * v[0];<br />
w[3]= A[3][0] * v[0];<br />
w[0]+= A[0][1] * v[1];<br />
w[1]+= A[1][1] * v[1];<br />
w[2]+= A[2][1] * v[1];<br />
w[3]+= A[3][1] * v[1];<br />
w[0]+= A[0][2] * v[2];<br />
w[1]+= A[1][2] * v[2];<br />
w[2]+= A[2][2] * v[2];<br />
w[3]+= A[3][2] * v[2];<br />
w[0]+= A[0][3] * v[3];<br />
w[1]+= A[1][3] * v[3];<br />
w[2]+= A[2][3] * v[3];<br />
w[3]+= A[3][3] * v[3];<br />
18 TODO: Give numbers<br />

We only need to reorganize our functor. The general template now reads:<br />

template <unsigned Rows, unsigned Cols><br />
struct fsize_mat_vec_mult_cm<br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        fsize_mat_vec_mult_cm<Rows-1, Cols>()(A, v_in, v_out);<br />
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];<br />
    }<br />
};<br />

Now we need a partial specialization for row 0 that goes to the next column:<br />

template <unsigned Cols><br />
struct fsize_mat_vec_mult_cm<0, Cols><br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        fsize_mat_vec_mult_cm<Matrix::my_rows-1, Cols-1>()(A, v_in, v_out);<br />
        v_out[0]+= A[0][Cols] * v_in[Cols];<br />
    }<br />
};<br />

The partial specialization for column 0 is also needed, to initialize the entry of the output vector:<br />

template <unsigned Rows><br />
struct fsize_mat_vec_mult_cm<Rows, 0><br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        fsize_mat_vec_mult_cm<Rows-1, 0>()(A, v_in, v_out);<br />
        v_out[Rows]= A[Rows][0] * v_in[0];<br />
    }<br />
};<br />

Finally, we still need a specialization <strong>for</strong> row and column 0 to terminate the recursion. This<br />

can be reused from the previous functor:<br />

template <><br />
struct fsize_mat_vec_mult_cm<0, 0><br />
    : fsize_mat_vec_mult<0, 0> {};<br />



Using Registers<br />

Another feature of modern processors should be kept in mind: cache coherency. Processors<br />
are nowadays designed to share memory while maintaining consistency in their caches. As a<br />
result, every time we write into a data structure in memory, like our vector w, a cache invalidation<br />
signal is sent on the bus, even if no other processor is present. Unfortunately, this slows down<br />
the computation perceivably (in our experience).<br />
Fortunately, this can be avoided in many cases in a rather simple way: by introducing a temporary<br />
in the function that resides in register(s) if the type allows. We can rely on the compiler to<br />
decide reasonably on the location of temporaries.<br />
This implementation requires two classes: one for the outer and one for the inner loop. Let us<br />
start with the outer loop:<br />

1 template <unsigned Rows, unsigned Cols><br />
2 struct fsize_mat_vec_mult_reg<br />
3 {<br />
4     template <typename Matrix, typename VecIn, typename VecOut><br />
5     void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
6     {<br />
7         fsize_mat_vec_mult_reg<Rows-1, Cols>()(A, v_in, v_out);<br />
8<br />
9         typename VecOut::value_type tmp;<br />
10         fsize_mat_vec_mult_aux<Rows, Cols>()(A, v_in, tmp);<br />
11         v_out[Rows]= tmp;<br />
12     }<br />
13 };<br />

We assume that fsize_mat_vec_mult_aux is defined or declared before this class. The first<br />
statement in line 7 calls the computations on the preceding rows. A temporary is defined in<br />
line 9, in the hope that it will be located in a register. Then we call the computation within this<br />
row (line 10). The temporary is passed as a reference to an inline function so that the summation<br />
will be performed in a register. In line 11 we write the result back to v_out. This still causes the<br />
invalidation signal on the bus, but only once for each entry.<br />

The functor must be specialized for row 0 to avoid an endless recursion:<br />

template <unsigned Cols><br />
struct fsize_mat_vec_mult_reg<0, Cols><br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        typename VecOut::value_type tmp;<br />
        fsize_mat_vec_mult_aux<0, Cols>()(A, v_in, tmp);<br />
        v_out[0]= tmp;<br />
    }<br />
};<br />

Within each row we iterate over the columns and increment the temporary (hopefully in a<br />
register):<br />
template <unsigned Rows, unsigned Cols><br />
struct fsize_mat_vec_mult_aux<br />
{<br />
    template <typename Matrix, typename VecIn, typename ScalOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)<br />
    {<br />
        fsize_mat_vec_mult_aux<Rows, Cols-1>()(A, v_in, tmp);<br />
        tmp+= A[Rows][Cols] * v_in[Cols];<br />
    }<br />
};<br />

To terminate the computation within a row, we write a specialization:<br />

template <unsigned Rows><br />
struct fsize_mat_vec_mult_aux<Rows, 0><br />
{<br />
    template <typename Matrix, typename VecIn, typename ScalOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)<br />
    {<br />
        tmp= A[Rows][0] * v_in[0];<br />
    }<br />
};<br />

In this section we showed different ways to optimize a two-dimensional loop (with fixed sizes).<br />
There are certainly more possibilities: for instance, we could try an implementation that uses<br />
registers but provides the same concurrency as the second-to-last implementation. Another<br />
form of optimization could be to agglomerate the write-backs so that multiple invalidation<br />
signals are sent at a time and are perhaps less disruptive.<br />

5.4.3 Dynamic Unrolling – Warm up<br />

⇒ vector_unroll_example.cpp<br />

As important as the fixed-size optimization is, acceleration for dynamically sized containers is<br />
needed even more. We start here with a simple example and some observations. We will reuse<br />
the vector class from Listing 4.1. To show the implementation more clearly, we write the code<br />
without operators and expression templates. Our test case will compute<br />
u = 3v + w<br />
for three short vectors of size 1000. The wall clock time will be measured with boost::timer. 19<br />
The vectors v and w will be initialized, and to have the data ready to use (i.e. the vectors are<br />
definitely in cache 20) we run a few additional operations without timing:<br />

#include <iostream><br />
#include <boost/timer.hpp><br />
// ...<br />
int main(int argc, char* argv[])<br />
{<br />
    unsigned s= 1000;<br />
    if (argc > 1) s= atoi(argv[1]); // read (potentially) from command line<br />
    vector<float> u(s), v(s), w(s);<br />
    for (unsigned i= 0; i < s; i++) {<br />
        v[i]= float(i);<br />
        w[i]= float(2*i + 15);<br />
    }<br />
    for (unsigned j= 0; j < 3; j++)<br />
        for (unsigned i= 0; i < s; i++)<br />
            u[i]= 3.0f * v[i] + w[i];<br />
    const unsigned rep= 200000;<br />
    boost::timer native;<br />
    for (unsigned j= 0; j < rep; j++)<br />
        for (unsigned i= 0; i < s; i++)<br />
            u[i]= 3.0f * v[i] + w[i];<br />
    std::cout << "Compute time native loop is " << 1000000.0 * native.elapsed() / double(rep) << " µs.\n";<br />
    return 0;<br />
}<br />
19 See http://www.boost.org/doc/libs/1_43_0/libs/timer/timer.htm<br />
20 TODO: shouldn't the initialization make this sure? Do we have a better explanation? Reference to benchmark literature? Do we really need a bullet proof justification here?<br />

Alternatively, we compute this with the loop unrolled by a factor of 4:<br />

for (unsigned j= 0; j < rep; j++)<br />
    for (unsigned i= 0; i < s; i+= 4) {<br />
        u[i]= 3.0f * v[i] + w[i];<br />
        u[i+1]= 3.0f * v[i+1] + w[i+1];<br />
        u[i+2]= 3.0f * v[i+2] + w[i+2];<br />
        u[i+3]= 3.0f * v[i+3] + w[i+3];<br />
    }<br />

This code will obviously only work if the vector size is divisible by 4. To avoid errors we can<br />

add an assertion on the vector size but this is not really satisfying. Instead, we generalize this<br />

implementation to arbitrary vector sizes:<br />

boost::timer unrolled;<br />
for (unsigned j= 0; j < rep; j++) {<br />
    unsigned sb= s / 4 * 4;<br />
    for (unsigned i= 0; i < sb; i+= 4) {<br />
        u[i]= 3.0f * v[i] + w[i];<br />
        u[i+1]= 3.0f * v[i+1] + w[i+1];<br />
        u[i+2]= 3.0f * v[i+2] + w[i+2];<br />
        u[i+3]= 3.0f * v[i+3] + w[i+3];<br />
    }<br />
    for (unsigned i= sb; i < s; i++)<br />
        u[i]= 3.0f * v[i] + w[i];<br />
}<br />
std::cout << "Compute time unrolled loop is " << 1000000.0 * unrolled.elapsed() / double(rep) << " µs.\n";<br />
std::cout << "u is " << u << '\n';<br />
Listing 5.4: Unrolled computation of u = 3v + w<br />

The little program was compiled with g++ 4.1.2 with the flags -O3 -ffast-math -DNDEBUG



and resulted on the test computer 21 in:<br />

Compute time native loop is 2.64 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Alternatively to our hand-coded unrolling we can use the compiler flag -funroll-loops. This<br />

results in the following execution time on the test machine:<br />

Compute time native loop is 2.51 µs.<br />

Compute time unrolled loop is 1.22 µs.<br />

The original loop became slightly faster, while our optimized version slowed down a bit. We see<br />
an entirely different behavior if we replace the size s by a constant:<br />

const unsigned s= 1000;<br />

In this case the compiler knows the size of the loops, and it might be easier to transform the<br />
loop or to determine that a transformation is beneficial.<br />

Compute time native loop is 1.6 µs.<br />

Compute time unrolled loop is 1.55 µs.<br />

Now the native loop is clearly accelerated by the compiler optimization. Why our hand-written<br />
unrolling is slower than before is not clear. Apparently, the manual and the automatic<br />
optimization got into conflict, or the latter overrode the former.<br />

Discussion 5.2 Software tuning and benchmarking is an art of its own, given the complexity of<br />
compiler optimization. The tiniest modification in the source can change the run-time behavior<br />
of an examined computation. In the example it should not have mattered whether the size is<br />
known at compile time or not; but it did. Especially when the code is compiled without -DNDEBUG,<br />
the compiler might omit the index check in some situations and perform it in others. It is<br />
also important to print out computed values (and filter them out with grep or the like) because<br />
the compiler might omit an entire computation when it is obvious that the result is not needed.<br />
Such optimizations happen in particular if the results are intrinsic types, while computations on<br />
user-defined types are usually not subject to such omissions (but one should not count on it).<br />

The goal of this section is not to determine why which code is faster than another and by how<br />
much. Besides, each compiler has a different sensitivity to sizes and flags, so we would need a<br />
different line of argumentation and calculation for each of them. The only conclusion we draw<br />
from these observations is that despite all the progress in compiler technology, we cannot<br />
rely on it blindly and still need hand-tuned implementations and careful benchmarking when<br />
maximal performance is needed. On the other hand, program snippets like the last listing should<br />
not appear in scientific applications, for the sake of readability, maintainability, portability, . . .<br />

Another question we have not raised so far is: What is the optimal block size <strong>for</strong> the<br />

unrolling?<br />

• Does it depend on the expression?<br />

• Does it depend on the types of the arguments?<br />

• Does it depend on the computer architecture?<br />

21 Phenom II X2 545 3.0 GHz, 3600 MHz PSB, 7 MB total cache, Socket AM2, 2x 2 GB DDR2-800<br />



The answer is yes. All of them. The main reason (but not the only one) is that different<br />

processors have different numbers of registers. How many registers are needed in one iteration<br />

depends on the expression and on the types (a complex value needs more registers than a float).<br />

In the following section we will address both issues: how to encapsulate the transformation<br />
so that it does not show up in the application, and how we can change the block size without<br />
rewriting the loop.<br />

5.4.4 Unrolling Vector Expressions<br />

For easier understanding, we discuss the abstraction in meta-tuning step by step. We start with<br />
the previous loop and implement a function for it. Say the function's name is my_axpy and it<br />
has a template argument for the block size so that we can write, for instance:<br />
for (unsigned j= 0; j < rep; j++)<br />
    my_axpy<4>(u, v, w);<br />

This function shall contain an unrolled main loop with customizable block size and a clean-up<br />
loop at the end:<br />
template <unsigned BSize, typename U, typename V, typename W><br />
void my_axpy(U& u, const V& v, const W& w)<br />
{<br />
    assert(u.size() == v.size() && v.size() == w.size());<br />
    unsigned s= u.size(), sb= s / BSize * BSize;<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        my_axpy_ftor<0, BSize>()(u, v, w, i);<br />
    for (unsigned i= sb; i < s; i++)<br />
        u[i]= 3.0f * v[i] + w[i];<br />
}<br />

As mentioned before, deduced template types, the vector types in our case, must be declared<br />
at the end, and the explicitly given arguments, in our case the block size, must be at the<br />
beginning of the template parameter list. The block statement in the first loop can be<br />
implemented similarly to the functor in Section 5.4.1. We deviate a bit from that implementation<br />
by using two template arguments, where the former is increased until it is equal to the latter.<br />
It appeared that this approach yielded faster binaries on gcc than using only one argument and<br />
counting it down to zero. 22 In addition, the two-argument version is more consistent with the<br />
multi-dimensional implementation in Section ??. As for fixed-size unrolling, we need a recursive<br />
template definition. Within the operator, a single statement is performed and the following<br />
statements are called:<br />

template <unsigned Offset, unsigned Max><br />
struct my_axpy_ftor<br />
{<br />
    template <typename U, typename V, typename W><br />
    void operator()(U& u, const V& v, const W& w, unsigned i)<br />
    {<br />
        u[i+Offset]= 3.0f * v[i+Offset] + w[i+Offset];<br />
        my_axpy_ftor<Offset+1, Max>()(u, v, w, i);<br />
    }<br />
};<br />
22 TODO: exercise for it<br />

The only difference to fixed-size unrolling is that the indices are relative to an argument, here i.<br />
The operator() is first called with Offset equal to 0, then with 1, 2, . . . Since each call is inlined,<br />
the functor call results in one monolithic block of operations without loop control and function<br />
calls. Thus, the call my_axpy_ftor<0, BSize>()(u, v, w, i) performs the same operations as<br />
one iteration of the first loop in Listing 5.4.<br />

Of course, the compilation would end in an infinite recursion if we forgot the specialization for<br />
Offset equal to Max:<br />
template <unsigned Max><br />
struct my_axpy_ftor<Max, Max><br />
{<br />
    template <typename U, typename V, typename W><br />
    void operator()(U& u, const V& v, const W& w, unsigned i) {}<br />
};<br />

Performing the considered vector operation with different unrollings yields:<br />

Compute time unrolled loop is 1.44 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.14 µs.<br />

Now we can call this operation for any block size we like. On the other hand, it is rather<br />
cumbersome to implement the corresponding functions and functors for each vector expression.<br />
Therefore, we now combine this technique with expression templates.<br />

5.4.5 Tuning an Expression Template<br />

⇒ vector_unroll_example2.cpp<br />

Let us recall Section 5.3.3. So far, we developed a vector class with expression templates for<br />
vector sums. In the same manner we can implement the product of a scalar and a vector, but<br />
we leave this as an exercise and consider expressions with addition only, for example:<br />
u = v + v + w<br />
Now we frame this vector operation with a repeating loop and the time measurement:<br />

boost::timer t;<br />
for (unsigned j= 0; j < rep; j++)<br />
    u= v + v + w;<br />
std::cout << "Compute time is " << 1000000.0 * t.elapsed() / double(rep) << " µs.\n";<br />
This results in:<br />
Compute time is 1.72 µs.<br />

To incorporate meta-tuning into expression templates, we only need to modify the actual<br />
assignment, because only there a loop is performed. All the other operations (well, so far we<br />
have only a sum, but in theory there could be tons of them) only return objects with references.<br />
The loop in operator= is split into the unrolled part at the beginning and the one-by-one<br />
completion at the end:<br />

template <typename T><br />
class vector<br />
{<br />
    template <typename Src><br />
    vector& operator=(const Src& that)<br />
    {<br />
        check_size(size(that));<br />
        unsigned s= my_size, sb= s / 4 * 4;<br />
        for (unsigned i= 0; i < sb; i+= 4)<br />
            assign<0, 4>()(*this, that, i);<br />
        for (unsigned i= sb; i < s; i++)<br />
            data[i]= that[i];<br />
        return *this;<br />
    }<br />
};<br />

The assign functor is realized analogously to my_axpy_ftor:<br />
template <unsigned Offset, unsigned Max><br />
struct assign<br />
{<br />
    template <typename U, typename V><br />
    void operator()(U& u, const V& v, unsigned i)<br />
    {<br />
        u[i+Offset]= v[i+Offset];<br />
        assign<Offset+1, Max>()(u, v, i);<br />
    }<br />
};<br />
template <unsigned Max><br />
struct assign<Max, Max><br />
{<br />
    template <typename U, typename V><br />
    void operator()(U& u, const V& v, unsigned i) {}<br />
};<br />

Computing the expression above now yields:<br />
Compute time is 1.37 µs.<br />
With this rather simple modification we have now accelerated ALL vector expression templates.<br />
In comparison with the previous implementation, however, we lost the flexibility to customize<br />
the loop unrolling. The functor assign has two arguments, thus allowing for customization. The<br />
problem is the assignment operator. In principle we can define an explicit template argument<br />
there:<br />

template <unsigned BSize, typename Src><br />
vector& operator=(const Src& that)<br />
{<br />
    check_size(size(that));<br />
    unsigned s= my_size, sb= s / BSize * BSize;<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        assign<0, BSize>()(*this, that, i);<br />
    for (unsigned i= sb; i < s; i++)<br />
        data[i]= that[i];<br />
    return *this;<br />
}<br />
The drawback is that we cannot use the symbol '=' naturally as infix operator but must write:<br />
u.operator=<4>(v + v + w);<br />

This has in fact a certain geeky charm, and one could also argue that people did (and still do)<br />
more painful things for performance. Nonetheless, it does not meet our ideals of intuitiveness<br />
and readability.<br />
Alternative notations are:<br />
unroll<4>(u= v + v + w);<br />
or<br />
unroll<4>(u)= v + v + w;<br />
Both versions are implementable and provide comparable intuitiveness. The former expresses<br />
more correctly what we are doing, while the latter is easier to implement and keeps the structure<br />
of the computed expression better visible. Therefore we show the realization of the second<br />
form.<br />

The function unroll is simple to implement: it just returns an object with a reference to the<br />
vector and the type information for the unroll size:<br />
template <unsigned BSize, typename Vector><br />
unroll_vector<BSize, Vector> inline unroll(Vector& v)<br />
{<br />
    return unroll_vector<BSize, Vector>(v);<br />
}<br />

The class unroll_vector is not complicated either. It only needs to take a reference to the target<br />
vector and to provide an assignment operator:<br />
template <unsigned BSize, typename V><br />
class unroll_vector<br />
{<br />
  public:<br />
    unroll_vector(V& ref) : ref(ref) {}<br />
<br />
    template <typename Src><br />
    V& operator=(const Src& that)<br />
    {<br />
        assert(size(ref) == size(that));<br />
        unsigned s= size(ref), sb= s / BSize * BSize;<br />
        for (unsigned i= 0; i < sb; i+= BSize)<br />
            assign<0, BSize>()(ref, that, i);<br />
        for (unsigned i= sb; i < s; i++)<br />
            ref[i]= that[i];<br />
        return ref;<br />
    }<br />
  private:<br />
    V& ref;<br />
};<br />

Evaluating the considered vector expressions for some block sizes yields:<br />

Compute time unroll(u)= v + v + w is 1.72 µs.<br />

Compute time unroll(u)= v + v + w is 1.52 µs.<br />

Compute time unroll(u)= v + v + w is 1.36 µs.<br />

Compute time unroll(u)= v + v + w is 1.37 µs.<br />

Compute time unroll(u)= v + v + w is 1.4 µs.<br />

These few benchmarks are consistent with the previous results, i.e. unroll<1> is equal to the<br />
canonical implementation and unroll<4> is as fast as the hard-wired unrolling.<br />

5.4.6 Tuning Reduction Operations<br />

Reducing on a Single Variable<br />

⇒ reduction_unroll_example.cpp<br />

In the preceding vector operations, the i-th entry of each vector was handled independently of<br />
any other entry. In reduction operations, by contrast, the entries are related by one or more<br />
temporary variables, and these temporary variables can become a serious bottleneck.<br />
First, we test whether a reduction operation, say the discrete L1 norm (also known as the<br />
Manhattan norm), can be accelerated by the techniques from Section 5.4.4. We implement the<br />
one_norm function in terms of a functor for the iteration block:<br />

template <unsigned BSize, typename Vector><br />
typename Vector::value_type<br />
inline one_norm(const Vector& v)<br />
{<br />
    using std::abs;<br />
    typename Vector::value_type sum(0);<br />
    unsigned s= size(v), sb= s / BSize * BSize;<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        one_norm_ftor<0, BSize>()(sum, v, i);<br />
    for (unsigned i= sb; i < s; i++)<br />
        sum+= abs(v[i]);<br />
    return sum;<br />
}<br />



The functor is implemented in the same manner as before:<br />
template <unsigned Offset, unsigned Max><br />
struct one_norm_ftor<br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& sum, const V& v, unsigned i)<br />
    {<br />
        using std::abs;<br />
        sum+= abs(v[i+Offset]);<br />
        one_norm_ftor<Offset+1, Max>()(sum, v, i);<br />
    }<br />
};<br />
template <unsigned Max><br />
struct one_norm_ftor<Max, Max><br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& sum, const V& v, unsigned i) {}<br />
};<br />

The measured run-time behavior is:<br />

Compute time one_norm(v) is 7.42 µs.<br />

Compute time one_norm(v) is 3.64 µs.<br />

Compute time one_norm(v) is 1.9 µs.<br />

Compute time one_norm(v) is 1.25 µs.<br />

Compute time one_norm(v) is 1.03 µs.<br />

This is already a good improvement but maybe we can do better. 23<br />

Reducing on an Array<br />

⇒ reduction_unroll_array_example.cpp<br />

When we look at the previous computation, we see that a different entry of v is used in each<br />
iteration, but every computation accesses the same temporary variable sum, and this limits<br />
concurrency. To provide more concurrency, we can use multiple temporaries, 24 for instance in<br />
an array. The modified function then reads:<br />

template <unsigned BSize, typename Vector><br />
typename Vector::value_type<br />
inline one_norm(const Vector& v)<br />
{<br />
    using std::abs;<br />
    typename Vector::value_type sum[BSize];<br />
    for (unsigned i= 0; i < BSize; i++)<br />
        sum[i]= 0;<br />
    unsigned s= size(v), sb= s / BSize * BSize;<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        one_norm_ftor<0, BSize>()(sum, v, i);<br />
    for (unsigned i= 1; i < BSize; i++)<br />
        sum[0]+= sum[i];<br />
    for (unsigned i= sb; i < s; i++)<br />
        sum[0]+= abs(v[i]);<br />
    return sum[0];<br />
}<br />
23 TODO: Test it with gcc 3.4 and MSVC. Speed up in table<br />
24 Strictly speaking, this is not true for every possible scalar type we can think of. The addition of the sum type<br />
must be a commutative monoid because we change the evaluation order. This holds of course for all intrinsic<br />
numeric types and certainly for almost all user-defined arithmetic types. But one is free to define an addition<br />
that is not commutative or not monoidal. In this case our transformation would be wrong. To deal with such<br />
exceptions we need semantic concepts, which hopefully become part of C++ in the next years.<br />

The corresponding functor must refer to the right element in the sum array:<br />
template <unsigned Offset, unsigned Max><br />
struct one_norm_ftor<br />
{<br />
    template <typename S, typename V><br />
    void operator()(S* sum, const V& v, unsigned i)<br />
    {<br />
        using std::abs;<br />
        sum[Offset]+= abs(v[i+Offset]);<br />
        one_norm_ftor<Offset+1, Max>()(sum, v, i);<br />
    }<br />
};<br />
template <unsigned Max><br />
struct one_norm_ftor<Max, Max><br />
{<br />
    template <typename S, typename V><br />
    void operator()(S* sum, const V& v, unsigned i) {}<br />
};<br />

On the test machine this took:<br />

Compute time one_norm(v) is 7.33 µs.<br />

Compute time one_norm(v) is 5.15 µs.<br />

Compute time one_norm(v) is 2 µs.<br />

Compute time one_norm(v) is 1.4 µs.<br />

Compute time one_norm(v) is 1.16 µs.<br />

This is even a bit slower than the version with one variable. Maybe an array is more expensive<br />

to pass as argument even in an inline function. Let us try something else.<br />

Reducing on a Nested Class Object<br />

⇒ reduction_unroll_nesting_example.cpp<br />

To avoid arrays, we can define a class <strong>for</strong> n temporary variables where n is a template argument.<br />

Such a class is designed more consistently with the recursive scheme of the functors:<br />

template <unsigned Size, typename Value><br />
struct multi_tmp<br />
{<br />
    typedef multi_tmp<Size-1, Value> sub_type;<br />
    multi_tmp(const Value& v) : value(v), sub(v) {}<br />
    Value value;<br />
    sub_type sub;<br />
};<br />
template <typename Value><br />
struct multi_tmp<0, Value><br />
{<br />
    multi_tmp(const Value& v) {}<br />
};<br />

An object of this type can be recursively initialized so that we do not need a loop as for the<br />
array. A functor can operate on the value member and pass a reference to the sub member to<br />
its successor. This leads us to the implementation of our functor:<br />

template <unsigned Offset, unsigned Max><br />
struct one_norm_ftor<br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& sum, const V& v, unsigned i)<br />
    {<br />
        using std::abs;<br />
        sum.value+= abs(v[i+Offset]);<br />
        one_norm_ftor<Offset+1, Max>()(sum.sub, v, i);<br />
    }<br />
};<br />
template <unsigned Max><br />
struct one_norm_ftor<Max, Max><br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& sum, const V& v, unsigned i) {}<br />
};<br />

The unrolled function that uses this functor reads:<br />
template <unsigned BSize, typename Vector><br />
typename Vector::value_type<br />
inline one_norm(const Vector& v)<br />
{<br />
    using std::abs;<br />
    typedef typename Vector::value_type value_type;<br />
    multi_tmp<BSize, value_type> multi_sum(0);<br />
    unsigned s= size(v), sb= s / BSize * BSize;<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        one_norm_ftor<0, BSize>()(multi_sum, v, i);<br />
    value_type sum= multi_sum.sum();<br />
    for (unsigned i= sb; i < s; i++)<br />
        sum+= abs(v[i]);<br />
    return sum;<br />
}<br />

There is one piece still missing: we need to reduce the partial sums in multi_sum. Unfortunately,<br />
we cannot write a loop over the members of multi_sum, so we need a recursive function that<br />
dives down into multi_sum. This would be a bit cumbersome as a free function, especially as we<br />
try to avoid partial specialization of function templates. As a member function it is much easier,<br />
and the specialization happens more safely on the class level:<br />

template <unsigned Size, typename Value><br />
struct multi_tmp<br />
{<br />
    Value sum() const { return value + sub.sum(); }<br />
};<br />
template <typename Value><br />
struct multi_tmp<0, Value><br />
{<br />
    Value sum() const { return 0; }<br />
};<br />

Note that we started the summation with 0, not with the innermost value member. We could<br />
do the latter, but then we would need another specialization for multi_tmp<1, Value>. Likewise<br />
we can implement a general reduction, but as in std::accumulate we need an initial element:<br />

template <unsigned Size, typename Value><br />
struct multi_tmp<br />
{<br />
    template <typename Op><br />
    Value reduce(Op op, const Value& init) const { return op(value, sub.reduce(op, init)); }<br />
};<br />
template <typename Value><br />
struct multi_tmp<0, Value><br />
{<br />
    template <typename Op><br />
    Value reduce(Op, const Value& init) const { return init; }<br />
};<br />

The compute time of this version is:<br />

Compute time one_norm(v) is 7.47 µs.<br />

Compute time one_norm(v) is 1.14 µs.<br />

Compute time one_norm(v) is 0.71 µs.<br />

Compute time one_norm(v) is 0.75 µs.<br />

Compute time one_norm(v) is 1.01 µs.<br />

Pushing Temporaries into Registers<br />
<br />
⇒ reduction_unroll_registers_example.cpp<br />



Earlier experiments with older compilers (gcc 3.4) exposed a serious overhead for using arrays or nested classes; in the end it was even slower than using one single variable. The reason was probably that the compiler could not use registers for these types (which raises the question of why today's compilers can).<br />
<br />
The most likely way to store temporaries in registers is to declare them as separate variables:<br />

template <typename Vector><br />
typename Vector::value_type<br />
inline one_norm(const Vector& v)<br />
{<br />
    typename Vector::value_type s0(0), s1(0), s2(0), ...<br />
}<br />

As one can see, the problem is how many variables to declare. The number cannot depend on the template argument but must be fixed for all sizes (unless one writes a different implementation for each number and undermines the expressiveness of templates). Thus, we have to fix a certain number of variables, say 8. Then we cannot unroll more than eight times.<br />

The next issue we run into is the number of function arguments. When we call the iteration block, we pass all variables (registers):<br />
<br />
for (unsigned i= 0; i < sb; i+= BSize)<br />
    one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);<br />

The first calculation in such a block is performed on s0; s1–s7 are only passed on to the functors for the following computations. After this, the second computation would have to accumulate on the second function argument, the third calculation on the third argument, and so on. This is unfortunately not implementable with templates (only with very ugly and highly error-prone source code manipulations by macros).<br />
<br />
Alternatively, each computation is performed on its first function argument, and subsequent functors are called with the first argument omitted:<br />
<br />
one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, v, i);<br />
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, v, i);<br />
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, v, i);<br />
<br />
This is not realizable with templates either.<br />

The solution is to rotate the references to the registers:<br />
<br />
one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);<br />
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, s0, s1, v, i);<br />
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, s0, s1, s2, v, i);<br />

This rotation is achieved by the following functor implementation:<br />

template <unsigned Offset, unsigned Max><br />
struct one_norm_ftor<br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i)<br />
    {<br />
        using std::abs;<br />
        s0+= abs(v[i+Offset]);<br />
        one_norm_ftor<Offset+1, Max>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);<br />
    }<br />
};<br />
<br />
template <unsigned Max><br />
struct one_norm_ftor<Max, Max><br />
{<br />
    template <typename S, typename V><br />
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i) {}<br />
};<br />

The corresponding one_norm function based on this functor is straightforward:<br />

template <unsigned BSize, typename Vector><br />
typename Vector::value_type<br />
inline one_norm(const Vector& v)<br />
{<br />
    using std::abs;<br />
    typename Vector::value_type s0(0), s1(0), s2(0), s3(0), s4(0), s5(0), s6(0), s7(0);<br />
    unsigned s= size(v), sb= s / BSize * BSize;<br />
<br />
    for (unsigned i= 0; i < sb; i+= BSize)<br />
        one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);<br />
    s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;<br />
    for (unsigned i= sb; i < s; i++)<br />
        s0+= abs(v[i]);<br />
    return s0;<br />
}<br />

A slight disadvantage is that all registers must be accumulated at the end, no matter how small BSize is and how short the vector is. A great advantage of the rotation is that BSize is not limited to the number of temporary variables in such accumulations: if BSize is larger, some or all variables are simply used multiple times without corrupting the result. The number of temporaries is nonetheless a limiting factor for the concurrency.<br />

On the test machine, this implementation runs in:<br />

Compute time one_norm(v) is 6.77 µs.<br />

Compute time one_norm(v) is 1.13 µs.<br />

Compute time one_norm(v) is 0.71 µs.<br />

Compute time one_norm(v) is 0.75 µs.<br />

Compute time one_norm(v) is 1.07 µs.<br />

This is comparable with the nested-class version (in this environment).<br />

Résumé on Reduction Tuning<br />

The goal of this section was not to determine the ultimately tuned reduction implementation for superscalar processors.27 The main ambition of this section, in fact of the whole book, is to demonstrate the diversity of implementation opportunities. With the enormous expressiveness of C++, one can use (or abuse) the compiler to generate the most efficient version without rewriting the program sources, as one would need to in C or Fortran. The power of internal code generation with the C++ compiler makes external code generation, as in ATLAS,28 unnecessary. In ATLAS, functions are written in a domain-specific language, and C programs29 in slight variations are generated with a tool and compared regarding performance. The techniques presented here empower us to generate binaries equivalent to those variations by just using a C++ compiler. Thus, we can tune our programs by changing template arguments or constants (that might be set platform-dependently).<br />
<br />
27 In the presence of the new GPU cards with hundreds of cores and millions of threads, the fight for this little concurrency is not so impressive. Nonetheless, we will still need performance tuning on single-core and “few-core” machines, at least for some years, since not everybody has a GPU card for numerics and not every algorithm has been successfully ported (e.g. incomplete LU on arbitrary sparse matrices). At the time of this writing, there is not even support for std::complex.<br />
28 http://math-atlas.sourceforge.net/<br />
29 In some cases the C programs contain assembler snippets for a given platform in order to achieve performance close to peak.<br />

5.4.7 Tuning Nested Loops<br />
<br />
⇒ matrix_unroll_example.cpp<br />
<br />
The most used (and abused) example in performance discussions is dense matrix multiplication. We do not claim to compete with hand-tuned assembler codes, but we show the power of meta-programming to generate code variations from a single implementation. As a starting point we use a templatized implementation of the matrix class from Section 3.7.4.<br />

We begin our implementation with a simple test case:<br />

int main()<br />
{<br />
    const unsigned s= 4; // s= 4 for testing and 128 for timing<br />
    matrix<float> A(s, s), B(s, s), C(s, s);<br />
<br />
    for (unsigned i= 0; i < s; i++)<br />
        for (unsigned j= 0; j < s; j++) {<br />
            A(i, j)= 100.0 * i + j;<br />
            B(i, j)= 200.0 * i + j;<br />
        }<br />
    mult(A, B, C);<br />
    std::cout << "C is " << C << '\n';<br />
}<br />

A matrix multiplication is easily implemented with three nested loops. One of the 6 possible nestings is a dot-product-like calculation of each entry of C:<br />
<br />
c_ik = A_i · B^k<br />
<br />
where A_i is the i-th row of A and B^k the k-th column of B. We use a temporary in the innermost loop to decrease the cache-invalidation overhead of writing to C's elements in each operation:<br />

template <typename Matrix><br />
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)<br />
{<br />
    assert(A.num_rows() == B.num_rows()); // ...<br />
    typedef typename Matrix::value_type value_type;<br />
    unsigned s= A.num_rows();<br />
<br />
    for (unsigned i= 0; i < s; i++)<br />
        for (unsigned k= 0; k < s; k++) {<br />
            value_type tmp(0);<br />
            for (unsigned j= 0; j < s; j++)<br />
                tmp+= A(i, j) * B(j, k);<br />
            C(i, k)= tmp;<br />
        }<br />
}<br />

For this implementation, we write a benchmark function:<br />
<br />
template <typename Matrix><br />
void bench(const Matrix& A, const Matrix& B, Matrix& C, const unsigned rep)<br />
{<br />
    boost::timer t1;<br />
    for (unsigned j= 0; j < rep; j++)<br />
        mult(A, B, C);<br />
    double t= t1.elapsed() / double(rep);<br />
    unsigned s= A.num_rows();<br />
    std::cout << "Compute time mult(A, B, C) is "<br />
              << 1000000.0 * t << " µs. This are "<br />
              << s * s * (2*s - 1) / t / 1000000.0 << " MFlops.\n";<br />
}<br />

The run time and performance of our canonical implementation (with 128 × 128 matrices) are:<br />
<br />
Compute time mult(A, B, C) is 5290 µs. This are 789.777 MFlops.<br />
<br />
This implementation is our reference regarding performance and results.<br />

For the development of the unrolled implementation we go back to 4 × 4 matrices. In contrast to Section 5.4.6, we do not unroll a single reduction but perform multiple reductions in parallel. That means, of the three loops, we unroll the two outer ones and replace the body of the inner loop by multiple operations. The latter we achieve, as usual, with a functor.<br />
<br />
As in the canonical implementation, the reduction shall not be performed on elements of C but in temporaries. For this purpose we use the class multi_tmp from § 5.4.6. For the sake of simplicity we limit ourselves to matrix sizes that are multiples of the unroll parameters (a full implementation for arbitrary matrix sizes is realized in MTL4). An unrolled matrix multiplication is shown in the following code:<br />

template <unsigned Size0, unsigned Size1, typename Matrix><br />
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)<br />
{<br />
    assert(A.num_rows() == B.num_rows()); // ...<br />
    assert(A.num_rows() % Size0 == 0);    // we omitted cleanup here<br />
    assert(A.num_cols() % Size1 == 0);    // we omitted cleanup here<br />
    typedef typename Matrix::value_type value_type;<br />
    unsigned s= A.num_rows();<br />
<br />
    mult_block<0, Size0-1, 0, Size1-1> block;<br />
    for (unsigned i= 0; i < s; i+= Size0)<br />
        for (unsigned k= 0; k < s; k+= Size1) {<br />
            multi_tmp<Size0 * Size1, value_type> tmp(value_type(0));<br />
            for (unsigned j= 0; j < s; j++)<br />
                block(tmp, A, B, i, j, k);<br />
            block.update(tmp, C, i, k);<br />
        }<br />
}<br />
}<br />

We still owe the reader the implementation of the functor mult_block. The techniques are the same as for the vector operations, but we have to deal with more indices and their respective limits:<br />

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1><br />
struct mult_block<br />
{<br />
    typedef mult_block<Index0, Max0, Index1+1, Max1> next;<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)<br />
    {<br />
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["<br />
                  << k + Index1 << "]\n";<br />
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);<br />
        next()(tmp.sub, A, B, i, j, k);<br />
    }<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)<br />
    {<br />
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";<br />
        C(i + Index0, k + Index1)= tmp.value;<br />
        next().update(tmp.sub, C, i, k);<br />
    }<br />
};<br />
<br />
template <unsigned Index0, unsigned Max0, unsigned Max1><br />
struct mult_block<Index0, Max0, Max1, Max1><br />
{<br />
    typedef mult_block<Index0+1, Max0, 0, Max1> next;<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)<br />
    {<br />
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["<br />
                  << k + Max1 << "]\n";<br />
        tmp.value+= A(i + Index0, j) * B(j, k + Max1);<br />
        next()(tmp.sub, A, B, i, j, k);<br />
    }<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)<br />
    {<br />
        std::cout << "C[" << i + Index0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";<br />
        C(i + Index0, k + Max1)= tmp.value;<br />
        next().update(tmp.sub, C, i, k);<br />
    }<br />
};<br />
<br />
template <unsigned Max0, unsigned Max1><br />
struct mult_block<Max0, Max0, Max1, Max1><br />
{<br />
    template <typename Tmp, typename Matrix><br />
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)<br />
    {<br />
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Max0 << "][" << j << "] * B[" << j << "]["<br />
                  << k + Max1 << "]\n";<br />
        tmp.value+= A(i + Max0, j) * B(j, k + Max1);<br />
    }<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)<br />
    {<br />
        std::cout << "C[" << i + Max0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";<br />
        C(i + Max0, k + Max1)= tmp.value;<br />
    }<br />
};<br />

In order to verify that all operations are performed, we log them completely, but look here only at tmp.4 and tmp.3:<br />

tmp.4+= A[1][0] * B[0][0]<br />

tmp.3+= A[1][0] * B[0][1]<br />

tmp.4+= A[1][1] * B[1][0]<br />

tmp.3+= A[1][1] * B[1][1]<br />

tmp.4+= A[1][2] * B[2][0]<br />

tmp.3+= A[1][2] * B[2][1]<br />

tmp.4+= A[1][3] * B[3][0]<br />

tmp.3+= A[1][3] * B[3][1]<br />

C[1][0]= tmp.4<br />

C[1][1]= tmp.3<br />

tmp.4+= A[3][0] * B[0][0]<br />

tmp.3+= A[3][0] * B[0][1]<br />

tmp.4+= A[3][1] * B[1][0]<br />

tmp.3+= A[3][1] * B[1][1]<br />

tmp.4+= A[3][2] * B[2][0]<br />

tmp.3+= A[3][2] * B[2][1]<br />

tmp.4+= A[3][3] * B[3][0]<br />

tmp.3+= A[3][3] * B[3][1]<br />

C[3][0]= tmp.4<br />

C[3][1]= tmp.3



This log shows that C[1][0] and C[1][1] are computed alternately, so that the computation can be performed in parallel on a super-scalar processor. One can also verify that<br />
<br />
c_ik = sum_{j=0}^{3} a_ij b_jk.<br />
<br />
Printing C will also show the same result as for the canonical matrix multiplication.<br />

The implementation above can be simplified: the first functor specialization differs from the general functor only in how the indices are incremented. We can factor this out with an additional loop class:<br />

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1><br />
struct loop2<br />
{<br />
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;<br />
};<br />
<br />
template <unsigned Index0, unsigned Max0, unsigned Max1><br />
struct loop2<Index0, Max0, Max1, Max1><br />
{<br />
    static const unsigned next_index0= Index0 + 1, next_index1= 0;<br />
};<br />

Such a general class has a high potential for reuse. With it we can fuse the functor template and the first specialization:<br />

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1><br />
struct mult_block<br />
{<br />
    typedef loop2<Index0, Max0, Index1, Max1> l;<br />
    typedef mult_block<l::next_index0, Max0, l::next_index1, Max1> next;<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)<br />
    {<br />
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["<br />
                  << k + Index1 << "]\n";<br />
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);<br />
        next()(tmp.sub, A, B, i, j, k);<br />
    }<br />
<br />
    template <typename Tmp, typename Matrix><br />
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)<br />
    {<br />
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";<br />
        C(i + Index0, k + Index1)= tmp.value;<br />
        next().update(tmp.sub, C, i, k);<br />
    }<br />
};<br />

The other specialization remains unaltered. Last but not least, we would like to see the impact of our not-so-simple matrix product. The benchmark yielded on our test machine:<br />



Compute time mult(A, B, C) is 5250 µs. This are 795.794 MFlops.<br />

Compute time mult(A, B, C) is 2770 µs. This are 1508.27 MFlops.<br />

Compute time mult(A, B, C) is 1990 µs. This are 2099.46 MFlops.<br />

Compute time mult(A, B, C) is 2230 µs. This are 1873.51 MFlops.<br />

Compute time mult(A, B, C) is 2130 µs. This are 1961.46 MFlops.<br />

Compute time mult(A, B, C) is 2930 µs. This are 1425.91 MFlops.<br />

Compute time mult(A, B, C) is 2350 µs. This are 1777.84 MFlops.<br />

Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.<br />

Compute time mult(A, B, C) is 4010 µs. This are 1041.88 MFlops.<br />

Compute time mult(A, B, C) is 2870 µs. This are 1455.72 MFlops.<br />

Compute time mult(A, B, C) is 3230 µs. This are 1293.47 MFlops.<br />

Compute time mult(A, B, C) is 3060 µs. This are 1365.33 MFlops.<br />

Compute time mult(A, B, C) is 2780 µs. This are 1502.85 MFlops.<br />

One can see that mult<1, 1> has the same performance as the original implementation, which in fact performs the operations in exactly the same order (as long as the compiler optimization does not change the order internally). We also see that the unrolled versions are all faster, up to a speed-up of 2.6.<br />

With double matrices the overall performance is lower:<br />

Compute time mult(A, B, C) is 10080 µs. This are 414.476 MFlops.<br />

Compute time mult(A, B, C) is 8700 µs. This are 480.221 MFlops.<br />

Compute time mult(A, B, C) is 7470 µs. This are 559.293 MFlops.<br />

Compute time mult(A, B, C) is 5910 µs. This are 706.924 MFlops.<br />

Compute time mult(A, B, C) is 3750 µs. This are 1114.11 MFlops.<br />

Compute time mult(A, B, C) is 5140 µs. This are 812.825 MFlops.<br />

Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.<br />

Compute time mult(A, B, C) is 4590 µs. This are 910.222 MFlops.<br />

Compute time mult(A, B, C) is 4310 µs. This are 969.355 MFlops.<br />

Compute time mult(A, B, C) is 6280 µs. This are 665.274 MFlops.<br />

Compute time mult(A, B, C) is 5310 µs. This are 786.802 MFlops.<br />

Compute time mult(A, B, C) is 4290 µs. This are 973.874 MFlops.<br />

Compute time mult(A, B, C) is 3490 µs. This are 1197.11 MFlops.<br />

It shows that other parametrizations yield more acceleration here and that the performance could almost be tripled.<br />
<br />
Which configuration is best, and why, is (as mentioned before) not the topic of this script; we only show programming techniques. The reader is invited to try this program on his/her own computer. The technique in this section is intended for L1 cache usage. If matrices are larger, one should use more levels of blocking. A general-purpose methodology for locality on L2, L3, main memory, local disk, ... is recursion. This avoids reimplementation for each cache size and performs reasonably well even in virtual memory, see for instance [?].<br />



5.5 Exercises<br />

5.5.1 Vector class<br />

Revisit the vector example from §??.<br />

Make an expression class for a scalar times a vector:<br />
<br />
class scalar_times_vector_expression {<br />
};<br />
<br />
that inherits from base_vector. Use the inheritance mechanism to assign scalar_times_vector_expression objects to vector.<br />

5.5.2 Vector expression template<br />

Make a vector concept, which you call Vector. Make a vector class (you can use std::vector)<br />

that satisfies this concept. This vector class should have at least the following members:<br />

class my_vector {<br />
public:<br />
    typedef double value_type ;<br />
<br />
public:<br />
    my_vector( int n ) ;<br />
<br />
    // Copy constructor from the type itself<br />
    my_vector( my_vector& ) ;<br />
<br />
    // Constructor from generic vector<br />
    template <typename Vector><br />
    my_vector( Vector& ) ;<br />
<br />
    // Assignment operator<br />
    my_vector& operator=( my_vector const& v ) ;<br />
<br />
    // Assignment for generic Vector<br />
    template <typename Vector><br />
    my_vector& operator=( Vector const& v ) ;<br />
<br />
    value_type& operator() ( int i ) ;<br />
<br />
public: // Vector concept<br />
    int size() const ;<br />
    value_type operator() ( int i ) const ;<br />
} ;<br />

Make an expression class for a scalar times a vector:<br />
<br />
template <typename Scalar, typename Vector><br />
class scalar_times_vector_expression {<br />
} ;<br />
<br />
template <typename Scalar, typename Vector><br />
scalar_times_vector_expression<Scalar, Vector> operator*( Scalar const& s, Vector const& v ) {<br />
    return scalar_times_vector_expression<Scalar, Vector>( s, v ) ;<br />
}<br />

Put all classes and functions in the namespace athens. You can also make an expression template for the addition of two vectors.<br />
<br />
Write a small program, e.g.<br />
<br />
int main() {<br />
    athens::my_vector v( 5 ) ;<br />
    // ... fill in some values of v ...<br />
    athens::my_vector w( 5 ) ;<br />
    w = 5.0 * v ;<br />
    w = 5.0 * (7.0 * v) ;<br />
    w = v + 7.0 * v ; // (if you have added operator+)<br />
}<br />

Use the debugger to see what happens.


Chapter 6<br />
<br />
Inheritance<br />

C++ is a multi-paradigm language, and the paradigm most strongly associated with C++ is ‘Object-Oriented Programming’ (OOP). The authors nevertheless feel that it is not the most important paradigm for scientific programming, because it is inferior to generic programming for two major reasons:<br />
<br />
• Flexibility and<br />
<br />
• Performance.<br />
<br />
However, the impact of these two disadvantages is negligible in many situations: performance only deteriorates when we use virtual functions (§ 6.1).<br />
<br />
OOP in combination with generic programming is a very powerful mechanism to provide a form of reusability that neither of the paradigms can provide on its own (§ 6.3–§ 6.5).<br />

6.1 Basic Principles<br />

See section ?? from page ?? to page ??.<br />

6.2 Dynamic Selection by Sub-typing<br />

solver base class<br />
<br />
This is the way solvers are selected in AMDiS with the generic MTL4 solver functions. AMDiS is only slightly generic, but many decisions are made at run time (by means of pointers and virtual functions). So we needed a way to call the generic functions while deciding at run time which one to call.<br />

The dynamic solver selection can be done with classical C features like:<br />

#include <iostream><br />
#include <cstdlib><br />
<br />
class matrix {};<br />
class vector {};<br />
<br />
void cg(const matrix& A, const vector& b, vector& x)<br />
{<br />
    std::cout << "CG\n";<br />
}<br />
<br />
void bicg(const matrix& A, const vector& b, vector& x)<br />
{<br />
    std::cout << "BiCG\n";<br />
}<br />
<br />
int main (int argc, char* argv[])<br />
{<br />
    matrix A;<br />
    vector b, x;<br />
<br />
    switch (std::atoi(argv[1])) {<br />
        case 0: cg(A, b, x); break;<br />
        case 1: bicg(A, b, x); break;<br />
    }<br />
    return 0;<br />
}<br />

This works, but it is not scalable with respect to source code complexity. If we call the solver with other vectors and matrices somewhere else, we must copy the whole switch-case block for each argument combination. This can be avoided by encapsulating the block in a function and calling this function with different arguments. More complicated is dealing with different preconditioners (diagonal, ILU, IC, ...) that are also selected dynamically: shall we copy a switch block for the preconditioners into each case block of the solvers?<br />

An elegant solution is an abstract solver class and derived classes for the solvers:<br />
<br />
struct solver<br />
{<br />
    virtual void operator()(const matrix& A, const vector& b, vector& x)= 0;<br />
    virtual ~solver() {}<br />
};<br />
<br />
// potentially templatize<br />
struct cg_solver : solver<br />
{<br />
    void operator()(const matrix& A, const vector& b, vector& x) { cg(A, b, x); }<br />
};<br />
<br />
struct bicg_solver : solver<br />
{<br />
    void operator()(const matrix& A, const vector& b, vector& x) { bicg(A, b, x); }<br />
};<br />

In the application we can define one or more pointers of type solver* and assign them the desired solver:<br />
<br />
// Factory<br />
solver* my_solver= 0;<br />
switch (std::atoi(argv[1])) {<br />
    case 0: my_solver= new cg_solver; break;<br />
    case 1: my_solver= new bicg_solver; break;<br />
}<br />

This idea is discussed thoroughly in the design patterns book [?] as the Factory pattern. Once we have defined a pointer to such an abstract class (also called an interface), we can call it directly:<br />
<br />
(*my_solver)(A, b, x);<br />

Without going into detail, we can have multiple factories and use the pointers together without a combinatorial explosion in the program sources:<br />
<br />
// Preconditioner factory<br />
precon* my_precon= 0;<br />
switch (std::atoi(argv[2])) { ... }<br />
<br />
(*my_solver)(*my_precon, A, b, x);<br />

C++ does not allow virtual function templates, because this would make the compiler implementation very complicated: it would have to avoid potentially infinite virtual function tables. However, class templates can have virtual functions. This enables generic programming with virtual functions by templatizing the entire class instead of single member functions.<br />

6.3 Remove Redundancy With Base Classes<br />
<br />
especially when no type information is involved<br />

6.4 Casting Up and Down and Elsewhere<br />

In C++, there are four different cast operators:<br />
<br />
• static_cast;<br />
<br />
• dynamic_cast;<br />
<br />
• const_cast; and<br />
<br />
• reinterpret_cast.<br />
<br />
Its linguistic root C knew only one cast operator: ‘( type ) expr’. The trouble with this single operator is that it is not standardized or clearly defined which cast is performed under which conditions. As a consequence, the behavior of the cast can change from compiler to compiler. C++ still allows this old-style cast, but all C++ experts agree on discouraging its use. Another quite important issue is that this notation is not easy to find in large code bases (there is no regular expression that filters out all C casts), which significantly increases maintenance costs; see also the discussion in [SA05, chapter 95]. In this section, we will show you the different cast operators and discuss the pros and cons of different casts in different contexts.<br />



6.4.1 Casting Between Base and Derived Classes<br />
<br />
Casting Up<br />
<br />
⇒ up_down_cast_example.cpp<br />
<br />
Casting up, i.e. from a derived to a base class, is always possible if there are no ambiguities and can even be performed implicitly. Assume we have the following class structure:<br />

struct A<br />
{<br />
    virtual void f() {}<br />
    virtual ~A() {}<br />
    int ma;<br />
};<br />
<br />
struct B : A { float mb; };<br />
struct C : A {};<br />
struct D : B, C {};<br />
<br />
and the following unary functions:<br />
<br />
void f(A a)  { /* ... */ }<br />
void g(A& a) { /* ... */ }<br />
void h(A* a) { /* ... */ }<br />
<br />
An object of type B can be passed to all three functions:<br />

int main (int argc, char* argv[])<br />
{<br />
    B b;<br />
    f(b);<br />
    g(b);<br />
    h(&b);<br />
    return 0;<br />
}<br />

In all three cases the object b is implicitly converted to an object of type A. The call of function f is, however, a bit different: only b's members within class A are copied into the function argument; the remainder (in our example the member mb) is not accessible in f by any means. The functions g and h refer to an object of type A by reference or pointer. If an object of a derived class is passed to one of those functions, the other members are in principle still there but hidden. One could still access them by down-casting the argument in the function. Before we down-cast, we should ask ourselves the following questions:<br />
<br />
• How do we assure that the argument passed to the function really is an object of the derived class? For instance with extra arguments or with run-time tests.<br />
<br />
• What can we do if the object cannot be down-casted?<br />
<br />
• Can we write a function directly for the derived class?<br />
<br />
• Why do we not overload the function for the base and the derived type? This is definitely a much cleaner design and always feasible.<br />

1 TODO: picture



Up-casting only fails if the base class is ambiguous. In the current example we cannot up-cast from D to A:<br />
<br />
D d;<br />
A ad(d); // error: ambiguous<br />
<br />
because the compiler does not know whether we mean the base class A from B or the one from C. We can clarify this with an explicit intermediate up-cast:<br />
<br />
A ad(static_cast<B&>(d));<br />
<br />
Or we can share A between B and C:<br />
<br />
struct B : virtual A { float mb; };<br />
struct C : virtual A {};<br />
<br />
Now the members of A exist only once in D, which is probably the best solution for multiple inheritance in most cases, because we save memory and do not need to pay attention to which replica of A is accessed.<br />

Casting Down<br />
<br />
There are situations where references or pointers are cast down, e.g. in the next section, § 6.5. This can be performed with static_cast or dynamic_cast. As the names suggest, static_cast is statically type-checked at compile time, whereas dynamic_cast performs run-time tests (with only minimal compile-time tests). We still use our diamond-shaped class hierarchy A–D as case study. Now we introduce two pointers of type B* holding objects of types B and D:<br />
<br />
B *bbp= new B, *bdp= new D;<br />
<br />
When we cast these pointers down to D*, dynamic_cast verifies whether the referred object actually allows this cast. Since this information is in general only known at run time, e.g.:<br />
<br />
B *bxp= argc > 1 ? new B : new D;<br />
<br />
dynamic_cast must verify the referred object's type with run-time type information (RTTI). Performing an incorrect cast yields a null pointer:<br />
<br />
D* dbp= dynamic_cast<D*>(bbp); // error: cannot downcast from B to D<br />
D* ddp= dynamic_cast<D*>(bdp); // ok: bdp points to an object of type D<br />
std::cout << "Dynamic downcast of bbp should fail and pointer should be 0, it is: " << dbp << '\n';<br />
std::cout << "Dynamic downcast of bdp should succeed and pointer should not be 0, it is: " << ddp << '\n';<br />

The programmer can check whether the pointer is null and react to the failed down-cast. Likewise, an incorrect down-cast of a reference throws an exception of type std::bad_cast, which can be handled in a try-catch block.<br />

In contrast to it, static cast only verifies that the target type is a derived class of the source<br />

type — respectively references or pointers thereof — or vice versa:<br />

2 TODO: picture


192 CHAPTER 6. INHERITANCE<br />

dbp= static_cast<D*>(bbp); // erroneous downcast performed
ddp= static_cast<D*>(bdp); // correct downcast but not checked by the system

std::cout << "Erroneous downcast of bbp will not return 0, it is: " << dbp << '\n';
std::cout << "Correct downcast of bdp but not checked at run-time, it is: " << ddp << '\n';

Whether the referred object really allows for the downcast cannot be decided at compile time and is the responsibility of the programmer.

Cross-casting<br />

An interesting feature of dynamic_cast is casting across from B to C when the referred object's type is a derived class of both types:

C* cdp= dynamic_cast<C*>(bdp); // cross-cast from B to C ok: bdp points to an object of type D

std::cout << "Dynamic cross-cast of bdp should succeed and pointer should not be 0, it is: " << cdp << '\n';

Static cross-casting from B to C:

cdp= static_cast<C*>(bdp); // error: cross-cast from B to C does not compile

is not possible because C is neither a base nor a derived class of B. It can, however, be cast indirectly via D:

cdp= static_cast<C*>(static_cast<D*>(bdp)); // ok: cross-cast from B to C via D

Again, it is the responsibility of the programmer to ensure that the addressed object can really be cast this way.

Comparing Static and Dynamic Cast<br />

Dynamic casting is safer but slower than static casting due to the run-time check of the referred object's type. Static casting allows for casting up and down, with the programmer responsible for ensuring that the referred objects are handled correctly. Dynamic casting is in some sense always up, namely from the referred object's type to a super-type (including itself).

Furthermore, dynamic casting can only be applied to polymorphic types, that is, classes that define or inherit a virtual function. The following table summarizes the differences between the two forms of casting:

                 static_cast             dynamic_cast
Applicability    all                     only polymorphic classes
Cross-casting    no                      yes
Run-time check   no                      yes
Speed            no run-time overhead    overhead for checking

Table 6.1: Static vs. dynamic cast



6.4.2 Const Cast<br />

const_cast adds or removes the qualifiers const and/or volatile. The keyword volatile informs the compiler that a variable can be modified from outside the program. Such a variable is therefore not held or cached in registers but read from memory on each access. This feature is not used in this script. Adding a qualifier is an implicit conversion in C++; that is, one can always assign an expression to a variable of the same type with extra qualifiers without the need for a cast. Removing a qualifier requires a const_cast and should only be done when unavoidable, e.g. to interface with old-style software that lacks appropriate const qualifiers.
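As a sketch, assume a hypothetical old-style function legacy_length that lacks the const qualifier although it never modifies its argument; a modern wrapper can interface with it via const_cast:

```cpp
#include <cstddef>

// hypothetical old-style function: the missing const is the flaw we work around
std::size_t legacy_length(char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

// casting away const is acceptable here because legacy_length only reads s
std::size_t length(const char* s)
{
    return legacy_length(const_cast<char*>(s));
}
```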

6.4.3 Reinterpretation Cast<br />

This is the most aggressive form of casting and not used in this script. It takes an object's address or memory location and interprets the bits found there as if they were of the target type. One can, for instance, change a single bit in a floating-point number by casting it to a bit chain. It is more important for programming hardware drivers than complex flux solvers. Needless to say, reinterpret_cast is one of the most efficient ways to undermine the portability of an application.
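To illustrate the bit-chain view, here is a sketch that flips the sign bit of a double; it assumes IEEE 754 doubles and tolerates the strict-aliasing concerns that make such code non-portable (copying the bytes with std::memcpy into an integer would be the cleaner route):

```cpp
#include <cstdint>

// reinterpret the bytes of x as an unsigned 64-bit integer and toggle
// the sign bit (bit 63 in IEEE 754 double precision)
double flip_sign(double x)
{
    std::uint64_t* bits = reinterpret_cast<std::uint64_t*>(&x);
    *bits ^= std::uint64_t(1) << 63;
    return x;
}
```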

6.5 Barton-Nackman Trick<br />

This section describes the ‘Curiously Recurring Template Pattern’ (CRTP). It was introduced<br />

by John Barton and Lee Nackman [?] and is there<strong>for</strong>e also referred to as the ‘Barton-<br />

Nackman Trick’.<br />

6.5.1 A Simple Example<br />

⇒ crtp_simple_example.cpp

We will explain this with a simple example. Assume we have a class point with an equality<br />

operator:<br />

class point<br />

{<br />

public:<br />

point(int x, int y) : x(x), y(y) {}<br />

bool operator==(const point& that) const { return x == that.x && y == that.y; }<br />

private:<br />

int x, y;<br />

};<br />

We can program the inequality by using common sense or by applying de Morgan's law:

bool operator!=(const point& that) const { return x != that.x || y != that.y; }<br />

Or we can simplify our lives and just negate the result of the equality:

bool operator!=(const point& that) const { return !(∗this == that); }


194 CHAPTER 6. INHERITANCE<br />

Our compilers are so sophisticated that they certainly handle de Morgan's law perfectly. Negating the equality operator is something we can do on every type that has an equality operator. We could copy-and-paste this code snippet and just replace the type of the argument.

Alternatively, we can write a class like this:<br />

template <typename T>
struct unequality
{
    bool operator!=(const T& that) const { return !(static_cast<const T&>(*this) == that); }
};

and derive from it:<br />

class point : public unequality<point> { ... };

This mutual dependency:<br />

• One class is derived from the other and<br />

• The latter takes the derived class’ type as template argument<br />

is somewhat confusing at first sight.

Essential for this to work is that the code of a template class member is only generated when the class is instantiated and the function is actually called. At the time the template class unequality is parsed, the compiler checks only the correctness of the syntax.

When we write<br />

int main (int argc, char* argv[])
{
    point p1(3, 4), p2(3, 5);
    std::cout << "p1 != p2 is " << (p1 != p2 ? "true" : "false") << '\n';
    return 0;
}

After the definition of unequality and point, both types are completely known to the compiler.

What happens when we call p1 != p2?

1. The compiler searches for operator!= in class point → without success.

2. The compiler looks for operator!= in the base class unequality<point> → with success.

3. The this pointer of unequality<point> refers to a sub-object of the object that point's this pointer refers to.

4. Both types are completely known and we can statically down-cast the this pointer to point.

5. Since we know that the this pointer of unequality<point> is an up-cast this pointer of point,³ we are safe to down-cast it to its original type.

6. The equality operator for point is called. Its implementation is already known at this point because the code of unequality's operator!= is not generated before the instantiation of point.

³ Unless the first argument is really of type unequality<point>. There are also ways to impede this, e.g. http://en.wikipedia.org/wiki/Barton-Nackman_trick, but we used this notation for the sake of simplicity.



Likewise, every class U with an equality operator can be derived from unequality<U>. A collection of such CRTP templates for operator defaults is provided by Boost.Operators by Jeremy Siek and David Abrahams.

As an alternative to the above implementation, where the this pointer is dereferenced and cast as a reference, one can cast the pointer first and dereference it afterwards:

template <typename T>
struct unequality
{
    bool operator!=(const T& that) const { return !(*static_cast<const T*>(this) == that); }
};

There is no difference; this is just a question of taste.
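For completeness, here are the pieces from above assembled into one self-contained sketch:

```cpp
// CRTP base: provides operator!= for every class T with an operator==
template <typename T>
struct unequality
{
    bool operator!=(const T& that) const
    {
        return !(static_cast<const T&>(*this) == that);
    }
};

class point : public unequality<point>
{
  public:
    point(int x, int y) : x(x), y(y) {}
    bool operator==(const point& that) const { return x == that.x && y == that.y; }
  private:
    int x, y;
};
```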

6.5.2 A Reusable Access Operator<br />

⇒ matrix_crtp_example.cpp

We still owe the reader the reusable implementation of the matrix bracket operator promised<br />

in Section 3.7.4. Back then we did not know enough language features.<br />

First of all, we had no templates, which are indispensable for a proxy. We will show why. Say we have a matrix class as in § 3.7.4 and we just want to call the binary operator() from the unary operator[] via a proxy:

class matrix; // Forward declaration

class simple_bracket_proxy
{
  public:
    simple_bracket_proxy(matrix& A, int r) : A(A), r(r) {}
    double& operator[](int c) { return A(r, c); }
  private:
    matrix& A;
    int r;
};

class matrix
{
    // ...
    double& operator()(int r, int c) { ... }

    simple_bracket_proxy operator[](int r)
    {
        return simple_bracket_proxy(*this, r);
    }
};

This does not compile because operator[] from simple_bracket_proxy calls operator() from matrix, which is not defined yet. The forward declaration of matrix is not sufficient because we need the complete definition of matrix, not only the assertion that the type exists. Vice versa, if we define matrix first, we would miss the constructor of simple_bracket_proxy in the operator[] implementation.



Another disadvantage of the implementation above is that we would need another proxy for constant access.

This is an interesting aspect of templates. They do not only enable writing type-parametric software but can also help to break mutual dependencies thanks to their postponed code generation. By templatizing the proxy, the dependency is gone:

template <typename Matrix, typename Result>
class bracket_proxy
{
  public:
    bracket_proxy(Matrix& A, int r) : A(A), r(r) {}
    Result& operator[](int c) { return A(r, c); }
  private:
    Matrix& A;
    int r;
};

class matrix
{
    // ...
    bracket_proxy<matrix, double> operator[](int r)
    {
        return bracket_proxy<matrix, double>(*this, r);
    }
};

With this implementation we can now write A[i][j], and it is realized by the binary operator(), however that is implemented. Such a bracket operator is useful in every matrix class and the implementation will always be the same.

For this reason, we would like to have this implementation only once in our code base and reuse it wherever appropriate. The only way to achieve this is with the CRTP paradigm:

template <typename Matrix, typename Result>
class bracket_proxy
{
  public:
    bracket_proxy(Matrix& A, int r) : A(A), r(r) {}
    Result& operator[](int c) { return A(r, c); }
  private:
    Matrix& A;
    int r;
};

template <typename Matrix, typename Result>
class crtp_matrix
{
  public:
    bracket_proxy<Matrix, Result> operator[](int r)
    {
        return bracket_proxy<Matrix, Result>(static_cast<Matrix&>(*this), r);
    }

    bracket_proxy<const Matrix, const Result> operator[](int r) const
    {
        return bracket_proxy<const Matrix, const Result>(static_cast<const Matrix&>(*this), r);
    }
};

class matrix : public crtp_matrix<matrix, double>
{
    // ...
};

Once we have such a CRTP class, we can provide a bracket operator for every matrix class with a binary application operator. In a full-fledged linear algebra package, one needs to pay attention to which matrices return references and which entries are mutable, but the approach is as described above.

Several timings have shown that the indirection with the proxy did not create run-time overhead compared to the direct usage of the binary access operator. Apparently, the compilers optimized the creation of proxies away in the executables.
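A self-contained sketch of the technique; only the row-major storage of the toy matrix class is our assumption, the bracket operator comes unchanged from crtp_matrix:

```cpp
#include <vector>

template <typename Matrix, typename Result>
class bracket_proxy
{
  public:
    bracket_proxy(Matrix& A, int r) : A(A), r(r) {}
    Result& operator[](int c) { return A(r, c); }
  private:
    Matrix& A;
    int r;
};

// CRTP base class: any matrix deriving from it obtains operator[]
template <typename Matrix, typename Result>
class crtp_matrix
{
  public:
    bracket_proxy<Matrix, Result> operator[](int r)
    {
        return bracket_proxy<Matrix, Result>(static_cast<Matrix&>(*this), r);
    }
};

// toy dense matrix in row-major order reusing the CRTP bracket operator
class matrix : public crtp_matrix<matrix, double>
{
  public:
    matrix(int nrows, int ncols) : ncols(ncols), data(nrows * ncols, 0.0) {}
    double& operator()(int r, int c) { return data[r * ncols + c]; }
  private:
    int ncols;
    std::vector<double> data;
};
```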




Chapter 7

Effective Programming: The Polymorphic Way

Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove<br />

it.<br />

—Alan Perlis<br />

To remove complexity in scientific application development (but not only there), several programming techniques, methods, and paradigms have to be applied accordingly. This depends not only on the ability to combine application-specific functionality with library code from a variety of sources but also on the ability to restrict the amount of application-specific glue code. So libraries must remain open for extension but closed for modification, which can be attributed to a technique called polymorphic programming.

The presented sections of this book introduced important mechanisms to successfully develop scientific applications, such as C++ basics, encapsulation, generic and meta-programming, as well as inheritance. An important part of scientific computing, matrix containers and matrix algorithms, has been presented to aid the topics so far. Effective programming is then possible if these mechanisms are not viewed as separate entities, but as different characteristics to achieve important goals, such as

• uncompromising efficiency of simple basic operations (e.g., array subscripting should not<br />

incur the cost of a function call),<br />

• type-safety (e.g., an object from a container should be usable without explicit or implicit<br />

type conversion),<br />

• code reuse and extensibility,<br />

all with their respective advantages and disadvantages. This section reviews important techniques to achieve polymorphism from a more general point of view and highlights a basic but very important recurring principle for scientific computing: code reusability. This is not mainly because programmers are lazy people, but also because applications have to be tested. For the field of scientific applications this is particularly important due to large parameter sets, changing boundary and initial conditions, as well as long run-times of simulation codes. Hence it should not be underestimated how much time and effort can be saved if already tested code can be used as a starting point or reference. So code reusability is not only about programming less, but also about extending code quality. Most of the techniques presented and discussed so far already deal with some kind of code reusability, but mostly in an implicit way. The following sections overview polymorphic mechanisms in a more explicit way.

As soon as code reusability is covered, an almost equal importance is placed on code extensibility,<br />

which should not be constrained by reused code. Scientific code development is always<br />

driven by trans<strong>for</strong>ming newly developed scientific methods into executable code. Various programming<br />

techniques with different scopes are there<strong>for</strong>e mandatory. If programming techniques<br />

are analyzed this way, it becomes understandable why some of the presented programming<br />

paradigms are not ideally suited to accomplish code reusability and extensibility together (e.g.,<br />

the object-oriented inheritance model).<br />

No technique, or more generally paradigm, will result in the ultimate and final solution, but each technique provides tools to manage the complexity of a problem; it does not manage that complexity for you. A bad problem specification will lead to a bad solution independently of the technique or paradigm used for implementation.

The usage of the Boost Graph Library (BGL) is an excellent example. There is great diversity of requirements in the field of graph algorithms and data structures. Even so, the performance demands on a library like this are very high. Nevertheless, it was possible to implement all necessary functionality at a high performance level. More than that, the library can be extended greatly in many different ways. On the other hand, this library is not easy to use or extend without an understanding of the underlying techniques.

And as a reminder: the main goal of this book is to show how to write good scientific software.



7.1 Imperative Programming<br />

Imperative programming may be viewed as the very bones on which all other abstractions depend. This programming paradigm uses a sequence of instructions which act on a state to realize algorithms. Thus it is always specified in detail what to execute next and how. The modification of the program state, while convenient, is also an issue: with increasing size of the program, unintended modifications of the state become an increasing problem. To address this issue, the imperative programming method has been refined into the procedural and structured programming paradigms, which attempt to provide more control over the modifications of the program state. Hence it is based upon organized procedure calls. Procedures, also known as routines, subroutines, methods, or functions, simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program's execution, including from other procedures or from itself. A function consists of:

• The return type of the function: A function returns a value of this type to its caller. C and C++, which do not provide procedures explicitly, use the keyword void to indicate that a function does not return a value.

• The name of the function: With it the function can be called. The name should be as expressive as possible. Never underestimate the significance of good names.

• The parameter list of the function: The parameters of a function serve as placeholders for values that are later supplied by the user during each invocation of the function. A function can have an empty parameter list. The values of the parameter list can be passed by value or by reference.

• The body of the function: The body of a function implements the logic of the operation. Typically, it manipulates the named parameters of the function.
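A minimal function exhibiting the four parts listed above (the function itself is our illustration):

```cpp
// return type: double; name: mean; parameter list: v and n; body: the logic
double mean(const double* v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += v[i];
    return sum / n;
}
```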

The advantages of this paradigm are:<br />

• Few techniques<br />

• Rapid prototyping for easy problems

• Functions can be put into a library<br />

• Fast compilation<br />

The disadvantages of this paradigm are:<br />

• Test effort is high

• Sources of error are manifold

• Non-trivial problems cause a high programming effort in lines of code

• No user-defined data types

• No locality of data<br />

• Only very few and simple functions can be put into a library<br />

Even in its refined form as procedural programming, the incurred overhead can be limited to a bare minimum, as the level of abstraction is relatively low. This was well suited for the situation of scarce computing resources and a lack of mature and powerful tools. Under these circumstances, the overall performance, in terms of execution speed or memory consumption, is solely dependent on the skill and ingenuity of the programmer and has resulted in the almost mythical "hand-optimized" code. However, to achieve the desired specifications in such a fashion, the clarity and readability, and thereby the maintainability, of the code were sacrificed. Furthermore, the low level of abstraction also hinders portability, as different architectures favour different assumptions to produce efficient execution. To address this effect, implementations were duplicated in order to optimize for different architectures and platforms, which of course makes a mockery of goals such as code reusability or even extensibility.

This paradigm and the derived techniques are then used differently in Section 2.11, where generic programming is used to offer an efficient approach for matrix operations.



7.2 Generic Programming<br />

Generic programming may be viewed as having been developed to further the goals of code reusability and extensibility. From a general view, the generic programming paradigm is about generalizing software components so that they can be easily reused in a wide variety of situations. While these are among the goals which led to the development of object-oriented programming, the realization may differ quite profoundly. A major distinction from object-oriented programming, which is focused on data structures and their states, is that generic programming especially allows for a very abstract and orthogonal description of algorithms. To achieve this kind of generalization, a separation of the basic tools of programming is important: algorithms, containers (data structures), and the glue between them (so-called iterators or, more generally, traversors). Since the minimization of glue code was introduced as an important part of effective programming, iterators and traversal objects serve as a minimal but fully abstract interface between data structures and algorithms.
abstract interface between data structures and algorithms.<br />

While the desired functionality is often implemented using static polymorphism mechanisms, such as templates in C++, generic programming should not be equated with simply programming with templates. However, when generic programming is realized using purely compile-time facilities such as static polymorphism, not only is the implementation effort reduced but the resulting run-time performance is also optimized.

In the following, the process of generic programming is illustrated by elevating procedural code to generic code, simultaneously fulfilling the important goals of effective programming (efficiency, type-safety, code reuse):

• Algorithm: Generic algorithms are generic in two ways. First, the data type on which they operate is arbitrary and, second, the type of container within which the elements are held is arbitrary.

To get in touch with the generic approach, a generalization of the memcpy() function of the C standard library is discussed. An implementation of memcpy() might look somewhat like the following:

void* memcpy(void* region1, const void* region2, size_t n)
{
    const char* first = (const char*)region2;
    const char* last = ((const char*)region2) + n;
    char* result = (char*)region1;
    while (first != last)
        *result++ = *first++;
    return result;
}

The memcpy() function is already generalized to some extent by the use of void* so that the function can be used to copy arrays of different kinds of data.

Looking at the body of memcpy(), the function's minimal requirements are that it needs to traverse the sequence using some sort of pointer, access the elements pointed to, copy the elements to the destination, and compare pointers to know when to stop. The memcpy() function can then be written in a generic manner:

template <typename InputIterator, typename OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result)
{
    while (first != last)
        *result++ = *first++;
    return result;
}

With this code the same functionality as memcpy() from the C library is achieved. All kinds of data structures that offer begin() and end() iterators can be used.

• Container: An abstraction to all kinds of data structures which can store other data<br />

types.<br />

• Iterator: This is the glue between the containers and the algorithms. First, it separates the usage of data structures and algorithms. Second, it provides a concept hierarchy for all kinds of traversal within data structures.

This type of genericity is called parametric polymorphism (see Section 7.5.2). Section 4.9<br />

introduced the Standard Template Library (STL). The STL solves many standard data structure<br />

and algorithmic problems. The STL is (or should be) the first choice in all code development<br />

steps.<br />

• Algorithm/Data-Structure Interoperability: First, each algorithm is written in a data-structure-neutral way, allowing a single template function to operate on many different classes of containers. The concept of an iterator is the key ingredient in this decoupling of algorithms and data structures. The impact of this technique is a reduction of the STL's code size from O(M*N) to O(M+N), where M is the number of algorithms and N is the number of containers. Considering a situation of 20 algorithms and 5 data structures, this makes the difference between writing 100 functions versus only 25 functions! And the difference grows faster as the number of algorithms and data structures increases.

• Extension through Function Objects: The second way in which the STL is generic is that its algorithms and containers are extensible. The user can adapt and customize the STL through the use of function objects. This flexibility is what makes the STL such a great tool for solving real-world problems. Each programming problem brings its own set of entities and interactions that must be modeled. Function objects provide a mechanism for extending the STL to handle the specifics of each problem domain.

• Element Type Parametrization: The third way that STL is generic is that its containers<br />

are parametrized on the element type.<br />

Most people think that element type parametrization is the feature that makes the STL successful. This is perhaps the least interesting way in which the STL is generic. The interoperability with iterators and the extensibility by function objects are more important parts of the STL. But the essence is programming with concepts: the programmer can write the data structures and algorithms, in other words their concepts, as they should be. Beyond these facts, the STL has proven that, with the generic programming paradigm, high-performance computing can be accomplished on several different computer architectures.
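The extension through function objects mentioned above can be sketched as follows; abs_less is our own example comparator, not part of the STL:

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// function object: compares by absolute value; passing it to std::sort
// customizes the algorithm without modifying it
struct abs_less
{
    bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); }
};

std::vector<int> sort_by_magnitude(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), abs_less());
    return v;
}
```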

The advantages of this paradigm are:<br />

• Programming with concepts<br />

• Great number of available libraries



• Great extensibility

• Great code reusability<br />

• Development of high-performance code

• All other paradigms can be used<br />

• Concepts can be proven by the compiler<br />

The disadvantages of this paradigm are:<br />

• Long compilation times: <strong>C++</strong> and the statical type checking requires a complete template<br />

instantiation and type checking.<br />

• Steep learning curve due to many complex techniques<br />

• Code bloat: Due to an incorrect usage of templates, the compiler can produce an excessive<br />

amount of code.



7.3 Programming with Objects<br />

Programming with objects may be viewed as an evolution from the structured imperative paradigm. On the one hand, it tries to address the issue of code reusability by providing a specific type of polymorphism, sub-typing. On the other hand, it addresses the issue of unchecked modification of state by enforcing data encapsulation, thus forcing changes to go through defined interfaces. Both of these notions are attached to an entity called an object. An object therefore serves as a self-contained unit which interacts with the environment via messages. It thus accomplishes a decoupling between the internal implementation of the object and the interaction with the surrounding environment, enforcing (clean) interfaces, which is essential for effective programming. The algorithms are expressed much more by the notion of what is to be done as an interaction and modification of objects, where the details of how are encapsulated to a great extent within the objects themselves.

Another benefit of programming with objects is that these entities can be placed in libraries. This saves the effort of continually rewriting the same code for every new program. Furthermore, because objects can be made polymorphic, object libraries offer the programmer more flexibility and functionality than subroutine libraries (their counterparts in the procedural paradigm).

Technically, object libraries are quite feasible, and the advantages of extensibility can be significant.<br />

However, the real challenge to making code reusable is not technical. Rather, it is<br />

identifying functionality that other people both understand and want. People who use procedural<br />

languages have been writing and using subroutine libraries for decades. These libraries are most successful when they perform simple, clearly defined functions, such as calculating square

roots or computing trigonometric functions. An object library can provide complex functions<br />

more easily than a subroutine library. However, unless those functions are clearly defined, well<br />

understood and generally useful, the library is unlikely to be used widely.<br />

To give an intuitive specification of the programming approach with objects, the following list<br />

specifies different points in the object world:<br />

• Identity is the quantization of data in discrete, distinguishable entities called objects<br />

• Classification is the grouping of objects with the same structure and behavior into classes<br />

• Polymorphism is the differentiation of behavior of the same operation on different classes<br />

• Inheritance is the sharing of structure and behavior among classes in a hierarchical relationship<br />

But one of the biggest problems of this programming approach is the interaction of objects with<br />

algorithms. The problem can easily be seen using the example of a simple sorting algorithm.<br />

Should the algorithm be placed into the object? Should an algorithm work on a class hierarchy<br />

with a common interface?<br />

The problem cannot be solved easily within this paradigm. A possible solution is some kind of<br />

polymorphism, which is explained in Section 7.5.2.<br />

7.3.1 Object-Based Programming<br />

In languages which support identity and classification the object-based paradigm can be used<br />

efficiently.



The advantages of this paradigm are:

• User-defined data structures with data locality: programming can be more intuitive compared to the procedural paradigm, and data structures together with their algorithms can be put into a library

• Library code can be tested independently

• Fast compilation, though it may be slower than in the procedural paradigm

The disadvantages of this paradigm are:<br />

• Runtime per<strong>for</strong>mance<br />

• Library/code reusability<br />

7.3.2 Object-Oriented Programming<br />

To overcome the mentioned problem of code reusability, inheritance and polymorphism were introduced.¹ Inheritance is deployed with the aim of reducing implementation effort by allowing refinement of already existing objects. Inheritance and the connected subtyping also make polymorphic programming available at run time:

• Inheritance allows us to group classes into families of related types, allowing them to share common operations and data. This way, already existing code can be reused.

• Polymorphism allows us to implement these families as a unit rather than as individual classes, giving us greater flexibility in adding or removing any particular class. This point is explained in more detail in Section 7.5.2, where this type of polymorphism is called subtyping polymorphism.

• Dynamic binding is a third aspect of object-oriented programming: the actual member function resolution is delayed until run time. With the combination of inheritance and (subtyping) polymorphism, a generic way of dealing with geometrical objects can be achieved.

While the concepts of object orientation have proved invaluable for the development of modular software, their limits also became apparent, as the goal of general reusability suffers from the stringent restrictions of the required subtyping. This may be viewed as a consequence of the fact that objects are not necessarily fit to accommodate the required abstractions, such as the algorithms themselves. Furthermore, the extension of existing code is often only possible by intrusive means, such as changing already existing implementations, and thus does not yield the high degree of effort reduction that was hoped for.

Compared to the run-time environment or compiler required for the simple imperative programming paradigm, the object-oriented paradigm requires more sophistication, as it needs to be able to handle run-time dispatches, for instance using virtual functions. Additionally, seemingly simple statements may hide the true complexity encapsulated within the objects. Thus not only is the demand on the tools higher, but the programmer also needs to be aware of the implications of seemingly simple statements in order to achieve desirable levels of performance.

¹ If a language supports all these features (identity, classification, polymorphism, and inheritance), then the object-oriented paradigm is supported in this language.


208 CHAPTER 7. EFFECTIVE PROGRAMMING: THE POLYMORPHIC WAY<br />

Behind the Dynamic Polymorphism in <strong>C++</strong><br />

A programmer must be aware of the fact that inheritance is one of the strongest bonds between objects. In real-world examples, few problems can be modeled successfully by class inheritance alone. Coupling by inheritance should therefore be used very carefully.

The advantages of this paradigm are:<br />

• Libraries: data types can be enhanced greatly.

• Abstract algorithms with polymorphism enable greater code reusability compared to the procedural paradigm.

• Strong binding of data structures and methods: logical connections can be modeled easily, and logical errors can be detected easily.

The disadvantages of this paradigm are:<br />

• The binary-method problem (see Section 7.5.3).

• Poor optimization capability of the compiler due to subtyping polymorphism (see Section ??).

• Strong binding of data structures and methods: only usable on object-oriented problems.


7.4. FUNCTIONAL PROGRAMMING 209<br />

7.4 Functional Programming<br />

In contrast to the procedural and object-oriented paradigms, which explicitly formulate algorithms and programs as a sequence of instructions acting on a program state, the functional paradigm uses mathematical functions for this task and forgoes the use of a state altogether. Therefore, there are no mutable variables and no side effects in purely functional programming. As such, it is declarative in nature and relies on the language's environment to produce an imperative representation which can be run on a physical machine. Among the greatest strengths of the functional paradigm is the availability of the strong theoretical framework of the lambda calculus (cite()), which is explained in more detail in Section ref(), for the different implementations.

Higher-order functions are an important concept of functional programming, also because of their usability in procedural languages. They were studied in lambda calculus theory well before the notion of functional programming existed, and they shaped the design of a number of functional programming languages, such as Scheme and Haskell.

As modern procedural languages and their implementations have started to put greater emphasis on correctness rather than raw speed, and the implementations of functional languages have begun to emphasize speed as well as correctness, the performance of functional and procedural languages has begun to converge. For programs which spend most of their time doing numerical computations, some functional languages (such as OCaml and Clean) can approach the performance of programs written in C, while for programs that handle large matrices and multidimensional databases, array functional languages (such as J and K) are usually faster than most non-optimized C programs. Functional languages have long been criticized as resource-hungry, both in terms of CPU time and memory. This was mainly due to two things:

• Some early functional languages were implemented with no concern for efficiency.

• Non-functional languages achieved speed at least in part by omitting features such as bounds checking or garbage collection, which are viewed as essential parts of modern computing frameworks and represent an overhead that is built into functional languages by default.

Since a purely functional description is free of side effects, it is a favourable choice for parallelization, as the description does not contain a state which would require synchronization. Data-related dependencies, however, must still be considered in order to ensure correct operation. Since the declarative style connected to the functional paradigm distances itself from the traditional imperative paradigm and its connection to states, input and output operations pose a hurdle which is often addressed in a manner that is not purely functional. As such, functional interdependencies may be specified trivially, while the details of how these are to be met remain opaque and are left as a choice to the specific implementation.

Last, we give an example of pure functional programming. We point out that the next code snippet is presented in Haskell, not in C++ syntax. The "hello world" program of the functional programming paradigm is the factorial calculation:

fac :: Integer -> Integer
fac 0 = 1
fac n | n > 0 = n * fac (n - 1)



7.4.1 Lambda Calculus<br />

As was presented in Section ref(), it is not very easy to reuse the STL standard function objects, because their use is not very intuitive: either a function object or a binder has to be written for each loop. A binder (or binder object) is passed, at construction time, to another function object which performs an action. The binder takes the function object as well as a binding value and makes a binary function unary by fixing the first parameter. However, all this is not obvious at first sight. An easy way to implement such a functionality would be to write it as it reads:

std::for_each(vec.begin(), vec.end(), std::cout << *vec_iter);

Of course, this cannot compile, for several reasons. First, the third argument is not a function object. Second, the variable vec_iter does not exist, nor does it know anything about the iterated container vec. Still, an expression like this is easy to write and less error-prone than a binder object. To enable a program like this, the following has to be accomplished:

First, the output-stream operator << has to be overloaded, and a placeholder function object is needed which simply returns its (first) argument:

struct argument_1_function_object
{
    template <typename ArgumentT>
    ArgumentT operator()(ArgumentT arg)
    {
        return arg;
    }

    template <typename Argument1T, typename Argument2T>
    Argument1T operator()(Argument1T arg1, Argument2T arg2)
    {
        return arg1;
    }
};

So what does this object really do? It provides unary and binary call operators which return the (first) argument passed. Next, a function object is implemented which stores an arbitrary stream type together with a function object:

template <typename StreamType, typename FunctionObjectT>
struct output_function_object
{
    output_function_object(StreamType& stream, FunctionObjectT func)
      : stream(stream), func(func) {}

    template <typename ArgumentT>
    void operator()(ArgumentT arg)
    {
        stream << func(arg);
    }

    StreamType&     stream;   // streams are not copyable, hence a reference
    FunctionObjectT func;
};

The only thing left to do now is to write an appropriate object generator in order to persuade the C++ syntax to accept something like the first line of code of this chapter:

template <typename StreamType, typename FunctionObjectT>
output_function_object<StreamType, FunctionObjectT>
operator<<(StreamType& stream, FunctionObjectT func)
{
    return output_function_object<StreamType, FunctionObjectT>(stream, func);
}

By using these objects, it is almost possible to write the already presented for_each code snippet conveniently. The remaining adaptation is to use a so-called unnamed object instead of the dereferenced iterator:

argument_1_function_object arg1;
std::for_each(vec.begin(), vec.end(), std::cout << arg1);

By creating a collection of functor objects², a functional programming style can be mimicked. As can then be observed, polymorphism, which has to be provided explicitly in the imperative world, comes naturally to the functional paradigm, as no specific assumptions about data types are required; only conceptual requirements need to be met.

² Instead of creating all of these functors again, the Boost Phoenix library or the C++ TR1 lambda library can be used.



7.5 From Monomorphic to Polymorphic Behavior<br />

As presented in the last sections, each programming technique (or paradigm) offers different key benefits regarding effective programming. Imperative programming, the related procedural paradigm, and object-based programming are simple and require that all calls to an object or function have exactly the same typing as the signature. Thus type checks and type constraints can be derived directly from the program text, but the effectiveness (genericity and applicability) is greatly reduced for real-world problems. This is in contrast to polymorphic code, which freely operates on abstract concept types. Polymorphic behavior enables the use of algorithms and data structures with several different types. Object-oriented, generic, and functional programming offer an additional mechanism which delays the actual type instantiation to a later evaluation point. Compared to the simple monomorphic way, the polymorphic mechanism is composed of a complex set of inference rules, because type information is propagated between the object and function signature and the call signature in both directions.

In object-oriented programming, libraries typically specify that the types supplied to the library must be derived from a common abstract base class, providing implementations for a collection of pure virtual functions. The library knows only about the abstract base class interface, but can be extended to work with new user types derived from the abstract interface. That is, variability is achieved through differing implementations of the virtual functions in the derived classes. This is how object-oriented programming supports modules that are closed for modification, yet remain open for extension. One strength of this paradigm is its support for varying the types supplied to a module at run time. Composability of modules is limited, however, since independently produced modules generally do not agree on common abstract interfaces from which supplied types must inherit. The paradigm of generic programming, pioneered by Stepanov, Musser, and their collaborators, is based on the principle of decomposing software into efficient components which make only minimal assumptions about other components, allowing maximum flexibility in composition. C++ libraries developed following the generic programming paradigm typically rely on templates for the parametric and ad-hoc polymorphism they offer. Composability is enhanced, as the use of a library does not require inheriting from a particular abstract interface. Interfaces of library components are specified using concepts, collections of requirements analogous to, say, Haskell type classes. The key difference to abstract base classes and inheritance is that a type can be made to satisfy the constraints of a concept retroactively, independently of the definition of the type. Also, generic programming strives to make algorithms fully generic, while remaining as efficient as non-generic hand-written algorithms. Such an approach is not possible when the cost of any customization is a virtual function call.

The strength of polymorphism is that the same piece of code can operate on different types, even types that were not known at the time the code was written. Such applicability is the cornerstone of polymorphism, because it amplifies the usefulness and reusability of code. If the types of polymorphism are analysed in more detail, two main types can be distinguished:

• Ad-hoc polymorphism

• Universal polymorphism

Only the second type, universal polymorphism, is actually important for effective programming, whereas the first type, ad-hoc polymorphism, is rather a convenience.


7.5. FROM MONOMORPHIC TO POLYMORPHIC BEHAVIOR 213<br />

7.5.1 Ad-hoc Polymorphism<br />

This kind of polymorphic behavior is called ad-hoc to point out that the behavior is local. Common to its two forms, overloading and coercion, is the fact that the programmer has to specify exactly which types are usable with the polymorphic function.

Overloading<br />

Overloading is a simple, convenient way of programming that eases the programmer's life:

class my_stack
{
  public:
    bool push(int value) { /* ... */ return true; }
    bool push(double value) { /* ... */ return true; }
    bool push(std::complex<double> value) { /* ... */ return true; }
    // pop() cannot be overloaded analogously: overload resolution never
    // considers the return type alone.
    int pop();
    // ....
};

Coercion<br />

Coercion is automatic type conversion. The following stack example can be used with all numerical data types which can be converted to double:

class my_stack
{
  public:
    bool push(double value) { /* ... */ return true; }
    double pop() { /* ... */ return 0.0; }
    // ....
};



7.5.2 Universal Polymorphism<br />

The universal in the title means that the kinds of polymorphic behavior presented in this section are the most useful techniques to accomplish the desired behavior and should be preferred:

• Dynamic polymorphism (subtyping)<br />

• Static polymorphism (parametric)<br />

Subtyping Polymorphism<br />

In C++, the object-oriented paradigm implements subtyping polymorphism³ by means of sub-classing. The term dynamic polymorphism is often used for this type of polymorphism. To introduce the applicability of this kind of polymorphism, an example from the topological domain is given: classes for different kinds of points are used, which should be comparable within their own set. Traversing containers or data structures is a quite common task in generic programming. The next code snippet presents the base class for all kinds of vertices.

#include <iostream>

class topology { };

class vertex
{
  public:
    virtual ~vertex() {}   // destructors in a virtual hierarchy must be virtual
    virtual bool equal(const vertex* ve) const = 0;
};

If these vertex types have to be extended, only the new class with the corresponding equal method has to be implemented. The next code snippet presents two possible implementations of a vertex, which can be used in different topologies.

class structured_vertex : public vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        const structured_vertex* sv = dynamic_cast<const structured_vertex*>(ve);
        return sv != 0 && id == sv->id && topo == sv->topo;
    }
  protected:
    int       id;
    topology* topo;
};

³ Also called inclusion polymorphism.



class unstructured_vertex : public vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment)
      : handle(handle), segment(segment), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        const unstructured_vertex* sv = dynamic_cast<const unstructured_vertex*>(ve);
        return sv != 0 && handle == sv->handle && topo == sv->topo
                       && segment == sv->segment;
    }
  protected:
    int       handle;
    int       segment;
    topology* topo;
};

With this virtual class hierarchy, an algorithm can be written which operates on all the different classes derived from vertex. This is called an explicit interface.

void print_equal(const vertex* ve1, const vertex* ve2)
{
    std::cout << std::boolalpha << ve1->equal(ve2) << std::endl;
}

The next code lines present the generic behavior of the algorithm, which operates on both types derived from vertex.

int main()
{
    topology the_topo;
    vertex* the_vertex1;
    vertex* the_vertex2;

    // *** structured
    the_vertex1 = new structured_vertex(12, &the_topo);
    the_vertex2 = new structured_vertex(12, &the_topo);
    print_equal(the_vertex1, the_vertex2);
    delete the_vertex1; delete the_vertex2;

    // *** unstructured
    the_vertex1 = new unstructured_vertex(12, &the_topo, 1);
    the_vertex2 = new unstructured_vertex(12, &the_topo, 2);
    print_equal(the_vertex1, the_vertex2);
    delete the_vertex1; delete the_vertex2;

    return 0;
}
As can be seen, polymorphic behavior can be achieved, but with major drawbacks. First, pointers or references to the objects have to be used, which eliminates the possibility for the compiler to optimize some parts of the code, e.g. by inlining. Second, a dynamic cast has to be used, which can fail at run time (for pointers it yields a null pointer; for references it throws an exception). This kind of problem is called the binary-method problem, which is explained in Section 7.5.3.

Nevertheless, dynamic polymorphism in C++ is best at:

• Uniform manipulation based on base/derived class relationships: different classes that hold a base/derived relationship can be treated uniformly.

• Static type checking: all types are checked statically in C++.

• Dynamic binding and separate compilation: code that uses classes in a hierarchy can be compiled apart from the code of the entire hierarchy. This is possible because of the indirection that pointers provide (both to objects and to functions).

• Binary interfacing: modules can be linked either statically or dynamically, as long as the linked modules lay out the virtual tables in the same way.

Behind the Dynamic Polymorphism in <strong>C++</strong><br />

How virtual functions work:

• Normally, when the compiler sees a member function call, it simply inserts instructions calling the appropriate subroutine (as determined by the type of the pointer or reference).

• However, if the function is virtual, a member function call such as vc->foo() is replaced with the following: (*((vc->vtab)[0]))()

• The expression vc->vtab locates a special "secret" data member of the object pointed to by vc. This data member is automatically present in all objects with at least one virtual function. It points to a class-specific table of function pointers (known as the class's vtable).

• The expression (vc->vtab)[0] locates the first element of the vtable of the object's class (the one corresponding to the first virtual function foo()). That element is a function pointer to the appropriate foo() member function.

• Finally, the expression (*((vc->vtab)[0]))() dereferences the function pointer and calls the function.

• Special care must be taken with destructors in virtual class hierarchies: the base class does not know anything about the derived classes, so the base class destructor has to be declared virtual, too.



Parametric Polymorphism<br />

Parametric polymorphism was the first type of polymorphism developed, first identified by Christopher Strachey in 1967. It was also the first type of polymorphism to appear in an actual programming language, ML, in 1976. It exists in C++, Standard ML, Haskell, and others. The term static polymorphism is also often used.

In C++, this type of polymorphism is available via templates and also lets a value have more than one type. Inside

template <typename T> double function(T param) { ... }

param can have any type that can be substituted inside function to render compilable code. This is called an implicit interface, in contrast to a base class's explicit interface. It achieves the same goal of polymorphism, writing code that operates on multiple types, but in a very different way.

To tie in with the dynamic polymorphism example, the same example is now expressed in the static polymorphic world through function templates:

#include <iostream>

class topology
{
    // ...
};

class structured_vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    bool equal(const structured_vertex& ve) const
    {
        return id == ve.id && topo == ve.topo;
    }
  protected:
    int       id;
    topology* topo;
};

class unstructured_vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment)
      : handle(handle), segment(segment), topo(topo) {}

    bool equal(const unstructured_vertex& ve) const
    {
        return handle == ve.handle && topo == ve.topo && segment == ve.segment;
    }
  protected:
    int       handle;
    int       segment;
    topology* topo;
};

Here, no class hierarchy is required. It only has to be guaranteed that each data type provides an implementation of the required method. Below, print_equal() is written as a function template:

template <typename VertexType>
void print_equal(const VertexType& ve1, const VertexType& ve2)
{
    std::cout << std::boolalpha << ve1.equal(ve2) << std::endl;
}

In the code snippet below, the same polymorphic behavior can be seen as in the dynamic<br />

polymorphism example, but without the necessity of inheriting from a common base class.<br />

int main()
{
    topology the_topo;

    // *** structured
    structured_vertex sv1(12, &the_topo);
    structured_vertex sv2(12, &the_topo);
    print_equal(sv1, sv2);

    // *** unstructured
    unstructured_vertex usv1(12, &the_topo, 1);
    unstructured_vertex usv2(12, &the_topo, 2);
    print_equal(usv1, usv2);

    return 0;
}

Without a pointer mechanism, the compiler can easily optimize these lines, e.g. inline the code. Additionally, exceptions cannot occur at run time.

Due to its characteristics, static polymorphism in <strong>C++</strong> is best at:<br />

• Uni<strong>for</strong>m manipulation based on syntactic and semantic interface: Types that obey a<br />

syntactic and semantic interface can be treated uni<strong>for</strong>mly.



• Static type checking: all types are checked statically.

• Static binding (which prevents separate compilation): all types are bound statically.

• Efficiency: compile-time evaluation and static binding allow optimizations and efficiencies not available with dynamic binding.

7.5.3 Comparison of Static and Dynamic Polymorphism<br />

Here the main features from static and dynamic polymorphism are summarized:<br />

• Virtual function calls are slower during run time than function templates: A virtual<br />

function call includes an extra pointer dereference to find the appropriate method in the<br />

virtual table. By itself, this overhead may not be significant. Significant slowdowns can<br />

result in compiled code because the indirection may prevent an optimizing compiler from<br />

inlining the function and from applying subsequent optimizations to the surrounding code<br />

after inlining.<br />

• Run-time dispatch versus compile-time dispatch: the run-time dispatch of virtual functions and inheritance is certainly one of the best features of object-oriented programming. For certain kinds of components, run-time dispatching is an absolute requirement: decisions need to be made based on information that is only available at run time. When this is the case, virtual functions and inheritance are needed.

Templates do not offer run-time dispatching, but they do offer significant flexibility at compile time. In fact, if the dispatching can be performed at compile time, templates offer more flexibility than inheritance, because they do not require the template argument types to inherit from some base class.

• Code size (virtual functions are small, templates are big): a common concern in template-based programs is code bloat, which typically results from naive use of templates. Carefully designed template components need not result in significantly larger executables than their inheritance-based counterparts.

• The binary method problem: There is a serious problem that shows up when using inheritance<br />

and virtual functions to express operations that work on two or more objects.



Note<br />

The binary method problem is encountered when methods in which the receiver type and argument type should vary together, such as equality comparisons, must instead use a fixed formal parameter type to maintain type safety. The problem arises in mainstream object-oriented languages because only the receiver of a method call is used for run-time method selection, and so the argument must be assumed to have the most general possible type. Existing techniques to solve this problem require intricate coding patterns that are tedious and error-prone. The binary method problem is a prototypical example of a larger class of problems where overriding methods require type information for their formal parameters. Another common example of this problem class is the implementation of event handling (e.g., for graphical user interfaces), where "callback methods" must respond to a variety of event types.


7.6. BEST OF BOTH WORLDS 221<br />

7.6 Best of Both Worlds<br />

The object-oriented programming paradigm offers mechanisms to write libraries that are open<br />

<strong>for</strong> extension, but it tends to impose intrusive interface requirements on the types that will be<br />

supplied to the library. The generic programming paradigm has seen much success in <strong>C++</strong>,<br />

partly due to the fact that libraries remain open to extension without imposing the need to<br />

intrusively inherit from particular abstract base classes. However, the static polymorphism that<br />

is a staple of programming with templates and overloads in <strong>C++</strong>, limits generic programming<br />

applicability in application domains where more dynamic polymorphism is required.<br />

In combining elements of object-oriented programming with those of generic programming, we take generic programming as the starting point, retaining its central ideas. In particular, generic programming is built upon the notion of value types that are assignable and copy-constructible. The behavior expected from value types reflects that of C++ built-in types like int, double, and so forth. This generally assumes that types encapsulate their memory and resource management in their constructors, copy constructors, assignment operators, and destructors, so that objects can be copied, passed as parameters by copy, etc., without worrying about references to their resources becoming aliased or dangling. Value types simplify local reasoning about programs. Explicitly managing objects on the heap and using pass-by-reference as the parameter-passing mode makes for complex object ownership management (and object lifetime management in languages that are not garbage collected). Instead, explicitly visible mechanisms, thin wrapper types like reference_wrapper in the (draft) C++ standard library, are used when sharing is desired.

.. more to come..<br />

7.6.1 Compile Time Container<br />

7.6.2 Meta-Functions<br />

7.6.3 Run-Time Concepts




Part II<br />

Using C ++<br />



Chapter 8

Finite World of Computers

8.1 Mathematical Objects inside the Computer<br />

First, the natural numbers N are introduced, together with the data types available in a programming language to represent them. The difference between a number and its single digits, and their connection to the base used, is an important concept in computer science.

A number is represented by several single digits, with each digit being a factor <strong>for</strong> a corresponding<br />

power of the base. The number is only complete when both the base and all of the digits are<br />

known. To use an example, the digit sequence 123 is calculated with the corresponding base,<br />

e.g. base = 10:<br />

$123_{10} = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0$<br />

If the base is switched, e.g. base = 4, then a different number is derived:<br />

$123_4 = 1 \cdot 4^2 + 2 \cdot 4^1 + 3 \cdot 4^0 = 27_{10}$<br />

One of the drawbacks of the representation of numbers within the computer is the fact that the<br />

built-in types such as int and long can only use a finite number of bits and are hence limited<br />

in their range (in the type names below, the keyword int can be omitted), e.g.:<br />

short int: -32768 … +32767<br />

long int: -2147483648 … +2147483647<br />

unsigned long int: 0 … +4294967295<br />

As can be seen, the maximum number of countable items is restricted. If, as an example, a<br />

program has to count the living humans on earth we have to switch to another number concept,<br />

either floating point or a decimal data type. A plain and simple arbitrary digit number container<br />

can be implemented by:<br />

class big_number {<br />
    long base;<br />
    std::vector<long> digits;   // element type reconstructed; the template arguments were lost in extraction<br />
public:<br />
    // .........<br />
};<br />

8.2 More Numbers and Basic Structure<br />

Polynomials are an important and efficient tool <strong>for</strong> numerous fields of science. Due to the simple<br />

rules regarding differentiation and integration, polynomials have found widespread application.<br />

Polynomials can be defined as a weighted sum of exponential terms in at least one variable or<br />

expression, with the exponents being restricted to non-negative whole numbers. Their simple<br />

definition as well as the fact that their algebraic structure is not only closed under addition,<br />

subtraction, and multiplication, but also under differentiation and integration, result in their<br />

widespread application. The demand for additional properties, such as orthogonality with<br />

respect to an inner product results in special classes of polynomials, orthogonal polynomials,<br />

which further increases their appeal in fields such as finite elements. A polynomial consists of<br />

coefficients $(a_i)$ and a variable expression $(x^i)$:<br />

$a_0 x^0 + a_1 x^1 + a_2 x^2 + \dots + a_n x^n$<br />

Thus a container representation to store the coefficients <strong>for</strong> polynomials was chosen so that a<br />

generic <strong>C++</strong> variable contains the expression:<br />

gsse::polynomial<br />

When storing the coefficients in a container great care has been taken to implement the library<br />

to be generic with respect to the type of the underlying data structure. In this way it is possible<br />

to use compile time containers if the size or even the concrete coefficients are already known at<br />

compile time. This allows the compiler to inline and execute operations at compile time.<br />

The most suitable container to use <strong>for</strong> the coefficients usually depends on the input and not<br />

the algorithms. It is there<strong>for</strong>e important to provide a basic set of programming utilities which<br />

are generic with regard to the used container type. Compile time and run time containers have<br />

a few incompatible requirements which make it hard to define a common set of utilities.<br />

8.2.1 Accessing Coefficients<br />

Accessing a polynomial’s coefficients is an important operation. There exist two basic ways of<br />

accessing the coefficient. Compile time accessors are used when the index of the coefficient to<br />

be accessed is known at compile time, while run time accessors have to be used otherwise. The<br />

compile time version takes the index as a template parameter, while the run time version takes it as a<br />

function argument.<br />

// template parameter lists reconstructed from the surrounding text; angle brackets were lost in extraction<br />
namespace compiletime {<br />
    template <int N, typename Polynomial><br />
    typename result_of::coeff<N, Polynomial>::type<br />
    coeff(Polynomial const& p);<br />
}<br />

namespace runtime {<br />
    template <typename Polynomial><br />
    typename result_of::coeff<Polynomial>::type<br />
    coeff(index_type n, Polynomial const& p);<br />
}<br />

Access to the coefficient is then available by:<br />

polynomial p;<br />
compiletime::coeff<1>(p);   // index as template argument; the concrete index was lost in extraction<br />
runtime::coeff(n, p);<br />

Thus it is possible <strong>for</strong> the compiler to simplify the code and determine more in<strong>for</strong>mation about<br />

the coefficient. There<strong>for</strong>e the compile time version is more flexible than the run time version.<br />

Using inhomogeneous compile time containers in conjunction with the run time accessor is not<br />

possible since it is not possible to determine the return type in advance. This reduces the<br />

flexibility of the code using the run time accessors. A workaround to this problem can be<br />

achieved by using the visitor pattern.<br />

template <typename Polynomial, typename Visitor>   // parameter list reconstructed<br />
void coeff_visitor(<br />
    index_type n,<br />
    Polynomial const& p,<br />
    Visitor v<br />
);<br />

However, this approach has the disadvantage of being more complicated to use than the coeff<br />

function.<br />

The coefficient accessors are not simple wrappers around the accessors of the underlying container.<br />

They check the access and return a zero value if the container does not contain the<br />

coefficient. The zero value is determined by the coeff_trait template class:<br />

template <typename CoeffType>   // template parameter reconstructed<br />
struct coeff_trait<br />
{<br />
    typedef CoeffType zero_type;<br />
    static zero_type const<br />
        zero_value = zero_type();<br />
};<br />

By using partial template-specialization it is possible to define the corresponding zero value <strong>for</strong><br />

the correct type. For inhomogeneous polynomials default_coeff is passed as CoeffType and<br />

the default behavior is to return an int.<br />

8.2.2 Setting Coefficients<br />

Coefficients may be set using the set_coeff function. It does not change the given polynomial but<br />

creates a new view instead. This provides the polynomial library with a functional programming<br />

style. Setting the coefficients and changing the polynomial can only be achieved by directly<br />

manipulating the coefficient container.


// template parameter lists reconstructed; angle brackets were lost in extraction<br />
namespace compiletime<br />
{<br />
    template <int N, typename Polynomial, typename Coeff><br />
    typename result_of::set_coeff<N, Polynomial, Coeff>::type<br />
    set_coeff(Polynomial const& p,<br />
              Coeff const& c);<br />
}<br />

namespace runtime<br />
{<br />
    template <typename Polynomial, typename Coeff><br />
    typename result_of::set_coeff<Polynomial, Coeff>::type<br />
    set_coeff(index_type n,<br />
              Polynomial const& p,<br />
              Coeff const& c);<br />
}<br />

Write access is then available by:<br />

polynomial p;<br />
compiletime::set_coeff<0>(p, 1);   // index as template argument; the concrete index was lost in extraction<br />
runtime::set_coeff(n, p, 1);<br />

The degree of the polynomial is defined as the maximum degree of all of its terms, where the<br />

degree of a term is given as the sum of the degree of all variables in this term. The polynomial<br />

library defines the degree as the index of the highest non-zero coefficient. To obtain the correct<br />

degree requires using a polynomial <strong>for</strong> each variable and finally combining them:<br />

$\mathrm{degree}\left(3\, x^4 y^2\right) = 4 + 2 = 6$<br />

struct X; struct Y;   // declaration of Y assumed; mpl::int_ arguments reconstructed from the degree example<br />

typedef polynomial<<br />
    X,<br />
    fusion::map< pair< mpl::int_<4>, double > ><br />
> inner_poly;<br />

typedef polynomial<<br />
    Y,<br />
    fusion::map< pair< mpl::int_<2>, inner_poly > ><br />
> the_polynomial;<br />

By instantiating the polynomial the calculation of its degree is possible:<br />

the_polynomial p;<br />

assert( degree(p) == 6 );



8.2.3 Compile Time Programming<br />

An application of meta-programming is presented that utilizes the compiler to execute code<br />

at compile time and reduce expressions to their results. As an example, the derivative of<br />

a second-degree polynomial is calculated and a second polynomial is added:<br />

$\dfrac{d(3 + 4.5\,x + 10\,x^2)}{dx} + (1 + 2x) = 5.5 + 22\,x$<br />

The type list represents the type of each coefficient, starting from the zeroth- to the second-degree<br />

coefficient.<br />

struct X { } x;<br />
typedef fusion::vector<double, double, int> coeffs;   // element types reconstructed from the initializers<br />
typedef polynomial<X, coeffs> poly;<br />
poly p(x, coeffs(3.0, 4.5, 10));<br />
typedef result_of::diff<poly, X>::type diffed;<br />
diffed d = diff(p, x);<br />
poly q(x, coeffs(1.0, 2.0, 0));<br />
std::cout << coeff(q + d);<br />

By compiling the program and inspecting the assembler code, it is revealed that the calculations were performed<br />

at compile time and the binary only contains the final result of 22.<br />

8.2.4 Arbitrary-Precision Arithmetic<br />

The application of the polynomial library to per<strong>for</strong>m arbitrary-precision arithmetic (or “bignum<br />

arithmetic”) is also presented here. It uses the fact that a number is in essence a polynomial<br />

with a fixed base.<br />

$1372 = 1 \cdot 10^3 + 3 \cdot 10^2 + 7 \cdot 10 + 2$<br />

This can easily be translated into C++ code by using the polynomial library. Note that the<br />

first element in the array is the zeroth coefficient:<br />

typedef unsigned char byte_t;<br />
typedef array<byte_t, 4> coeffs_t;   // template arguments reconstructed from the initializer below<br />
coeffs_t coeffs = {{2, 7, 3, 1}};<br />
gsse::polynomial p(coeffs);<br />

Since computer systems usually operate on binary numbers, base 2 is the optimal choice. The<br />

difference between polynomial arithmetic and arbitrary-precision arithmetic is that the coefficients<br />

need to be realigned to the base after each operation.



8.2.5 Finite Element Integration<br />

In the theory of finite elements [?, ?], a continuous function space is projected onto a finite<br />

function space $P^k$, where $P^k$ is the space of polynomials up to total order $k$.<br />

For many special cases, finite element integrals can be computed manually and added into<br />

the source code of an application. This results in excellent run time per<strong>for</strong>mance but lacks<br />

flexibility. For more general cases, e.g., general coefficients, they must be computed by numerical<br />

integration at run-time. To prevent an ill-conditioned system matrix, orthogonal polynomials<br />

have to be chosen as numerical integration weights. One possible type of polynomial is a<br />

normalized Legendre polynomial [?]. Such a polynomial $P_k$ of order $k$ can be<br />

efficiently evaluated by using the recursion procedure:<br />

$P_0(x) = 1$ (8.1)<br />

$P_1(x) = x$<br />

$P_k(x) = \dfrac{2k-1}{k}\, x\, P_{k-1}(x) - \dfrac{k-1}{k}\, P_{k-2}(x), \qquad k \ge 2$<br />

To use arbitrary p-finite elements (polynomial order [?, ?]) the numerical coefficients have to<br />

be calculated either manually and inserted into the source code or determined numerically at<br />

run time.<br />

The polynomial library presented here is then used to store manually pre-calculated integration<br />

tables at compile time (orders 1 to 5). If the user requires higher-order finite elements, numerical<br />

coefficients are calculated at run time to any order.<br />

8.3 A Loop and More<br />

One of the important concepts in computer science is repetition. A computer was made to do<br />

exactly this: programmable operations and repetitions. To give a simple example, a for<br />

loop is expressed by:<br />

for (long i = 0; i < max_counter; ++i)<br />

{}<br />

To give a real application of this concept, integration is used.<br />

$\int_a^b f(x)\,dx$<br />

Several approximation schemes are also available:<br />

$\int_a^b f(x)\,dx \approx \dfrac{f(a) + f(b)}{2} \cdot (b - a)$<br />

$\int_a^b f(x)\,dx \approx \dfrac{b - a}{6} \left( f(a) + 4 f\!\left(\dfrac{a+b}{2}\right) + f(b) \right)$<br />


8.4. THE OTHER WAY AROUND 231<br />

As can be seen, this is a very coarse approximation, but the main idea persists. The known<br />

continuous integration is not possible inside the computer, but the concept of numerical integration<br />

is possible. This means the infinitesimal $dx$ is replaced by a finite $\Delta x$ and the integral $\int$ is<br />

replaced by a finite sum $\sum_i$.<br />




Chapter 9<br />

How to Handle Physics on the Computer<br />

9.1 Finite Elements<br />

Discretization schemes lead in general to a linear system of equations:<br />

$A\,x = f$ (9.1)<br />

These matrices are typically:<br />

• sparse (there are only few non-zero elements per row)<br />

• of large dimension $N$ ($10^4$ to $10^9$ unknowns)<br />

A non-zero element $A_{i,j}$ of the matrix represents a finite element with both degrees of freedom<br />

$i$ and $j$ connected.<br />

To demonstrate the transfer of a continuously formulated equation such as the Laplace or Poisson<br />

equation to the finite regime of a computer, a simple Dirichlet problem is used. If an implicit<br />

(uniform) 1D-grid with n elements is used, the contribution of each element to the system<br />

matrix A is constant, a so-called stencil sub-matrix:<br />

$$A = \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 2 \end{pmatrix}_{(n-1) \times (n-1)}$$<br />

The system matrix of a 2D implicit grid of dimension $N = (n-1)^2$ is:<br />

$$A = \begin{pmatrix} D & -I & & \\ -I & D & -I & \\ & \ddots & \ddots & \ddots \\ & & -I & D \end{pmatrix} \qquad \text{with} \qquad D = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 4 \end{pmatrix}_{(n-1) \times (n-1)}$$<br />

and the (n − 1)x(n − 1) identity matrix I.<br />

9.2 Again, Integrators<br />



Chapter 10<br />

Programming tools<br />

In this chapter we introduce programming tools that can be used to solve the exercises.<br />

10.1 GCC<br />

GCC stands for the GNU Compiler Collection. It is a collection of compilers (C, C++, FORTRAN,<br />

Fortran 90, Java), free of charge [?]. The C++ compilers are very good and produce<br />

reasonably efficient code. In this section, we explain how to compile a <strong>C++</strong> program.<br />

The following command:<br />

g++ -o hello hello.cpp<br />

compiles the <strong>C++</strong> source file hello.cpp into the executable hello.<br />

The compiler command is gcc or g++ with the following options.<br />

• -Idirectory: Include files directory<br />

• -O: Optimization<br />

• -g: Debugging<br />

• -p: Profiling<br />

• -o filename: output file name<br />

• -c: Compile, no link<br />

• -Ldirectory: Library directory<br />

• -lfile: Link with library libfile.a<br />

Here is another example:<br />

g++ -o foo foo.cpp -I/opt/include -L/opt/lib -lblas<br />

compiles and links the file foo.cpp using include files from /opt/include/ (option -I) and<br />

a library located in the directory /opt/lib (options -L and -l). For optimized code, we have<br />

to use the compilation options:<br />

-O3 -DNDEBUG<br />




The -DNDEBUG option sets the C preprocessor variable NDEBUG, which tells the assert macro<br />

that debug tests should not be performed. This saves time at execution.<br />

10.2 Debugging<br />

10.2.1 Debugging with text tools<br />

“Et la tu t’dis que c’est fini<br />

car pire que ça ce serait la mort.<br />

Qu’en tu crois enfin que tu t’en sors<br />

quand y en a plus et ben y en a encore!”<br />

— Stromae.<br />

There are several debugging tools. In general, graphical ones are more user friendly, but they<br />

are not always available. In this section, we describe the gdb debugger, which is very useful to<br />

trace the cause of a run time error if the code was compiled with the option -g.<br />

The following contains a printout of a gdb session of the program hello.cpp:<br />

#include <iostream><br />
#include <glas/dense_vector.hpp>   // header name assumed; the original includes were lost in extraction<br />

int main() {<br />
    glas::dense_vector< int > x( 2 ) ;<br />
    x(0) = 1 ; x(1) = 2 ;<br />
    for (int i=0; i<3; ++i)   // loop bound reconstructed: it deliberately runs past the end<br />
        std::cout << x(i) << std::endl ;<br />
    return 0 ;<br />
}<br />



T& glas::continuous_dense_vector::operator()(ptrdiff_t) [with T = int]:<br />

Assertion ‘i



10.2.2 Debugging with graphical interface: DDD<br />

More convenient than debugging on a text level is using a graphical interface like DDD (Data<br />

Display Debugger). It has more or less the same functionality as gdb and in fact it runs gdb<br />

internally. One can also use it with another text debugger.<br />

As case study, we use a modified example from Section 5.4.5. In fact, the buggy program arose<br />

by teaching § 5.4.5, i.e. one of the authors tried to reconstruct vector_unroll_example2.cpp<br />

on the fly.<br />

TODO: Find a better example. The above finally was okay, the tuning just did not change the<br />

run-time behaviour.<br />

In addition to the window above you will see a smaller one like in Figure 10.1, typically on the<br />

right of the large window if there is enough space on your screen.<br />

This control panel lets you steer through the debug session in a way that is<br />

easier for beginners and even more convenient for some advanced users.<br />

You have the following commands:<br />

Run Start or restart your program.<br />

Interrupt If your program does not terminate or does not reach the next<br />

break point you can stop it manually.<br />

Step Go one step <strong>for</strong>ward. If your position is a function call, jump into<br />

the function.<br />

Next Go to the next line in your source code. If you are located on a<br />

function call, it does not jump into it unless a break point is set<br />

inside.<br />

Figure 10.1: DDD<br />

control panel



Stepi and Nexti These are the equivalents at instruction level. They are<br />

only needed for debugging assembler code and are not a subject of this<br />

book.<br />

Until Position your cursor in your source and run the program until you<br />

reach this line. If the program flow does not pass this line, execution<br />

continues until the end, the next break point, or a bug.<br />

Finish Execute the remainder of the current function and stop in the first<br />

line outside this function, i.e. the line after the function call.<br />

Cont Continue your execution till the next event (break point, bug, or<br />

end).<br />

Kill Terminate the program.<br />

Up Show the line of the current function’s call, i.e. go up one level in the<br />

call stack.<br />

Down Go back to the called function, i.e. go down one level in the call<br />

stack.<br />

Undo Revert last action (works rarely or never).<br />

Redo Repeat the last command.<br />

Edit Call an editor with the source file currently shown.<br />

Make Call ‘make’ (which must know what to compile).<br />

10.3 Valgrind<br />

The valgrind distribution offers several tools that you can use to analyze your software. We will<br />

only use one of these tools, called Memcheck. For more information on the others we refer you<br />

to http://valgrind.org. Memcheck detects memory-management problems like memory leaks.<br />

Memcheck also reports if your program accesses memory it should not or if it uses uninitialized<br />

values. All these errors are reported as soon as they occur along with the corresponding source<br />

line number at which they occurred and also a stack trace of the functions called to reach<br />

that line. You should also take into account that Memcheck runs programs about 10 to 30<br />

times slower than normal. Use the following command to check the memory management of a<br />

program:<br />

valgrind --tool=memcheck program_name<br />

10.4 Gnuplot<br />

A useful tool <strong>for</strong> making plots is Gnuplot. It is a public domain program.<br />

Invoke gnuplot to start the program. Suppose we have the file results with the following<br />

content:



0 1<br />

0.25 0.968713<br />

0.75 0.740851<br />

1.25 0.401059<br />

1.75 0.0953422<br />

2.25 -0.110732<br />

2.75 -0.215106<br />

3.25 -0.237847<br />

3.75 -0.205626<br />

4.25 -0.145718<br />

4.75 -0.0807886<br />

5.25 -0.0256738<br />

5.75 0.0127226<br />

6.25 0.0335624<br />

6.75 0.0397399<br />

7.25 0.0358296<br />

7.75 0.0265507<br />

8.25 0.0158041<br />

8.75 0.00623965<br />

9.25 -0.000763948<br />

9.75 -0.00486465<br />


The first column represents the x coordinate and the second column contains the corresponding<br />

y coordinate values. We can plot this using the command:<br />

plot "results" w l<br />

The command<br />

plot "results"<br />

plots only stars and no line. The command help is also useful. For 3D plots, i.e. a table with three<br />

columns, we use the command splot.<br />

10.5 Unix and Linux<br />

Unix (and Linux) are not used as often as Windows plat<strong>for</strong>ms, although <strong>for</strong> scientific programming<br />

they are popular development plat<strong>for</strong>ms. The Unix operating system is a command line<br />

system with several graphical interfaces. Especially in Linux, the graphical interfaces are well<br />

developed so that you get a Windows-like look and feel. Although you can easily browse through<br />

the directories, create new directories and move data around with a few mouse clicks, it may<br />

be interesting to know at least a few Unix commands:<br />

• ps: list of my processes,<br />

• kill -9 id : kill the process with id id,



• top: list all processes and resource use,<br />

• mkdir: make a new directory,<br />

• rmdir: remove an (empty) directory,<br />

• pwd: name of the current directory,<br />

• cd dir: change directory to dir,<br />

• ls: list the files in the current directory<br />

• cp from to: copy the file from to the file or directory to. If the file to exists, it is<br />

overwritten, unless you use cp -i from to,<br />

• mv from to: move the file from to the file or directory to. If the file to exists, it is<br />

overwritten, unless you use mv -i from to,<br />

• rm files: remove all the files in the list files. rm * removes everything (be careful),<br />

• chmod mode files: change the user mode for files.<br />

See http://www.physics.wm.edu/unix_intro/outline.html <strong>for</strong> on-line help.




Chapter 11<br />

C++ Libraries for Scientific Computing<br />

TODO: Introducing words.<br />

11.1 GLAS: Generic Linear Algebra Software<br />

11.1.1 Introduction<br />

Software kernels <strong>for</strong> dense and sparse linear algebra have been developed over many decades.<br />

Development started with the BLAS [?] [?] [?] [?] [?] in FORTRAN and continued later with similar work in<br />

C++; see MTL [?] and Blitz++, to name a few.<br />

Currently, more and more scientific software is written in <strong>C++</strong>, but the language does not<br />

provide us with dense and sparse vector and matrix concepts and algorithms, as is the<br />

case <strong>for</strong> Matlab. This makes exchanging <strong>C++</strong> software harder than, <strong>for</strong> example, Fortran 90<br />

software, which has dense vector and matrix concepts defined in the language. Note that<br />

Fortran 90 does not have sparse and structured matrix types such as symmetric or upper<br />

triangular, or banded matrices.<br />

11.1.2 Goal<br />

The goal of the GLAS project is to open the discussion on standardization <strong>for</strong> <strong>C++</strong> programming.<br />

The goal is not to present a standard as such, but the project may be a first step toward this<br />

goal.<br />

We realize that this is very ambitious. We think the GLAS proposal meets the goals, but the<br />

internals are still rather complicated, which makes extensions less straight<strong>for</strong>ward. GLAS is a<br />

generic software package using advanced meta programming tools such as the Boost MPL, but<br />

this is invisible to a user who does not want to add extensions to GLAS. Only minor knowledge<br />

of template programming and expression templates is required for making proper use of the<br />

software.<br />

This version does not use Concept <strong>C++</strong>, since we have encountered instability problems with<br />

the Concept-GCC compiler and found it hard to combine with expression templates.<br />




We now briefly explain how the goals are met be<strong>for</strong>e entering a more detailed discussion of the<br />

software design.<br />

GLAS should be considered as an interface to other software for linear algebra, e.g. the<br />

BLAS, MTL, or other linear algebra packages. Such an interface is provided by the back-ends,<br />

whereas the syntax for using such back-ends does not change. For example, if we want to add a<br />

scaled vector to another vector (an axpy), then we write<br />

y + = a ∗ x ;<br />

but the implementation can use the BLAS (e.g. daxpy), or MTL, or another package. We have<br />

provided a reference C++ implementation that illustrates how the expressions are<br />

dispatched to the actual implementation.<br />

The concepts mainly contain free functions and meta functions, so that external objects can be<br />

used in GLAS provided these functions are specialized. As an exercise, we show how this can<br />

be done <strong>for</strong> an std::vector.<br />

For more in<strong>for</strong>mation, see [?].<br />

11.1.3 Status<br />

GLAS is still under development. Currently, there are features <strong>for</strong> working with dense vectors<br />

and matrices, and sparse matrices. There is support for the Boost.Sandbox.Bindings, and toolboxes<br />

<strong>for</strong> working with LAPACK, Structured Matrices (mase toolbox), and iterative methods<br />

(iterative toolbox).<br />

11.2 Boost<br />

Boost is a bit out of line in this chapter. Firstly, it is not a library itself but a whole collection<br />

of freely available C ++ libraries. Secondly, not all of the contained libraries deal directly with<br />

scientific computing. However, many of the “non-scientific” libraries provide useful functionality<br />

<strong>for</strong> scientific libraries and applications.<br />

Boost provides free portable <strong>C++</strong> libraries.<br />

Currently, the following Boost libraries are available that are useful <strong>for</strong> numerical software:<br />

• Data structures<br />

– tuple: pairs, triples, etc.<br />

– smart ptr: smart pointers<br />

• Correctness and testing<br />

– static assert: compile time assertions<br />

• Template programming<br />

– enable if, mpl, type traits<br />



11.3. BOOST.BINDINGS 245<br />

• Math and numerics<br />

– numeric::conversions: conversions of types<br />

– thread: multi-threading<br />

– bindings: generic bindings to external software<br />

– graph: graph programs<br />

– integer: integer types<br />

– interval: interval arithmetic<br />

– random: random number generator<br />

– rational: rational numbers<br />

– math: various mathematical things, e.g. greatest common divisor<br />

– typeof: type deduction<br />

– numeric::ublas: vector and matrix library<br />

– math::quaternion, math::octonian<br />

– math::special functions<br />

• Miscellaneous<br />

– filesystem: advanced operations on files, directories<br />

– program options: working with command line options in your program<br />

– timer: timing class<br />

For more in<strong>for</strong>mation on these and other boost libraries see http://www.boost.org.<br />

11.3 Boost.Bindings<br />

Scientific programmers using <strong>C++</strong> also want to use the features offered by mature FORTRAN<br />

and C codes such as LAPACK [?], MUMPS [?] [?], SuperLU [?] and UMFPACK [?]. The<br />

programming ef<strong>for</strong>t <strong>for</strong> rewriting these codes in <strong>C++</strong> is very high. It there<strong>for</strong>e makes more<br />

sense to link the codes into <strong>C++</strong> code. Another argument <strong>for</strong> linking with external software is<br />

performance: the vendor-tuned BLAS functions are perhaps the most obvious example.<br />

In the traditional approach, an interface is developed <strong>for</strong> each basic <strong>C++</strong> linear algebra package<br />

and <strong>for</strong> each external linear algebra package. This is illustrated by Figure 11.1. The Boost<br />

bindings adopt the approach of orthogonality between algorithms and data. This orthogonality<br />

is created by traits classes that provide the necessary data to the external software. The vector<br />

traits, <strong>for</strong> example, provide a pointer (or address), size and stride, which can then be used<br />

by e.g. the BLAS function ddot. Each traits class is specialized <strong>for</strong> user defined vector and<br />

matrix packages. This implies that, <strong>for</strong> a new vector or matrix type, the development ef<strong>for</strong>t is<br />

limited to the specialization of the traits classes. Once the traits classes are specialized, BLAS<br />

and LAPACK can be used straightaway. For a new external software package, it is sufficient



to provide a layer that uses the bindings. Figure 11.2 illustrates this philosophy. Note the<br />

difference with Figure 11.1.<br />

Figure 11.1: Traditional interfaces between software<br />

Figure 11.2: Concept of bindings as a generic layer between linear algebra algorithms and vector<br />

and matrix software<br />

11.3.1 Software bindings<br />

We now illustrate how the bindings can be used to interface external software by means of<br />

examples.<br />

BLAS bindings<br />

The BLAS are the Basic Linear Algebra Subroutines [?] [?] [?] [?] [?], whose reference implementation<br />

is available through Netlib 1 . The BLAS are subdivided into three levels: level one contains<br />

vector operations, level two matrix-vector operations, and level three matrix operations.<br />

The BLAS bindings in Boost Sandbox contain interfaces to some BLAS functions. Functions<br />

are added on request. The interfaces check the input arguments using the assert command,<br />

which is only compiled when the NDEBUG compile flag is not set. The interfaces are contained<br />

in three files: blas1.hpp, blas2.hpp, and blas3.hpp in the directory boost/numeric/bindings/blas. The<br />

BLAS bindings reside in the namespace boost::numeric::bindings::blas.<br />

The BLAS provide functions for vectors and matrices with value type float, double, std::complex<float>,<br />

and std::complex<double>. All matrix containers have ordering type column_major_t, since the (FORTRAN)<br />

BLAS assume column major matrices.<br />

The bindings are illustrated in Figure 11.3 for the BLAS subprograms DCOPY, DSCAL, and DAXPY applied to objects of type std::vector&lt;double&gt;. Note the include files for the bindings of the BLAS-1 subprograms and the include file that contains the specialization of vector_traits for std::vector.

1 http://www.netlib.org<br />




#include &lt;boost/numeric/bindings/blas/blas1.hpp&gt;
#include &lt;boost/numeric/bindings/traits/std_vector.hpp&gt;

int main() {
  std::vector&lt; double &gt; x( 10 ), y( 10 ) ;
  // Fill the vector x
  ...
  bindings::blas::copy( x, y ) ;
  bindings::blas::scal( 2.0, y ) ;
  bindings::blas::axpy( -3.0, x, y ) ;
  return 0 ;
}

Figure 11.3: Example for BLAS-1 bindings and std::vector bindings traits

LAPACK bindings

Software for dense and banded matrices is collected in LAPACK [?]. It is a collection of FORTRAN routines mainly for solving linear systems and eigenvalue problems, including the singular value decomposition. As for the BLAS, the Boost Sandbox does not contain a full set of interfaces to LAPACK routines, but only the most commonly used subprograms. On request, more functions are added to the library. The LAPACK bindings reside in the namespace boost::numeric::bindings::lapack.

Many LAPACK subroutines require auxiliary arrays, which a non-expert user does not wish to allocate, for reasons of comfort. The interface therefore allows the user to allocate auxiliary vectors using the templated Boost.Bindings class array.

The LAPACK bindings verify the matrix structure to see whether the routine is the right choice. They also check whether the matrix arguments are column major. Every function's return type is int; the return value corresponds to the INFO argument of the underlying LAPACK subprogram.

Figure 11.4 shows an example using GLAS.<br />

MUMPS bindings<br />

MUMPS stands for Multifrontal Massively Parallel Solver. The first version was a result of the EU project PARASOL [?, ?, ?]. The software is developed in Fortran 90 and contains a C interface. The input matrices should be given in coordinate format, i.e. storage_format == coordinate_t, and the index numbering should start from one, i.e. sparse_matrix_traits::index_base == 1. We refer to the MUMPS Users' Guide, distributed with the software [?].

The C++ interface is a generic interface to the respective C structs for the different value types that are available from the MUMPS distribution: float, double, std::complex&lt;float&gt;, and std::complex&lt;double&gt;. The C++ bindings also contain functions to set the pointers and sizes of the parameters in the C struct using the bindings traits classes. An example is given in Figure 11.5. The sparse matrix is the uBLAS coordinate_matrix, which is a sparse matrix in



#include &lt;...&gt;
#include &lt;...&gt;
#include &lt;...&gt;
#include &lt;...&gt;
#include &lt;...&gt;
#include &lt;...&gt;
...
int main () {
  int n=100;
  // Define a real n x n matrix
  glas::dense_matrix&lt; double &gt; matrix( n, n ) ;
  // Define a complex n vector
  glas::dense_vector&lt; std::complex&lt;double&gt; &gt; eigval( n ) ;
  // Fill the matrix
  ...
  // Call LAPACK routine DGEES for computing the eigenvalue Schur form.
  // We create workspace for best performance.
  bindings::lapack::gees( matrix, eigval, bindings::lapack::optimal_workspace() ) ;
  ...
}

Figure 11.4: Example <strong>for</strong> LAPACK bindings and matrix bindings traits



coordinate format. The matrix is stored column-wise. The template argument 1 indicates that row and column numbers start from one, which is required for the Fortran 90 code MUMPS. Finally, the last argument indicates that the row and column indices are stored in type int, which is also a requirement for the Fortran 90 interface. The solve consists of three phases: (1) the analysis phase, which only needs the matrix's integer data, (2) the factorization phase, where the numerical values are also required, and (3) the solution phase (or back-transformation), where the right-hand side vector is passed in. The include files contain the specializations of the dense_matrix and sparse_matrix traits for uBLAS and the MUMPS bindings.

11.4 Matrix Template Library<br />

11.5 Blitz++<br />

TODO: We can ask Todd to write something himself — Peter<br />

11.6 Graph Libraries<br />

TODO: Few introducing words from Peter<br />

11.6.1 Boost Graph Library<br />

TODO: I can write something about it — Peter<br />

11.6.2 LEDA<br />

LEDA implements advanced container types and combinatorial algorithms, especially graph algorithms. Containers are parameterized by element type and implementation strategy. The algorithms generally work only with the data structures of the library itself.

11.7 Geometric Libraries<br />

TODO: Few introducing words from René and Philipp<br />

11.7.1 CGAL<br />

TODO: Ask Sylvain to write something? Or can René and Philipp write it?<br />

CGAL implements generic classes and procedures for geometric computing. Its data structures operate at a very high level of abstraction.



#include &lt;...&gt;
#include &lt;...&gt;
#include &lt;...&gt;

int main() {
  namespace ublas = boost::numeric::ublas ;
  namespace mumps = boost::numeric::bindings::mumps ;
  ...
  typedef ublas::coordinate_matrix&lt; double, ublas::column_major
                                  , 1, ublas::unbounded_array&lt;int&gt;
                                  &gt; sparse_matrix_type ;
  sparse_matrix_type matrix( n, n, nnz ) ;
  // Fill the sparse matrix
  ...
  mumps::mumps&lt; sparse_matrix_type &gt; mumps_solver ;

  // Analysis (set the pointers and sizes of the integer data of the matrix)
  matrix_integer_data( mumps_solver, matrix ) ;
  mumps_solver.job = 1 ;
  driver( mumps_solver ) ;

  // Factorization (set the pointer for the values of the matrix)
  matrix_value_data( mumps_solver, matrix ) ;
  mumps_solver.job = 2 ;
  driver( mumps_solver ) ;

  // Set the right-hand side
  ublas::vector&lt;double&gt; v( 10 ) ;
  ...

  // Solve (set pointer and size for the right-hand side vector)
  rhs_sol_value_data( mumps_solver, v ) ;
  mumps_solver.job = 3 ;
  mumps::driver( mumps_solver ) ;

  return 0 ;
}

Figure 11.5: Example of the use of the MUMPS bindings



11.7.2 GrAL<br />

TODO: René and Philipp write more?<br />

GrAL implements concepts similar to GSSE, but without the generalization of function objects, the three-layer concept (segment, domain, structure), generalized quantity storage, and n-dimensional structured grids.




Chapter 12

Real-World Programming

12.1 Transcending Legacy Applications<br />

Legacy applications have often been written in plain ANSI C or are available as Fortran libraries. It is therefore highly desirable to rejuvenate an already available implementation so that it utilizes advanced technologies and techniques, while at the same time keeping as much as possible of the experience and trust already associated with the original code base. One approach is an evolutionary transition: initially include as much of the old implementation as possible and gradually replace it to bring it up to date.

The following examples are based on a particle simulator, in which two important concepts can be separated: scattering mechanisms (the physical behaviour of particles at boundaries (TODO: PS)) and physical model descriptions (how particles interact (TODO: PS)). All available scattering mechanisms are implemented as individual functions, which are called one after the other. The scattering models require a variable set of parameters, which leads to non-homogeneous interfaces in the functions representing them. To alleviate this to some extent, global variables have been employed, completely eliminating any aspirations of data encapsulation and posing a serious problem for attempts at parallelization to take advantage of modern multi-core CPUs. The code has the following very simple and repetitive structure:

double sum = 0;
double current_rate = generate_random_number();

if (A_key == on)
{
    sum = A_rate(state, parameters);
    if (current_rate &lt; sum)
    {
        counter-&gt;A[state-&gt;valley]++;
        state_after_A (st, p);
        return;
    }
}
sum += B_rate (state, state_2, parameters);




if (current_rate &lt; sum)
{
    counter-&gt;B[state-&gt;valley]++;
    state_after_B (state, state_2);
    return;
}
...

Extensions to this code are usually accomplished by copy and paste, which is prone to simple<br />

mistakes by oversight, such as failing to change the counter which has to be incremented or<br />

calling the incorrect function to update the electron’s state.<br />

Furthermore, at times the need arises to calculate the sum of all the scattering models (λtotal),<br />

which is accomplished in a different part of the implementation, thus further opening the possibility<br />

<strong>for</strong> inconsistencies between the two code paths.<br />

The decision which models to evaluate is made strictly at run time, and it would require significant, if simple, modification of the code to change this at compile time, making highly optimized specializations very cumbersome.

The functions calculating the rates and state transitions, however, have been well tested and<br />

verified, so that abandoning them would be wasteful.<br />

12.1.1 Best of Both Worlds<br />

Scientific computing requires not only high-performance components evaluated and optimized at compile time, but also runtime-exchangeable (physical) models and the ability to cope with various boundary conditions. The two most commonly used programming paradigms, object-oriented and generic programming, differ in how the required functionality is implemented.

Object-oriented programming directly offers runtime polymorphism by means of virtual inheritance. Unfortunately, current implementations of inheritance use an intrusive approach for new software components and tightly couple a type and the corresponding operations to the super-type. In contrast to object-oriented programming, generic programming is limited to algorithms using statically and homogeneously typed containers, but offers highly flexible, reusable, and optimizable software components.

As can be seen, both paradigms offer different points of evaluation. Runtime polymorphism based on concepts [?] (runtime concepts) tries to combine the runtime modification mechanism of virtual inheritance with compile-time flexibility and optimization.

Inheritance in the context of runtime polymorphism is used to provide an interface template<br />

to model the required concept where the derived class must provide the implementation of the<br />

given interface. The following code snippet<br />

template&lt; typename StateT &gt; struct scatter_facade
{
    typedef StateT state_type;

    struct scattering_concept
    {
        virtual ~scattering_concept() {}
        virtual numeric_type rate(const state_type&amp; input) const = 0;
        virtual void transition(state_type&amp; input) = 0;
    };

    boost::shared_ptr&lt; scattering_concept &gt; scattering_object;

    template&lt; typename T &gt; struct scattering_model : scattering_concept
    {
        T scattering_instance;
        scattering_model(const T&amp; x) : scattering_instance(x) {}
        numeric_type rate(const state_type&amp; input) const ;
        void transition(state_type&amp; input) ;
    };

    numeric_type rate(const state_type&amp; input) const;
    void transition(state_type&amp; input) ;

    template&lt; typename T &gt;
    scatter_facade(const T&amp; x) : scattering_object(new scattering_model&lt;T&gt;(x)) {}
    ~scatter_facade() {}
};

therefore introduces a scattering facade which wraps a scattering_concept part. Virtual inheritance is used to configure the necessary interface parts, in this case rate() and transition(), which have to be implemented by any scattering model. In the given example, the state_type is still available for explicit parametrization.

In contrast to other applications of runtime concepts, e.g. in computer graphics, it is not necessary to provide mechanisms for deep copies, as the actual physical models remain unaltered once they have been created; deep copies would only serve to unnecessarily increase the memory footprint. Therefore a boost::shared_ptr is used for memory management.

The legacy application has been written in plain ANSI C, which makes it easily compatible with the new C++ implementation. Several design decisions, such as the use of global and static variables, make it difficult to extend and to update appropriately for modern multi-core CPUs. To interface with this novel approach, a core structure is implemented which wraps the implementations of the scattering models by using runtime concepts.

template&lt; typename ParameterType &gt;
struct scattering_rate_A
{
    ...
    const ParameterType&amp; parameters;
    scattering_rate_A(const ParameterType&amp; parameters) : parameters(parameters) {}

    template&lt; typename StateType &gt;
    numeric_type operator() (const StateType&amp; state) const
    {
        return A_rate(state, parameters);
    }
};



By supplying the required parameters at construction time, it is possible to homogenize the interface of operator(). This methodology also allows the continued use of the old data structures in the initial phases of the transition, while not being so constrictive as to hamper future developments.

The functions for the state transitions are treated similarly to those for the rate calculation. Both are then fused in a scattering_pack to form the complete scattering model, to ensure consistency of the rate and state transition calculations, and to model the runtime concept, as can be seen in the following piece of code:

template&lt; typename ScatteringRateType, typename TransitionType, typename ParameterType &gt;
struct scattering_pack
{
    // ... (typedefs scattering_rate_type, transition_type, parameter_type)
    scattering_rate_type rate_calculation;
    transition_type state_transition;

    scattering_pack (const parameter_type&amp; parameters) :
        rate_calculation(parameters),
        state_transition(parameters)
    {}

    template&lt; typename StateType &gt;
    numeric_type rate(const StateType&amp; state) const
    {
        return rate_calculation(state);
    }

    template&lt; typename StateType &gt;
    void transition(StateType&amp; state)
    {
        state_transition(state);
    }
} ;

The blend of runtime and compile-time mechanisms allows the storage of all scattering models within a single container, e.g. a std::vector, which can be iterated over in order to evaluate them.

typedef std::vector&lt; scatter_facade&lt;state_type&gt; &gt; scatter_container_type ;
scatter_container_type scatter_container ;
scatter_container.push_back(scattering_model) ;

For the development of new collision models, easy extensibility, even without recompilation, is also a highly important issue. This approach allows the addition of scattering models at runtime and the exposure of an interface to an interpreted language such as Python [?].

In case a highly optimized version is desired, the runtime container (here the std::vector) may be exchanged for a compile-time container, which is readily available from the GSSE and provides the compiler with further opportunities for optimization at the expense of runtime adaptability.



12.1.2 Reuse Something Appropriate<br />

While the described approach initially increases the burden of implementation slightly, because wrappers need to be provided, it gives a transition path to integrate legacy codes into an up-to-date framework while not abandoning the experience associated with them. The invested effort raises the level of abstraction, which in turn increases the benefits obtained from advances in compiler technologies. This inherently allows optimization for several platforms without the massive human effort that was needed in previous approaches.

In this particular case, by encapsulating the scattering-model functions' reliance on global variables in the wrapping structures, parallelization efforts are greatly facilitated, which is increasingly important with the continued growth of the number of computing cores per CPU.

Furthermore, the results can easily be verified as code parts are gradually moved to newer implementations, the only stringent requirement being link compatibility with C++. This testing and verification can be taken a step further if the original implementation is written in ANSI C, owing to its high compatibility with C++: it is possible to weave parts of the new implementation into the older code, providing the opportunity for a very fine-grained comparison not only of final results but of all intermediates as well.

Such swift verification of implementations also speeds up the steps necessary to validate calculated results against subsequent or contemporary experiments, which should not be neglected in order to keep physical models and their numerical representations strongly rooted in reality.




Chapter 13

Parallelism

13.1 Multi-Threading<br />

To do!<br />

13.2 Message Passing<br />

13.2.1 Traditional Message Passing<br />

Parallel hello world<br />

#include &lt;iostream&gt;
#include &lt;mpi.h&gt;

int main (int argc, char* argv[])
{
    MPI_Init(&amp;argc, &amp;argv);
    std::cout &lt;&lt; "Hello, World!\n";
    MPI_Finalize();
    return 0 ;
}

#include &lt;iostream&gt;
#include &lt;mpi.h&gt;

int main (int argc, char* argv[])
{
    MPI_Init(&amp;argc, &amp;argv);

    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;nprocs);
    std::cout &lt;&lt; "Hello world, I am process number " &lt;&lt; myrank
              &lt;&lt; " out of " &lt;&lt; nprocs &lt;&lt; ".\n";

    MPI_Finalize();
    return 0 ;
}

13.2.2 Generic Message Passing<br />

Each process adds its local contribution and passes the partial sum on to its successor; the last process holds the final result.

#include &lt;cmath&gt;
#include &lt;iostream&gt;
#include &lt;mpi.h&gt;

int main (int argc, char* argv[])
{
    MPI_Init(&amp;argc, &amp;argv);

    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;nprocs);

    float vec[2];
    vec[0]= 2*myrank; vec[1]= vec[0]+1;

    // Local accumulation
    float local= std::abs(vec[0]) + std::abs(vec[1]);

    // Global accumulation
    float global= 0.0f;
    MPI_Status st;
    // Receive from predecessor
    if (myrank &gt; 0)
        MPI_Recv(&amp;global, 1, MPI_FLOAT, myrank-1, 387, MPI_COMM_WORLD, &amp;st);
    // Increment
    global+= local;
    // Send to successor
    if (myrank+1 &lt; nprocs)
        MPI_Send(&amp;global, 1, MPI_FLOAT, myrank+1, 387, MPI_COMM_WORLD);
    else
        std::cout &lt;&lt; "Hello, I am the last process and I know that |v|_1 is " &lt;&lt; global &lt;&lt; ".\n";

    MPI_Finalize();
    return 0 ;
}

This solution works at a low abstraction level. With a collective operation, the library performs the reduction for us:

#include &lt;cmath&gt;
#include &lt;iostream&gt;
#include &lt;mpi.h&gt;

int main (int argc, char* argv[])
{
    MPI_Init(&amp;argc, &amp;argv);

    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;nprocs);

    float vec[2];
    vec[0]= 2*myrank; vec[1]= vec[0]+1;

    // Local accumulation
    float local= std::abs(vec[0]) + std::abs(vec[1]);

    // Global accumulation
    float global;
    MPI_Allreduce(&amp;local, &amp;global, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    std::cout &lt;&lt; "Hello, I am process " &lt;&lt; myrank
              &lt;&lt; " and I know too that |v|_1 is " &lt;&lt; global &lt;&lt; ".\n";

    MPI_Finalize();
    return 0 ;
}

This version is preferable because:

• it works at a higher level of abstraction;
• the MPI implementation is usually adapted to the underlying hardware: the reduction typically takes logarithmic effort and can even be tuned in assembler for the network card.




Chapter 14

Numerical exercises

In this chapter, we list a number of exercises in which the different aspects discussed in the course are used. The goal is to implement a small application program in C++, run it, and interpret the results.

You can use any software that may help you with your task. A list of packages is provided at the end of this chapter. We have only installed Boost, Boost.Sandbox, GLAS, BLAS, and LAPACK. Other, smaller packages can be downloaded if necessary.

In each exercise, a generic function or class will be developed, together with its documentation. These functions and classes should be part of the namespace athens. The functions' arguments have to be described, and each template argument has to satisfy concepts. You may have to define new concepts. If you are using STL or GLAS concepts, you can just refer to them without repeating their definition.

Write a short paper on the decisions you made during the development of the software. Use the software for some examples and report the results. You may write the report on paper or send it in electronic form (preferably PDF).

14.1 Computing an eigenfunction of the Poisson equation<br />

This is an example of a more complicated problem. It illustrates what is expected from the<br />

exercises. The actual exercises are less demanding.<br />

In this section, we derive software <strong>for</strong> the solution of the Poisson equation. We start with the<br />

1D problem and then move to the 2D problem.<br />

14.1.1 The 1D Poisson equation<br />

The 1D Poisson equation is<br />

$$-\frac{d^2 u}{dx^2} = f \qquad (14.1)$$

where u(x) is the solution, f the excitation, and x ∈ [0, 1]. We impose the boundary conditions

$$u(0) = u(1) = 0.$$

This is called a boundary value problem.

The goal is to compute the solution u <strong>for</strong> all x ∈ [0, 1]. Since this is not possible numerically, we<br />

only compute u <strong>for</strong> a discrete number of x’s, which we call discretization points. We discretize x<br />

as xj = jh <strong>for</strong> j = 0, . . . , n+1 and h = 1/(n+1). This is called an equidistant distribution. The<br />

smaller h, the closer we are to the continuous problem, i.e. we have more points in [0, 1], but, as<br />

we shall see, the problem becomes more expensive to solve. One method <strong>for</strong> solving boundary<br />

value problems is to replace the derivative by finite differences. We use finite differences <strong>for</strong> the<br />

second order derivatives:<br />

$$\frac{d^2 u}{dx^2}(x_j) \approx \frac{1}{h^2}\left(-2u(x_j) + u(x_{j-1}) + u(x_{j+1})\right).$$

Filling this in (14.1), we obtain

$$\frac{1}{h^2}\left(-u(x_{j-1}) - u(x_{j+1}) + 2u(x_j)\right) = f(x_j) \quad \text{for } j = 1, \dots, n. \qquad (14.2)$$

Note that u(x0) = u(xn+1) = 0. Now define the vectors<br />

u = [u(x1), . . . , u(xn)] T<br />

and f = [f(x1), . . . , f(xn)] T .<br />

Putting together (14.2) for j = 1, . . . , n leads to the algebraic system of equations Au = f with n rows and columns, where

$$A = \begin{bmatrix} 2 &amp; -1 &amp; &amp; \\ -1 &amp; 2 &amp; \ddots &amp; \\ &amp; \ddots &amp; \ddots &amp; -1 \\ &amp; &amp; -1 &amp; 2 \end{bmatrix}.$$

Note that A is a symmetric tridiagonal matrix. We can show that it is positive definite.<br />

In the algorithms, we need operations on this matrix. We will use two different types of<br />

operations. The first one is the matrix-vector product y = Ax. We write a function <strong>for</strong> this<br />

with a template argument <strong>for</strong> the vectors since we do not know be<strong>for</strong>ehand what the type of<br />

the vectors will be.<br />

#ifndef athens_poisson_1d_hpp
#define athens_poisson_1d_hpp

#include &lt;...&gt;
#include &lt;cassert&gt;

namespace athens {

  template &lt;typename X, typename Y&gt;
  void poisson_1d( X const&amp; x, Y&amp; y ) {
    assert( glas::size(x)==glas::size(y) ) ;
    assert( glas::size(x) &gt; 1 ) ;
    int const n = glas::size(x) ;
    y(0) = 2.0*x(0) - x(1) ;
    // Interior rows of the tridiagonal matrix A
    for ( int i=1; i&lt;n-1; ++i )
      y(i) = 2.0*x(i) - x(i-1) - x(i+1) ;
    y(n-1) = 2.0*x(n-1) - x(n-2) ;
  }

} // namespace athens

#endif

where we assume that the types X and Y are models of the concept glas::DenseVectorCollection.<br />

14.1.2 Richardson iteration<br />

Richardson iteration is an iterative method for the solution of the linear system

$$Bu = g$$

that starts from an initial guess $u_0$ and computes $u_i = u_{i-1} + r_{i-1}$ at iteration $i$, where $r_{i-1}$ is the residual $g - Bu_{i-1}$. It works as follows:

1. For $i = 1, \dots, \text{max\_it}$:
   1.1. Compute the residual $r_{i-1} = g - Bu_{i-1}$
   1.2. If $\|r_{i-1}\|_2 \le \tau$: return
   1.3. Compute the new solution $u_i = u_{i-1} + r_{i-1}$

The method converges when the eigenvalues of $B$ lie between 0 and 2.

The eigenvalues of the Poisson matrix A are $\lambda_j = 2(1 - \cos(\pi j/(n+1)))$ for $j = 1, \dots, n$. The eigenvalues are thus bounded by $0 &lt; \lambda_j &lt; 4$. We therefore first multiply $Au = f$ by 0.5 into

$$(0.5A)u = 0.5f.$$

Note that the solution u does not change. Define $B = 0.5A$ and $g = 0.5f$; then $Bu = g$ and the eigenvalues of B lie in (0, 2). For such a matrix, we can use the Richardson iteration method.

We develop the following function<br />

template &lt;typename Op, typename G, typename U&gt;
double richardson( Op const&amp; op, G const&amp; g, U&amp; u, double const&amp; tol, int max_it ) ;

where op is a BinaryFunction op(x,y) that computes y = Bx for a given input argument x, and where u is an initial estimate of the solution on input and the computed solution on output. The vector g is the right-hand side of the system. The return value of richardson is the residual norm; this allows us to check how accurate the solution is without having to compute the residual explicitly. The parameter tol corresponds to the tolerance τ.

First, we set conceptual conditions on all arguments.<br />

• U is a model of concept glas::DenseVectorCollection, i.e. we assume that a dense vector<br />

from GLAS is used.<br />

• Op is a model of BinaryFunction, i.e. the following are valid expressions <strong>for</strong> op of type Op:<br />

– op(x,y) where x and y are instances of type X where X is a model of the concept<br />

glas::DenseVectorCollection.



• G is a model of concept glas::VectorExpression.<br />

Next, we write the code <strong>for</strong> the Richardson iteration. We store the variables ui in u and ri in r.<br />

#ifndef athens richardson hpp<br />

#define athens richardson hpp<br />

#include <br />

namespace athens {<br />

template <br />

double richardson( Op const& op, F const& f, U& u, double const& tol, int max it ) {<br />

double resid norm ;<br />

// Create residual vector<br />

glas::dense vector< typename glas::value type::type > r( glas::size(u) ) ;<br />

<strong>for</strong> ( int iter =0; iter



int main() {
  // ... (definitions of v_type, the functor poisson_scaled, the vector f, and seed)
  glas::random( f, seed ) ;
  v_type x( 10 ) ;
  x = 0.0 ;

  // Richardson iteration
  double res_nrm = athens::richardson( poisson_scaled(), 0.5*f, x, 1.e-4, 1000 ) ;
  {
    glas::dense_vector&lt;double&gt; r( size(x) ) ;
    athens::poisson_1d( x, r ) ;
    std::cout &lt;&lt; "res_nrm = " &lt;&lt; norm_2( f - r ) &lt;&lt; std::endl ;
    std::cout &lt;&lt; "f = " &lt;&lt; f &lt;&lt; std::endl ;
    std::cout &lt;&lt; "x = " &lt;&lt; x &lt;&lt; std::endl ;
  }
  return 0 ;
}

We multiply right-hand side and matrix vector product by 0.5 to make sure the Richardson<br />

method converges.<br />

The output looks like<br />

res_nrm = 0.000195164<br />

f = (10)[0.0484811,0.822283,0.102721,0.436631,0.46112,0.0475317,0.864644,0.0772845,0.920099,0.105434]<br />

x = (10)[1.85463,3.66081,4.64473,5.52601,5.97071,5.9544,5.8906,4.96226,3.95668,2.03105]<br />

Note that the Richardson method converges very slowly. For the Poisson equation, there exist<br />

much faster methods.<br />

14.1.3 LAPACK tridiagonal solver<br />

The LAPACK [?] software package contains routines <strong>for</strong> solving linear systems with a symmetric<br />

positive definite tridiagonal matrix. This package is written in FORTRAN 77. The<br />

corresponding functions are<br />

• Factorization A = LDL^T by

SUBROUTINE DPTTRF( N, D, E, INFO )

• Linear solve Ax = b using LDL^T x = b by

SUBROUTINE DPTTRS( N, NRHS, D, E, B, LDB, INFO )

In order to solve Au = f, first A is factorized into A = LDL^T, where L consists of a main diagonal of ones and one diagonal below the main diagonal, and D is a diagonal matrix. Once the factorization is performed, the solution is computed as u = L^{-T}(D^{-1}(L^{-1}f)). Note that the inverses of L and L^T are not computed explicitly; for example, L^{-1}f is computed as a linear solve with L. Linear solves with triangular matrices are easy to program. This is what DPTTRS does for us.

A C++ interface to DPTTRF and DPTTRS is available from the Boost Sandbox bindings. For our application, we can solve a linear system as follows.



1. Given an approximate eigenvector $x_0$
2. Normalize: $x_0 = x_0/\|x_0\|_2$.
3. For $i = 1, \dots, m$:
   3.1. Solve $Ay_i = x_i$.
   3.2. Compute the eigenvalue estimate: $\lambda_i = \sum x_i / \sum y_i$.
   3.3. $x_i = y_i/\|y_i\|_2$.

#include &lt;...&gt;        // Lapack binding
#include &lt;...&gt;        // glas binding
#include &lt;...&gt;        // glas vectors
#include &lt;algorithm&gt;  // for std::fill
#include &lt;cassert&gt;    // for assert
#include &lt;iostream&gt;   // for cout and endl

int main() {
  int const n = 10 ;
  glas::dense_vector&lt; double &gt; d(n) ;    // Main diagonal
  glas::dense_vector&lt; double &gt; e(n-1) ;  // Lower/upper diagonal
  std::fill( begin(d), end(d), 2.0 ) ;
  std::fill( begin(e), end(e), -1.0 ) ;

  glas::dense_vector&lt; double &gt; rhs( n ) ;
  std::fill( begin(rhs), end(rhs), 3.0 ) ;

  int info = boost::numeric::bindings::lapack::pttrf( d, e ) ;
  assert( !info ) ;

  std::cout &lt;&lt; rhs &lt;&lt; std::endl ;
  info = boost::numeric::bindings::lapack::pttrs( 'L', d, e, rhs ) ;
  std::cout &lt;&lt; rhs &lt;&lt; std::endl ;
  // Solution is in rhs
}

14.1.4 The inverse iteration method<br />

The inverse iteration method computes an eigenvalue of a matrix A. The method converges to<br />

the eigenvector associated with the eigenvalue nearest zero. The method works as follows: In<br />

this algorithm � xi means the sum of the elements of xi. For the solution of the linear system,<br />

we can use Richardson iteration.<br />

Write a function with the following header:<br />

template <typename Op, typename DenseVectorCollection, typename Float><br />
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) ;<br />
where Op is a model of BinaryFunction that solves y from x, x is the eigenvector estimate on<br />
input and output, and m is the number of iterations. The estimated eigenvalue is returned in lambda.


14.1. COMPUTING AN EIGENFUNCTION OF THE POISSON EQUATION 269<br />

First, we set conceptual conditions on all arguments.<br />

• Op is a model of BinaryFunction, i.e. the following are valid expressions for op of type Op:<br />

– op(x,y) where x and y are instances of type X where X is a model of the concept<br />

glas::DenseVectorCollection.<br />

• DenseVectorCollection is a model of glas::DenseVectorCollection.<br />

• Float is a concept of real numbers, i.e. it is float, double, or long double.<br />

The implementation for inverse_iteration could be as follows:<br />
#ifndef athens_inverse_iteration_hpp<br />
#define athens_inverse_iteration_hpp<br />
#include <...> // glas vectors (exact path omitted here)<br />
#include <...><br />
namespace athens {<br />
template <typename Op, typename DenseVectorCollection, typename Float><br />
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) {<br />
glas::dense_vector< typename glas::value_type<DenseVectorCollection>::type > y( glas::size(x) ) ;<br />
x = x / norm_2( x ) ; // 2.<br />
for ( int i=0; i<m; ++i ) { // 3.<br />
op( x, y ) ; // 3.1. Solve A y = x<br />
lambda = sum( x ) / sum( y ) ; // 3.2. Eigenvalue estimate<br />
x = y / norm_2( y ) ; // 3.3. Normalize<br />
}<br />
}<br />
} // namespace athens<br />
#endif<br />
The operator op can, for example, solve the linear system with the Richardson iteration:<br />
struct solve {<br />
template <typename X, typename Y><br />
void operator()( X const& x, Y& y ) const {<br />
athens::richardson( poisson_scaled(), 0.5*x, y, 1.e-8, 1000 ) ;<br />
}<br />
} ;<br />
int main() {<br />
typedef glas::dense_vector< double > v_type ;<br />
v_type x( 10 ) ;<br />
glas::random_seed seed ;<br />
glas::random( x, seed ) ;<br />
double lambda ;<br />
athens::inverse_iteration( solve(), x, 100, lambda ) ;<br />
std::cout << "lambda = " << lambda << std::endl ;<br />
std::ofstream xf( "x.out" ) ;<br />
for ( int i=0; i<10; ++i ) xf << x(i) << std::endl ;<br />
}<br />


Figure 14.1: First eigenvector of the 1D Poisson operator<br />

int main() {<br />
typedef glas::dense_vector< double > v_type ;<br />
int n = 10 ;<br />
v_type x( n ) ;<br />
glas::random_seed seed ;<br />
glas::random( x, seed ) ;<br />
v_type d( n ) ; std::fill( begin(d), end(d), 2.0 ) ;<br />
v_type e( n-1 ) ; std::fill( begin(e), end(e), -1.0 ) ;<br />
solve< v_type, v_type > solver( d, e ) ;<br />
double lambda ;<br />
athens::inverse_iteration( solver, x, 100, lambda ) ;<br />
std::cout << "lambda = " << lambda << std::endl ;<br />
std::ofstream xf( "x.out" ) ;<br />
for ( int i=0; i<n; ++i ) xf << x(i) << std::endl ;<br />
}<br />
The eigenvector can then be plotted with Gnuplot:<br />
gnuplot> plot "x.out" w l<br />



14.2 The 2D Poisson equation<br />
The 2D Poisson equation is<br />
−∂²u/∂x² − ∂²u/∂y² = f<br />
where u(x, y) is the solution, f the excitation, and (x, y) ∈ [0, 1] × [0, 1]. We impose the<br />
boundary conditions<br />
u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0 .<br />
We discretize x as xj = jh for j = 1, . . . , n and h = 1/n. Similarly, yj = jh. We use finite<br />
differences for the second order derivatives. This produces the equation<br />
(1/h²) (−u(xi−1, yj) − u(xi, yj−1) − u(xi+1, yj) − u(xi, yj+1) + 4u(xi, yj)) = f(xi, yj) for i, j = 1, . . . , n .<br />
This leads to the algebraic system of equations Au = f with n² rows and columns.<br />
Recall the example exercise of §14.1. We do exactly the same exercise. Since the matrix is not<br />
tridiagonal, we cannot use the LAPACK routine pttrf any longer. We use the LAPACK routine<br />
sytrf for a full matrix instead. See the documentation in<br />
boost-sandbox/libs/numeric/bindings/lapack/doc/index.html.<br />
For a 2D problem the solution vector u can be represented as a matrix. The row index corresponds<br />
to the variable x and the column index to the variable y.<br />
In particular, you develop the functions inverse_iteration, poisson_2d for the matrix-vector product,<br />
scaled_poisson for the scaled matrix-vector product, and richardson. Give for each templated<br />
argument the conceptual conditions. Make a plot of the eigenvector using Gnuplot's splot (for<br />
plotting surfaces).<br />
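The 5-point stencil above can be applied without ever storing A. The following is a minimal sketch of such a poisson_2d matrix-vector product, using std::vector<double> instead of a glas container; the row-major storage and the function signature are our own assumptions, and the factor 1/h² is omitted (it only scales the eigenvalues).

```cpp
#include <cassert>
#include <vector>

// y = A*x for the n*n 2D Poisson matrix with the 5-point stencil;
// u is stored row-major as a vector of length n*n, boundary values are zero.
void poisson_2d(const std::vector<double>& x, std::vector<double>& y, int n)
{
    assert(static_cast<int>(x.size()) == n * n && x.size() == y.size());
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            int k = i * n + j;                      // row-major index of (x_i, y_j)
            double v = 4.0 * x[k];
            if (i > 0)     v -= x[k - n];           // u(x_{i-1}, y_j)
            if (i < n - 1) v -= x[k + n];           // u(x_{i+1}, y_j)
            if (j > 0)     v -= x[k - 1];           // u(x_i, y_{j-1})
            if (j < n - 1) v -= x[k + 1];           // u(x_i, y_{j+1})
            y[k] = v;
        }
    }
}
```

For a constant vector of ones the interior entries of y are zero and only the rows next to the boundary are non-zero, which is a convenient sanity check.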

14.3 The solution of a system of differential equations<br />

In this exercise, we write a function for the computation of a time step of a system of differential<br />

equations using Runge-Kutta methods.<br />

14.3.1 Explicit time integration<br />

Methods for the solution of the differential equation<br />
u̇ = f(u), u(0) = u0<br />
operate time step by time step, i.e. time is discretized and, given the solution at time step<br />
tj, we compute the solution at time step tj+1 = tj + h where h is small.



The method that we use here is the Runge-Kutta 4 method: the<br />
solution at time step tj+1 is computed as<br />
uj+1 = uj + (h/6) (k1 + 2k2 + 2k3 + k4)<br />
k1 = f(uj)<br />
k2 = f(uj + (h/2) k1)<br />
k3 = f(uj + (h/2) k2)<br />
k4 = f(uj + h k3)<br />
14.3.2 Software<br />
Write a generic function<br />
template <typename U, typename F, typename T><br />
void rk4( U& u, F& f, T const& h ) ;<br />
that computes one time step with the Runge-Kutta 4 method. The argument u is on input the<br />
solution at time t and on output at time step t + h. The argument f is the functor that evaluates<br />
the function f(u). The argument u is a vector.<br />
When the implementation is finished, write the concepts for U and F in comment lines in the<br />
code.<br />
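A possible implementation sketch of this rk4 function, assuming U is a vector type with size() and operator[] and F is a functor with operator()(U const&, U&) writing f(u) into its second argument (glas expression arithmetic is replaced by explicit loops here):

```cpp
#include <cassert>
#include <vector>

template <typename U, typename F, typename T>
void rk4(U& u, F& f, T const& h)
{
    std::size_t n = u.size();
    U k1(n), k2(n), k3(n), k4(n), tmp(n);
    f(u, k1);                                              // k1 = f(u_j)
    for (std::size_t i = 0; i < n; ++i) tmp[i] = u[i] + h / 2 * k1[i];
    f(tmp, k2);                                            // k2 = f(u_j + h/2 k1)
    for (std::size_t i = 0; i < n; ++i) tmp[i] = u[i] + h / 2 * k2[i];
    f(tmp, k3);                                            // k3 = f(u_j + h/2 k2)
    for (std::size_t i = 0; i < n; ++i) tmp[i] = u[i] + h * k3[i];
    f(tmp, k4);                                            // k4 = f(u_j + h k3)
    for (std::size_t i = 0; i < n; ++i)                    // u_{j+1} = u_j + h/6 (k1+2k2+2k3+k4)
        u[i] += h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
}
```

For the scalar test equation u' = u with u(0) = 1 one step of size h = 0.1 reproduces e^0.1 to about 1e-7, the expected fourth-order accuracy.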

14.3.3 The van der Pol oscillator<br />
Differential equations appear in the study of physical phenomena. The Van der Pol oscillator<br />
is described by the following equation:<br />
d²x/dt² − µ(1 − x²) dx/dt + x = 0 (14.3)<br />
with initial values x(0) and x′(0). This is a non-linear second order differential equation with<br />
a parameter µ. When µ = 0, we have a purely harmonic solution (cos and sin). When µ > 0,<br />
the solution evolves to a harmonic limit cycle.<br />
Second order differential equations are usually solved by writing them as a system of first order<br />
differential equations:<br />
d/dt [ dx/dt ; x ] + [ −µ(1 − x²) 1 ; −1 0 ] [ dx/dt ; x ] = 0 .<br />
In matrix form, the equation can be written as<br />
du/dt + A(u)u = 0<br />
where<br />
A(u) = [ −µ(1 − u2²) 1 ; −1 0 ] ,


Figure 14.2: An example of a web with only four pages. An arrow from page A to page B<br />
indicates a link from page A to page B.<br />

or<br />
du/dt = f(u)<br />
with<br />
f(u) = −A(u)u .<br />
14.3.4 Exercise<br />
Use the Runge-Kutta 4 method for solving the Van der Pol equation for µ = 0, µ = 0.1 and<br />
µ = 1 in the time interval [0, 10] with time step h = 0.001. Also try smaller and larger time<br />
steps.<br />
Plot the results using gnuplot.<br />
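The right-hand side f(u) = −A(u)u can be sketched as a functor to pass to rk4; the names here are our own, and u = (u1, u2) = (dx/dt, x) as in the system above.

```cpp
#include <cassert>
#include <vector>

// Van der Pol right-hand side f(u) = -A(u) u with
// A(u) = [ -mu (1 - u2^2)   1 ; -1   0 ], i.e.
// f1 = mu (1 - u2^2) u1 - u2  and  f2 = u1.
struct vdp_rhs {
    double mu;
    explicit vdp_rhs(double mu_) : mu(mu_) {}
    void operator()(const std::vector<double>& u, std::vector<double>& f) const
    {
        f[0] = mu * (1.0 - u[1] * u[1]) * u[0] - u[1];
        f[1] = u[0];
    }
};
```

An object vdp_rhs(0.0) then models the harmonic oscillator, and vdp_rhs(1.0) the non-linear case.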

14.4 Google’s Page rank<br />

We all use Google for web searching. In this exercise, we try to understand a particular tool<br />
used by Google to rank pages, called PageRank.<br />
The basic idea behind the Google page ranking algorithm is that the importance of a webpage<br />
is determined by the number of references made to it. We would like to compute a score xk<br />
reflecting the importance of page k. A simple-minded approach would be just to count the<br />
number of links to each page. This approach does not reflect the fact that some pages might<br />
be more significant than others, therefore rendering their votes more important. It also leaves<br />
open the possibility of artificially inflating the rank of a particular page by generating other<br />
trivial or advertising pages whose only function is to promote the importance of a particular<br />
page. Significant refinements are:<br />
• Weight each in-link by the importance of the page which links to it.<br />
• Give each page a total vote of 1. If page j contains nj links, one of which links to page k,<br />
then page k's score is boosted by xj/nj.<br />




Taking the new refinements into account we can compute the importance score xk of a page k<br />
as follows:<br />
xk = Σ_{j ∈ Lk} xj/nj (14.4)<br />
where Lk denotes the set of pages with a link to page k. Consider the simple example of Figure<br />
14.2. Using the formula (14.4) we get the following equations for the importance scores of the<br />
pages in this example:<br />
x1 = x3 + x4/2<br />
x2 = x1/3<br />
x3 = x1/3 + x2/2 + x4/2<br />
x4 = x1/3 + x2/2<br />
These linear equations can be written as Ax = x where x = [x1 x2 x3 x4]^T and<br />
A = [ 0 0 1 1/2 ; 1/3 0 0 0 ; 1/3 1/2 0 1/2 ; 1/3 1/2 0 0 ] .<br />
This transforms the web ranking problem into the standard problem of finding an eigenvector<br />
x with eigenvalue 1 for the square matrix A. This eigenvector can be found iteratively using<br />
the power method with a threshold τ:<br />

1. v(0) = some vector with ‖v(0)‖ = 1<br />
2. Repeat for k = 1, 2, . . . :<br />
2.1. Apply A: w = Av(k−1).<br />
2.2. Normalize: v(k) = w/‖w‖.<br />
3. Until ‖v(k−1) − v(k)‖ < τ<br />
The power method converges to the eigenvector corresponding to the dominant eigenvalue λ1.<br />
The matrix A is called a column stochastic matrix, since it is a square matrix with non-negative<br />
entries and the entries in each column sum to one. In the case of a column stochastic matrix,<br />
this dominant eigenvalue is 1.<br />

14.4.1 Software<br />

Write a generic function:<br />
template <typename V, typename Function><br />
void power_iteration( V& v, Function& f, double tau ) ;<br />
that computes the power iteration Algorithm 5 for a matrix A with starting vector v. The<br />
resulting eigenvector should be stored in v. The argument f is a functor that returns the result<br />
of the matrix-vector product. Also write documentation and specify the conceptual constraints<br />
for the arguments.<br />

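A possible shape of such a power_iteration, specialised to std::vector<double> for concreteness (a glas-based generic version is analogous), together with a functor for the example matrix A of Figure 14.2; all names except the requested power_iteration are our own, and the 1-norm is used for normalization, which is natural for column stochastic matrices:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

template <typename V, typename Function>
void power_iteration(V& v, Function& f, double tau)
{
    V w(v.size());
    double diff;
    do {
        f(v, w);                                   // w = A v
        double nrm = 0.0;                          // 1-norm of w
        for (std::size_t i = 0; i < w.size(); ++i) nrm += std::abs(w[i]);
        diff = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] /= nrm;                           // normalize
            diff += std::abs(w[i] - v[i]);
            v[i] = w[i];
        }
    } while (diff >= tau);                         // until successive iterates agree
}

// Matrix-vector product with the 4-page example matrix A from Figure 14.2.
struct web_matvec {
    void operator()(const std::vector<double>& v, std::vector<double>& w) const
    {
        w[0] = v[2] + v[3] / 2;
        w[1] = v[0] / 3;
        w[2] = v[0] / 3 + v[1] / 2 + v[3] / 2;
        w[3] = v[0] / 3 + v[1] / 2;
    }
};
```

Starting from v = (1/4, 1/4, 1/4, 1/4), the iteration converges to the scores (12, 4, 9, 6)/31, so page 1 is ranked highest in this small web.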



14.4.2 Dictionary application<br />

The page ranking algorithm which was described above can also be used to rank different words<br />

in a dictionary. Consider the following small dictionary:<br />

backwoods = bush, jungle<br />

bush = backwoods, jungle, shrub, plant, hedge<br />

flower = plant<br />

hedge = bush<br />

jungle = bush, backwoods<br />

plant = bush, shrub, flower, weed<br />

shrub = bush, plant, tree<br />

tree = shrub<br />

weed = plant<br />

Construct a graph linking every word with the words in its explanation. The first line of the<br />
dictionary, for example, would link bush and jungle to backwoods. The graph can be constructed<br />
on paper. Use equation (14.4) to construct the sparse column stochastic matrix A and use your<br />
power method to rank the words.<br />

14.5 The bisection method for finding the zero of a function in an interval<br />
In this exercise, we make a programming exercise on a root finding method, called the bisection<br />
method.<br />
14.5.1 Functions in one variable<br />
Suppose we are given a function f in one variable and we want to compute the unique zero<br />
in the interval [a, b]. A method that could be used is the bisection method. It only requires<br />
function evaluations and is thus widely applicable.<br />

The method computes a small interval that contains the zero. This small interval is<br />
obtained by splitting the interval [a, b] in two parts [a, m] and [m, b], where<br />
m = (a + b)/2 (14.5)<br />
The method works as follows:<br />
1. Given the interval [a, b] for which f(a)f(b) < 0.<br />
2. Repeat until b − a < τ:<br />
2.1. Compute m from (14.5).<br />
2.2. If f(m)f(a) < 0: b = m.<br />
2.3. Else: a = m.<br />



14.5.2 Software<br />

The task is to first develop the function<br />
template <typename T, typename Function><br />
void bisection( T& a, T& b, Function& f, double tau ) ;<br />
that computes the bisection Algorithm 6. The object f is a functor that returns the function<br />
value for a single argument x. The type T is a floating point type, i.e. float or double.<br />
Write documentation for the function and describe the conceptual conditions on Function.<br />
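A minimal sketch of this bisection function, following Algorithm 6 step by step (the bracketing check via assert is our own addition):

```cpp
#include <cassert>

// Bisection for the zero of f in [a, b]; on return b - a < tau
// and the zero lies between a and b.
template <typename T, typename Function>
void bisection(T& a, T& b, Function& f, double tau)
{
    assert(f(a) * f(b) < 0);                   // the interval must bracket a zero
    while (b - a >= tau) {
        T m = (a + b) / 2;                     // midpoint, formula (14.5)
        if (f(m) * f(a) < 0)
            b = m;                             // zero lies in [a, m]
        else
            a = m;                             // zero lies in [m, b]
    }
}
```

For f(x) = x² − 2 on [1, 2] the method brackets √2 to any requested tolerance.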

14.5.3 The growth and downfall of a caterpillar population<br />

Everyone knows caterpillars grow up to be beautiful butterflies. But before they reach that<br />
stage of their life, they need lots of food to grow. A large population will not grow at the same<br />
rate as a smaller one, because of a shortage of food. Furthermore most birds enjoy a juicy<br />
caterpillar as a snack, so they are responsible for the premature death of several members of the<br />
caterpillar population. These relationships can be modelled mathematically by the following<br />
equation:<br />
dN/dt = rN(1 − N/K) − αN²/(β + N²)<br />
In this equation rN(1 − N/K) models the growth of the population, where N equals the number<br />
of caterpillars, r is the growing rate of the population and K is the maximum number of<br />
caterpillars that can inhabit the area. The second term of the equation models the death of the<br />
caterpillars. Here α is the maximum rate at which a bird can eat caterpillars when N is large<br />
and β is a parameter that indicates the intensity of the bird attacks. We want to know when<br />
there exists an equilibrium between the growth and death rate in the caterpillar population,<br />
i.e. when dN/dt equals zero.<br />
Use the function bisection to compute the number of caterpillars N for which the following<br />
populations are at an equilibrium in the intervals [0.1, 10], [10, 20] and [20, 100]:<br />

• Population 1: r = 1.3, K = 100, α = 20 and β = 50<br />

• Population 2: r = 2.0, K = 80, α = 25 and β = 10<br />

Show the resulting roots in a table.<br />
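The equilibrium condition dN/dt = 0 can be packaged as a functor suitable for the bisection function of Section 14.5.2; the struct name and parameter order below are our own choices, the formula is the population model above.

```cpp
#include <cassert>

// dN/dt of the caterpillar model as a function of N.
struct caterpillar {
    double r, K, alpha, beta;
    caterpillar(double r_, double K_, double a_, double b_)
        : r(r_), K(K_), alpha(a_), beta(b_) {}
    double operator()(double N) const
    {
        return r * N * (1.0 - N / K) - alpha * N * N / (beta + N * N);
    }
};
```

A quick sign check confirms that, e.g. for population 1, the model changes sign on each of the three requested intervals, so bisection is applicable there.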

14.5.4 Computing eigenvalues using the bisection method<br />

In this exercise, we use the function bisection to compute the eigenvalues of a real symmetric<br />

dense matrix A with real eigenvalues. The problem is to compute λ so that<br />

det(A − λI) = 0 .<br />

The determinant is computed using the QR factorization (which is available in LAPACK). The<br />

QR factorization computes an orthogonal matrix Q (Q T Q = I) and an upper triangular matrix<br />

R so that<br />

A − λI = QR .


278 CHAPTER 14. NUMERICAL EXERCISES<br />

We use the property that<br />
det(A − λI) = det(Q) det(R) with det(Q) = ±1 .<br />
Since R is upper triangular, the determinant is, up to this sign, the product of the diagonal elements of R.<br />
The matrices A are constructed as follows. Start with a simple case: the diagonal matrix with<br />
elements 1, 2, . . . , n on the main diagonal. Then do the tests for the same matrix multiplied<br />
on left and right by a random orthogonal matrix X, as in A = XDX^T where D is a diagonal<br />
matrix.<br />

14.6 The Newton-Raphson method for finding the minimum of a convex function<br />
This exercise is a programming exercise on Newton's method. First, we explain the method for<br />
a function with a single variable, then we discuss the case of multivariate functions, and finally,<br />
we show a small application.<br />
14.6.1 Functions in one variable<br />
For a differentiable function f, the minimum x̃ is attained for f′(x̃) = 0. So, we must find the<br />
zero of the first order derivative. When f is a second order polynomial,<br />
f = p := α + βx + γx²<br />
f′ = p′ := β + 2γx<br />
then an extreme value of f is attained for<br />
x̃ = −β/(2γ) (14.6)<br />
which is a minimum when f′′ ≡ 2γ > 0.<br />
For an arbitrary function, we do not have such simple explicit formulae. We can use an iterative<br />
method, which is called Newton's method. It is an iterative approach, i.e. we start from an<br />
initial guess x̃ and improve this value until it has converged to the minimum of the function. On<br />
each iteration we approximate the function by a degree two polynomial, for which the simple<br />
formula (14.6) can be used. One way to compute such a degree two polynomial is to start from<br />
the Taylor expansion of f around x̃:<br />
f(x) = f(x̃) + f′(x̃)(x − x̃) + (1/2) f′′(x̃)(x − x̃)² + · · · .<br />
If we approximate f by the first 3 terms (i.e. a degree two polynomial), then we have<br />
f(x) ≈ p(x) := f(x̃) + f′(x̃)(x − x̃) + (1/2) f′′(x̃)(x − x̃)² .<br />
If x is close to x̃, |f(x) − p(x)| is small. The first order derivative is<br />
f′(x) ≈ p′(x) = f′(x̃) + f′′(x̃)(x − x̃) .<br />



Then p′(x) = 0 for<br />
x = x̃ − f′(x̃)/f′′(x̃)<br />
The Newton method goes as follows. In this algorithm τ is a tolerance for the stopping criterion.<br />
1. Given initial x̃ = x(0).<br />
2. Repeat for j = 1, 2, . . .<br />
2.1. Compute x(j) = x(j−1) − f′(x(j−1))/f′′(x(j−1))<br />
3. Until |f′(x(j−1))/f′′(x(j−1))| < τ<br />
The iteration stops when the derivative is much smaller than the second order derivative. What<br />
happens when f′′(x(j−1)) = 0?<br />

14.6.2 Multivariate functions<br />
For multivariate functions, the principle is the same, but it is more complicated. A multivariate<br />
function f has an argument x ∈ Rⁿ, i.e. a vector of size n. For example, f = sin(x1) + x2 cos(x1)<br />
is a multivariate function in the variables x1 and x2.<br />
We use the same idea as for one variable. That is, we use the Taylor expansion to approximate<br />
the function:<br />
f(x) ≈ f(x̃) + ∇f(x̃)^T (x − x̃) + (1/2) (x − x̃)^T H(f(x̃)) (x − x̃)<br />
with the gradient vector and Hessian matrix<br />
∇f(x̃) = ( ∂f/∂x1, . . . , ∂f/∂xn )^T<br />
H(f) = [ ∂²f/∂x1∂x1 · · · ∂²f/∂x1∂xn ; . . . ; ∂²f/∂xn∂x1 · · · ∂²f/∂xn∂xn ] .<br />
The derivative becomes<br />
f′(x) = ∇f(x̃) + H(f(x̃))(x − x̃)<br />
so, the derivative is zero when<br />
x = x̃ − {H(f(x̃))}⁻¹ ∇f(x̃) .<br />
This requires the solution of an n × n linear system on each iteration. The Newton algorithm<br />
is very similar to the univariate case:<br />
1. Given initial x̃ = x(0) ∈ Rⁿ.<br />
2. Repeat for j = 1, 2, . . .<br />
2.1. Compute d = {H(f(x(j−1)))}⁻¹ ∇f(x(j−1)).<br />
2.2. Compute x(j) = x(j−1) − d.<br />
3. Until ‖d‖2 < τ<br />
14.6.3 Software for uni-variate functions<br />
The task is to first develop the function<br />
template <typename X, typename Function, typename Derivative, typename SecondDerivative><br />
void newton_raphson( X& x, Function& f, Derivative& d, SecondDerivative& s, double tau ) ;<br />
that computes the Newton-Raphson Algorithm 7. f, d, and s are functors that return the<br />
function value and the derivatives for the single argument x.<br />

Write documentation for the function and describe the conceptual conditions on Function,<br />
Derivative, and SecondDerivative.<br />

Next, you use this function to compute the minima for the following functions:<br />
• f = x² − 2x + 4<br />
• f = x¹⁰<br />
• f = x + 5<br />
• f = −x² − 2x + 4<br />
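A minimal sketch of the univariate newton_raphson function above (the functor f itself is not needed by the iteration, but is kept for the requested interface; the breakdown check for f'' = 0 is our own addition):

```cpp
#include <cassert>
#include <cmath>

template <typename X, typename Function, typename Derivative, typename SecondDerivative>
void newton_raphson(X& x, Function& f, Derivative& d, SecondDerivative& s, double tau)
{
    (void)f;                      // f is not used by the pure Newton step
    X step;
    do {
        assert(s(x) != X(0));     // the method breaks down for f'' = 0
        step = d(x) / s(x);       // f'(x) / f''(x)
        x -= step;                // step 2.1: x_j = x_{j-1} - f'/f''
    } while (std::abs(step) >= tau);
}
```

For the quadratic f = x² − 2x + 4 the iteration reaches the minimum x = 1 in a single step from any starting point, since the degree-two Taylor model is exact.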

14.6.4 Software <strong>for</strong> multi-variate functions<br />

The task is to first develop the function<br />
template <typename X, typename Function, typename Gradient, typename Hessian><br />
void newton_raphson( X& x, Function& f, Gradient& g, Hessian& h, double tau ) ;<br />
that computes the Newton-Raphson Algorithm 8. Note that in this case, g and h should return<br />
the resulting gradient vector and Hessian matrix respectively. Also write documentation and<br />
specify the conceptual constraints for the arguments.<br />

14.6.5 Application<br />

The following is an application for the multivariate case. Given a symmetric matrix L ∈ R^{n×n},<br />
we want to solve the following optimization problem:<br />
min (1/2) x^T L x subject to x^T x = 1 .<br />
We first introduce a Lagrange multiplier λ and rewrite this problem in the following form. Find<br />
x and λ so that<br />
min f(x, λ) = (1/2) x^T L x − (1/2) λ (x^T x − 1) .



The gradient and Hessian are:<br />
∇f = ( Lx − λx ; −(1/2)(x^T x − 1) )<br />
H(f) = [ L − λI −x ; −x^T 0 ]<br />
One can prove that the solution of this optimization problem is the smallest eigenvalue λ<br />
and associated normalized eigenvector x. This is a method for computing eigenvalues of large<br />
matrices.<br />
For solving a linear system with the Hessian, you can use the direct solver MUMPS or the<br />
iterative solver toolbox from GLAS.<br />

14.7 Sequential noise reduction of real-time measurements by least squares<br />
Suppose we want to measure a function f(t) for given time snapshots t1, . . . , tm. We know that<br />
the function is a polynomial of a given degree n − 1, but due to measurement errors, the data<br />
are noisy. If f is a polynomial,<br />
f(t) = Σ_{j=1}^{n} ξj t^{j−1} .<br />
We could have a more general series, e.g.<br />
f(t) = Σ_{j=1}^{n} ξj φj(t)<br />
where φj is the jth basis function. With<br />
b = ( f(t1), . . . , f(tm) )^T , x = ( ξ1, . . . , ξn )^T ,<br />
A = [ φ1(t1) φ2(t1) · · · φn(t1) ; . . . ; φ1(tm) φ2(tm) · · · φn(tm) ]<br />
we have<br />
Ax = b (14.7)<br />
Note that (14.7) is an m × n linear system, where usually m ≫ n. This system is overdetermined,<br />
and so, due to errors in the data, it cannot be solved exactly. However, we can solve the system in a<br />
least squares sense, i.e. find x so that<br />
min_x ‖Ax − b‖2 . (14.8)



When measurements come in sequentially, i.e. at time steps t1, t2, . . ., we receive at time step<br />
tj the jth row of A and the jth element of b. The algorithms we discuss now exploit this<br />
sequential structure.<br />

14.7.1 The least squares QR algorithm<br />

A numerically stable method for solving (14.8) is based on the QR factorization. The QR<br />
factorization of the m × n matrix A is<br />
A = QR<br />
with Q ∈ R^{m×n} having orthonormal columns (Q^T Q = I) and R ∈ R^{n×n} upper triangular. If A has full rank,<br />
the diagonal elements of R are non-zero. Suppose we have computed the solution for<br />
min ‖Ak x − bk‖2<br />
where<br />
‖Ak x − bk‖2 = ‖Qk Rk x − bk‖2 = ‖Rk x − Qk^T bk‖2 .<br />
We have to solve an upper triangular linear system. We can develop a 'sequential' method for<br />
this QR decomposition without storing Q, but we will not discuss this any further.<br />

14.7.2 The least squares method via the normal equations<br />

One method to achieve this are the normal equations. That is, multiply (14.7) on the left by<br />
A^T; then we obtain<br />
A^T A x = A^T b (14.9)<br />
If A has full column rank, the solution x is unique and satisfies (14.8).<br />

14.7.3 Least squares Kalman filtering<br />

The Kalman filter is a method to solve the normal equations (14.9) in a step by step way, i.e.<br />
as the measurements arrive time step by time step. The Kalman filter adapts the least squares<br />
solution to the newly arrived data.<br />

Suppose we have computed the least squares solution of<br />
Ak xk = bk<br />
where Ak are the first k rows of A and bk the first k elements of b with k ≥ n. Then we want<br />
to compute the least squares solution of<br />
Ak+1 xk+1 = bk+1 .<br />
Since<br />
Ak+1 = [ Ak ; ak+1^T ] and bk+1 = [ bk ; f(tk+1) ]<br />



we have, with gk = Ak^T bk, that<br />
Ak+1^T Ak+1 xk+1 = gk+1<br />
(Ak^T Ak + ak+1 ak+1^T) xk+1 = gk + ak+1 f(tk+1) .<br />
With Mk = (Ak^T Ak)⁻¹ ∈ R^{n×n}, we derive from the Sherman-Morrison formula that<br />
Mk+1 := (Ak^T Ak + ak+1 ak+1^T)⁻¹ = Mk − (Mk ak+1 ak+1^T Mk) / (1 + ak+1^T Mk ak+1)<br />
and we also have that<br />
xk+1 = Mk+1 (gk + ak+1 f(tk+1))<br />
= Mk gk + Mk ak+1 f(tk+1) − (Mk ak+1 ak+1^T) / (1 + ak+1^T Mk ak+1) (Mk gk + Mk ak+1 f(tk+1))<br />
= xk + (Mk ak+1) / (1 + ak+1^T Mk ak+1) (f(tk+1) − ak+1^T xk) .<br />
The Kalman method works as follows:<br />
1. Solve An xn = bn by taking the first n rows of A and b.<br />
2. Let Mn = An⁻¹ An⁻^T.<br />
3. For k = n, . . . , m − 1 do:<br />
3.1. Compute the Kalman gain vector kk+1 = Mk ak+1 / (1 + ak+1^T Mk ak+1).<br />
3.2. Update step: xk+1 = xk + kk+1 (f(tk+1) − ak+1^T xk).<br />
3.3. Mk+1 = Mk − kk+1 ak+1^T Mk.<br />
We can use the LAPACK subroutine DGESV for computing An⁻¹.<br />
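One iteration of steps 3.1–3.3 can be sketched as a free function; the dense row-major storage of M in a flat std::vector and the function name are our own assumptions.

```cpp
#include <cassert>
#include <vector>

// One Kalman update step (3.1-3.3) for M stored row-major as an n x n
// matrix; a is the new row a_{k+1} of A and fa the new measurement f(t_{k+1}).
void kalman_step(std::vector<double>& x, std::vector<double>& M,
                 const std::vector<double>& a, double fa)
{
    std::size_t n = x.size();
    assert(M.size() == n * n && a.size() == n);
    std::vector<double> Ma(n, 0.0);                // Ma = M a
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            Ma[i] += M[i * n + j] * a[j];
    double denom = 1.0;                            // 1 + a^T M a
    for (std::size_t i = 0; i < n; ++i) denom += a[i] * Ma[i];
    std::vector<double> k(n);                      // 3.1: gain vector
    for (std::size_t i = 0; i < n; ++i) k[i] = Ma[i] / denom;
    double res = fa;                               // f(t_{k+1}) - a^T x
    for (std::size_t i = 0; i < n; ++i) res -= a[i] * x[i];
    for (std::size_t i = 0; i < n; ++i) x[i] += k[i] * res;  // 3.2
    std::vector<double> aM(n, 0.0);                // a^T M, computed explicitly
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            aM[j] += a[i] * M[i * n + j];
    for (std::size_t i = 0; i < n; ++i)            // 3.3: M = M - k a^T M
        for (std::size_t j = 0; j < n; ++j)
            M[i * n + j] -= k[i] * aM[j];
}
```

In the scalar case of fitting a constant (a = 1, M = 1, x = y1), one step with a second measurement y2 yields the mean (y1 + y2)/2 and M = 1/2, exactly the batch least squares result.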

14.7.4 Software<br />

The goal is to write a function that computes the Kalman filter least squares solution. Because of the<br />
sequential character, we suggest to make a class with the following specifications:<br />
template <typename T><br />
class kalman {<br />
public:<br />
// Creation of the Kalman filter<br />
kalman( int n ) ;<br />
// Compute the first n observations and initialize the Kalman<br />
// filter (Steps 1 and 2 in the algorithm)<br />
// BaseFun is a binary functor.<br />
template <typename VIt, typename BaseFun, typename F><br />
void initialize( VIt t_begin, VIt const& t_end, BaseFun& base_fun, F& f ) {<br />
...<br />
}<br />
template <typename Base, typename F><br />
void step( T const& t, Base& base, F const& f ) {<br />
...<br />
}<br />
public:<br />
// Return the solution<br />
typedef ... x_type ;<br />
x_type const& x() const { ... }<br />
private:<br />
...<br />
} ;<br />

14.7.5 Test problems<br />

We now solve the following test problems. First, consider the following expansion:<br />
f(t) = ξ1 + ξ2 cos t + ξ3 sin t + ξ4 cos 2t + ξ5 sin 2t<br />
We compute the coefficients following the least squares criterion for the function<br />
f = (2 − 5 cos t) .<br />
Print the solution x for each step of the Kalman filter and see how it changes. It should be very<br />
close to the function.<br />
Then apply random noise with relative size 0.0001:<br />
f = (2 − 5 cos t)(1 + 0.0001ε)<br />
where ε is a random number in [−1, 1]. Print the solution x for each step of the Kalman filter<br />
and see how it changes. It should be close to the solution of the function with ε ≡ 0.<br />

Plot the results using gnuplot.


Chapter 15<br />
Programming Projects<br />
The following notes apply to all projects.<br />
• The projects are preferably carried out in teams of 2 students.<br />
• Each team gets a repository in an MTL4 branch.<br />
• This also means that every course participant has to learn the version control software "subversion",<br />
see http://subversion.tigris.org/. The lecture by Greg Wilson<br />
gives a sufficient introduction to subversion, see http://software-carpentry.<br />
org/. I will give a short introduction myself in the 2nd exercise session (19.4.).<br />
• The projects should be built (compiled, linked) with a single command. If possible, use<br />
"cmake". 1 cmake ships with every reasonable Linux distribution and<br />
should also be available in the computer pool. It even exists for Windows: there it can<br />
generate the project files for Visual Studio.<br />
• Write tests for new features first, before you implement them.<br />
• Try to limit your questions to the exercise sessions.<br />
• Write doxygen documentation for your classes and functions (in English).<br />
Write as many examples as possible. (These may well be derived from your<br />
tests.)<br />
– Create formulas preferably with the commands for LaTeX insertions (\f[ and the like).<br />
On this occasion one often gets to know one's Linux installation better,<br />
since doxygen does not always find LaTeX. It is no shame to ask befriended<br />
hackers for help here.<br />
15.1 Matrix powers A^x<br />
Implement algorithms for A^x for different matrix types and for x ∈ Q as well as x ∈ R.<br />
1 If necessary, plain "make" (see e.g. http://software-carpentry.org/build.html).<br />
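For integer exponents, the classical starting point is repeated squaring, which needs only O(log x) matrix products; a minimal sketch for small dense matrices follows (the nested-vector matrix type is our own simplification; an MTL4-based version would use its matrix types instead). Rational and real exponents require eigendecomposition techniques and are not shown here.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<double> > matrix; // small dense matrix

matrix multiply(const matrix& a, const matrix& b)
{
    std::size_t n = a.size();
    matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// A^x for non-negative integer x by repeated squaring.
matrix power(matrix a, unsigned x)
{
    std::size_t n = a.size();
    matrix r(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) r[i][i] = 1.0; // start with identity
    while (x > 0) {
        if (x & 1u) r = multiply(r, a);                // use current square if bit set
        a = multiply(a, a);                            // square
        x >>= 1;
    }
    return r;
}
```

A nice test case is the Fibonacci matrix [[1,1],[1,0]], whose powers contain consecutive Fibonacci numbers.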



15.2 Matrix exponential e^A<br />
Implement algorithms for e^A for different matrix types, in particular sparse<br />
matrices. Use the algorithms available in MTL4 for solving systems of<br />
equations. See also the article by Cleve Moler, "Nineteen dubious ways. . . ".<br />

15.3 LU factorization for m × n matrices<br />
m, n | L | U<br />
m = n | lower triangular | upper triangular<br />
m > n | lower trapezoidal | upper triangular<br />
m < n | lower triangular | upper trapezoidal<br />
A = P · L · U (15.1)<br />
In L the diagonal is 1 and is therefore not stored. Compute the solution of a system<br />
of equations and then the error.<br />
See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_getrf.htm.<br />

15.4 Bunch-Kaufman decomposition

For symmetric A (A = A^T), implement the decomposition

A = P · U · D · U^T · P^T (15.2)

• overwriting A,
• and develop functions for extracting P, U, and D from the resulting A.
• Copy A, compute the decomposition, and return P, U, and D as a tuple.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_sytrf.htm.

15.5 Condition number (reciprocal)

• In the general case, use LU.
– Cholesky, if symmetric.
∗ If need be, Bunch-Kaufman . . .

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_gecon.htm.

15.6 Matrix scaling

For dense and sparse matrices, compute row and column scaling factors such that the largest matrix entry in each row and column is 1.
See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_geequ.htm.

15.7 QR with overwriting

Implement a decomposition

A = QR (15.3)

with Q orthogonal/unitary for real/complex A. Realize:

• an overwriting factorization as in LAPACK,
• functions for extracting Q and R,
• a version that copies A and returns Q and R as a pair.
• Write tests or applications.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_orgqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ungqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ormqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_unmqr.htm.

15.8 Direct solver for sparse matrices

Implement a direct solver recursively.

• The matrix should be represented hierarchically as a quad-tree.
• The operations should also be carried out recursively on blocks:
– matrix addition and subtraction,
– matrix multiplication,
– inversion of sub-trees,
– pivoting on
∗ the column,
∗ the row, or
∗ the diagonal,
depending on what seems most suitable.
– The pivoting must of course be represented by a permutation.
• If possible, apply the solver to a vector recursively as well.
– This means implementing the triangular solver recursively, too.

Figure 15.1: Hierarchical approach.

This project is the greatest challenge of all; significant partial results will also count as a success.

15.9 Applying MTL4 to interval-arithmetic types

Write applications of matrices and vectors for suitable interval-arithmetic types, e.g. boost::interval.


15.10 Applying MTL4 to types of higher precision

Write applications of matrices and vectors for suitable types with higher precision, e.g. GNU Multiprecision (GMP).

15.11 Applying MTL4 to AD types

Write applications of matrices and vectors for suitable automatic-differentiation types with operator-overloaded derivatives.




Chapter 16

Acknowledgement

Special thanks to Josef Weinbub, Carlos Giani, and Franz Stimpfl. They were instrumental in the design and development of GSSE and this book. Thanks also go to Michael Spevak for the development of some basic concepts and text parts for an early version of GSSE.

Thanks to Andrey Chesnokov, Yvette Vanberghen, Kris Demarsin, and Yao Yue, and to the students of the class “C++ für Wissenschaftler” at Technische Universität Dresden for many fruitful discussions.



