03.12.2012

C++ for Scientists - Technische Universität Dresden


5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION

{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        // accumulate columns 0 .. Cols-1 first, then add column Cols
        fsize_mat_vec_mult_aux<Rows, Cols-1>()(A, v_in, tmp);
        tmp+= A[Rows][Cols] * v_in[Cols];
    }
};

To terminate the recursion over the columns, we write a specialization for Cols = 0:

template <unsigned Rows>
struct fsize_mat_vec_mult_aux<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        // first column: initialize the accumulator instead of adding to it
        tmp= A[Rows][0] * v_in[0];
    }
};

In this section we showed different ways to optimize a two-dimensional loop (with fixed sizes). There are certainly more possibilities: for instance, we could try an implementation that uses registers but has the same concurrency as the second-to-last implementation. Another form of optimization could be to aggregate the write-backs so that several invalidation signals are sent at once, which may be less disruptive.

5.4.3 Dynamic Unrolling – Warm-up

⇒ vector_unroll_example.cpp

As important as the fixed-size optimization is, acceleration for dynamically sized containers is needed even more. We start here with a simple example and some observations. We will reuse the vector class from Listing 4.1. To show the implementation more clearly, we write the code without operators and expression templates. Our test case will compute

u = 3v + w

for three short vectors of size 1000. The wall-clock time will be measured with boost::timer.¹⁹

The vectors v and w are initialized, and to have the data ready for use (i.e., the vectors are definitely in cache²⁰) we run a few additional operations without timing:

#include <cstdlib>          // std::atoi
#include <boost/timer.hpp>  // boost::timer

// ...

int main(int argc, char* argv[])
{
    unsigned s= 1000;
    if (argc > 1) s= std::atoi(argv[1]); // read (potentially) from command line

19 See http://www.boost.org/doc/libs/1_43_0/libs/timer/timer.htm

20 TODO: Shouldn't the initialization already ensure this? Do we have a better explanation? Reference to benchmark literature? Do we really need a bulletproof justification here?
