C++ for Scientists - Technische Universität Dresden


5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION

    };

    }

    my_axpy_ftor()(u, v, w, i);

The only difference to fixed-size unrolling is that the indices are relative to an argument, here i. The operator() is first called with Offset equal to 0, then with 1, 2, ... Since each call is inlined, the functor call results in one monolithic block of operations without loop control or function calls. Thus, the call my_axpy_ftor()(u, v, w, i) performs the same operations as one iteration of the first loop in Listing 5.4.

Of course, this compilation would end in an infinite loop if we forget to specialize the functor for Offset equal to Max:

    template <unsigned Max>
    struct my_axpy_ftor<Max, Max>
    {
        template <typename U, typename V, typename W>
        void operator()(U& u, const V& v, const W& w, unsigned i) {}
    };
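Putting the pieces together, a minimal self-contained sketch of the technique might look as follows. Only the functor name and the signature of its operator() are taken from the text. The operation u[i] = 3 v[i] + w[i], the driver function `my_axpy`, its `BSize` parameter and the `unsigned` template parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Primary template: handles entry i + Offset, then continues with
// Offset + 1. The recursion is resolved entirely at compile time.
template <unsigned Offset, unsigned Max>
struct my_axpy_ftor
{
    template <typename U, typename V, typename W>
    void operator()(U& u, const V& v, const W& w, unsigned i)
    {
        // Assumed operation for illustration: u = 3 * v + w
        u[i + Offset] = 3.0f * v[i + Offset] + w[i + Offset];
        my_axpy_ftor<Offset + 1, Max>()(u, v, w, i);
    }
};

// Specialization for Offset == Max: does nothing and ends the recursion.
template <unsigned Max>
struct my_axpy_ftor<Max, Max>
{
    template <typename U, typename V, typename W>
    void operator()(U&, const V&, const W&, unsigned) {}
};

// Hypothetical driver: processes the vectors in blocks of BSize entries
// and handles the remaining entries in a plain loop.
template <unsigned BSize, typename U, typename V, typename W>
void my_axpy(U& u, const V& v, const W& w)
{
    static_assert(BSize > 0, "block size must be positive");
    assert(u.size() == v.size() && v.size() == w.size());

    const unsigned s = static_cast<unsigned>(u.size());
    const unsigned sb = s / BSize * BSize;  // largest multiple of BSize <= s

    for (unsigned i = 0; i < sb; i += BSize)   // unrolled blocks
        my_axpy_ftor<0, BSize>()(u, v, w, i);
    for (unsigned i = sb; i < s; ++i)          // remainder
        u[i] = 3.0f * v[i] + w[i];
}
```

With three `std::vector<float>` objects of equal size, a call such as `my_axpy<4>(u, v, w)` would process four entries per loop iteration. Whether this is actually faster than the plain loop depends on the compiler and the platform.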

Performing the considered vector operation with different unrollings yields

Compute time unrolled loop is 1.44 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.14 µs.<br />

Now we can call this operation for any block size we like. On the other hand, it is rather cumbersome to implement the corresponding functions and functors for each vector expression. Therefore, we now combine this technique with expression templates.

5.4.5 Tuning an Expression Template

⇒ vector_unroll_example2.cpp

Let us recall Section 5.3.3. So far, we have developed a vector class with expression templates for vector sums. In the same manner, we can implement the product of a scalar and a vector, but we leave this as an exercise and consider expressions with addition only, for example:

    u = v + v + w
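As a reminder of the mechanism, the following is a minimal sketch of an expression template for vector sums. It is not the class from Section 5.3.3: the names `my_vector` and `vector_sum`, the `float` element type and the storage in a `std::vector` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Proxy for the sum of two operands. It only stores references and
// computes an entry when that entry is requested.
template <typename V1, typename V2>
class vector_sum
{
  public:
    vector_sum(const V1& a, const V2& b) : v1(a), v2(b) {}
    std::size_t size() const { return v1.size(); }
    float operator[](std::size_t i) const { return v1[i] + v2[i]; }
  private:
    const V1& v1;
    const V2& v2;
};

// Hypothetical vector class for this sketch.
class my_vector
{
  public:
    explicit my_vector(std::size_t n, float value = 0.0f) : data(n, value) {}
    std::size_t size() const { return data.size(); }
    float& operator[](std::size_t i) { return data[i]; }
    float operator[](std::size_t i) const { return data[i]; }

    // Assignment from any expression: the only place where a loop runs.
    template <typename Src>
    my_vector& operator=(const Src& that)
    {
        assert(size() == that.size());
        for (std::size_t i = 0; i < size(); ++i)
            data[i] = that[i];
        return *this;
    }
  private:
    std::vector<float> data;
};

// vector + vector
inline vector_sum<my_vector, my_vector>
operator+(const my_vector& a, const my_vector& b)
{
    return vector_sum<my_vector, my_vector>(a, b);
}

// (sum expression) + vector, needed for chains such as v + v + w
template <typename V1, typename V2>
vector_sum<vector_sum<V1, V2>, my_vector>
operator+(const vector_sum<V1, V2>& a, const my_vector& b)
{
    return vector_sum<vector_sum<V1, V2>, my_vector>(a, b);
}
```

In this sketch, `u = v + v + w` builds two nested proxy objects and evaluates them entry by entry inside the single loop of the assignment operator. The proxies hold references, so they should only be used within the expression that creates them.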

Now we frame this vector operation with a repetition loop and a time measurement:

    boost::timer t;
    for (unsigned j= 0; j < rep; j++)
        u= v + v + w;
    std::cout << "Compute time is " << 1000000.0 * t.elapsed() / double(rep) << " µs.\n";

This results in:

Compute time is 1.72 µs.<br />
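Where Boost is not available, the same kind of measurement can be sketched with the standard `<chrono>` facilities. The helper below is hypothetical and not part of the original example: the name `average_microseconds` and its interface are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs op() rep times and returns the average
// wall-clock time per repetition in microseconds.
template <typename Op>
double average_microseconds(unsigned rep, Op op)
{
    assert(rep > 0);  // avoid division by zero below
    using clock = std::chrono::steady_clock;

    const clock::time_point start = clock::now();
    for (unsigned j = 0; j < rep; ++j)
        op();
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;

    return elapsed.count() / double(rep);
}
```

A call could look like `average_microseconds(rep, [&] { u = v + v + w; })`. As with any micro-benchmark, an optimizing compiler may remove work whose result is never used, so the measured numbers should be read with some care.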

To incorporate meta-tuning into expression templates, we only need to modify the actual assignment because only here a loop is performed. All the other operations (well so far we have
