03.12.2012 Views

C++ for Scientists - Technische Universität Dresden


160 CHAPTER 5. META-PROGRAMMING

assign entry 0
assign entry 1
assign entry 2
assign entry 3

In this implementation, we replaced the loop by a recursion, counting on the compiler to inline the operations (otherwise it would be even slower than the loop), and made sure that no loop index is incremented and tested for termination. This is only beneficial for small loops that run in L1 cache. Larger loops are dominated by loading the data from memory, and the loop overhead is irrelevant. On the contrary, entirely unrolling operations on very large vectors will probably decrease performance, because many instructions must be loaded and this reduces the bandwidth available for the data. As mentioned before, compilers can unroll such operations by themselves (and hopefully know when it is better not to), and sometimes this automatic unrolling is even slightly faster than the explicit implementation.
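The recursion described above can be sketched as follows. This is a minimal illustration, not the implementation the text refers to: the type `small_vector`, the functor `unrolled_assign`, and the function `unrolled_copy` are hypothetical names chosen for this example, and the trace output is included only to mirror the listing shown earlier.

```cpp
#include <iostream>

// Hypothetical fixed-size vector, standing in for the chapter's vector type.
template <typename T, int Size>
struct small_vector
{
    T data[Size];
    T&       operator[](int i)       { return data[i]; }
    const T& operator[](int i) const { return data[i]; }
};

// Assigns entry I and then recurses to entry I+1. No run-time index is
// incremented or tested; we rely on the compiler to inline the call chain.
template <int I, int Size>
struct unrolled_assign
{
    template <typename Target, typename Source>
    static void eval(Target& t, const Source& s)
    {
        std::cout << "assign entry " << I << '\n';   // trace only
        t[I]= s[I];
        unrolled_assign<I + 1, Size>::eval(t, s);
    }
};

// Termination of the recursion: I has reached Size, nothing is left to do.
template <int Size>
struct unrolled_assign<Size, Size>
{
    template <typename Target, typename Source>
    static void eval(Target&, const Source&) {}
};

// Convenience wrapper that starts the recursion at entry 0.
template <typename T, int Size>
void unrolled_copy(small_vector<T, Size>& t, const small_vector<T, Size>& s)
{
    unrolled_assign<0, Size>::eval(t, s);
}
```

Calling `unrolled_copy(b, a)` for two `small_vector<double, 4>` objects instantiates `unrolled_assign<0, 4>` through `unrolled_assign<4, 4>`, so the four assignments appear as straight-line code once inlined.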

5.4.2 Nested Unrolling

From our experience, compilers usually do not unroll nested loops. Even a good compiler that can handle certain nested loops will not be able to optimize every program kernel, in particular heavily templatized ones instantiated with user-defined types. We will demonstrate here how to unroll nested loops at compile time, using matrix-vector multiplication as the example. For this purpose, we introduce a simplistic fixed-size matrix type:

    template <typename T, int Rows, int Cols>
    class fsize_matrix
    {
        typedef fsize_matrix self;
      public:
        typedef T value_type;
        BOOST_STATIC_ASSERT((Rows * Cols > 0));
        const static int my_rows= Rows, my_cols= Cols;

        // Default constructor: set all entries to zero
        fsize_matrix()
        {
            for (int i= 0; i < my_rows; ++i)
                for (int j= 0; j < my_cols; ++j)
                    data[i][j]= T(0);
        }

        fsize_matrix( const self& that ) { ... }

        // cannot check column index
        const T* operator[](int r) const { return data[r]; }
        T* operator[](int r) { return data[r]; }

        // Returns an expression template; the product is evaluated on assignment
        mat_vec_et<self, fsize_vector<T, Cols> > operator*(const fsize_vector<T, Cols>& v) const
        {
            return mat_vec_et<self, fsize_vector<T, Cols> > (*this, v);
        }

      private:
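As a rough, self-contained illustration of the nested unrolling this section sets out to demonstrate, the sketch below unrolls both loops of a matrix-vector product at compile time. It is an assumption-laden example rather than the book's implementation: it works on plain built-in arrays instead of `fsize_matrix` and `mat_vec_et`, and the names `row_dot`, `mat_vec_rows`, and `unrolled_mat_vec` are hypothetical.

```cpp
// Inner recursion over the columns of one row:
// computes row[C]*v[C] + row[C+1]*v[C+1] + ... up to column Cols-1.
template <int C, int Cols>
struct row_dot
{
    template <typename T>
    static T eval(const T* row, const T* v)
    {
        return row[C] * v[C] + row_dot<C + 1, Cols>::eval(row, v);
    }
};

// End of a row: contributes zero to the sum.
template <int Cols>
struct row_dot<Cols, Cols>
{
    template <typename T>
    static T eval(const T*, const T*) { return T(0); }
};

// Outer recursion over the rows: computes w[R] and proceeds to row R+1.
template <int R, int Rows, int Cols>
struct mat_vec_rows
{
    template <typename T>
    static void eval(const T (&A)[Rows][Cols], const T (&v)[Cols], T (&w)[Rows])
    {
        w[R]= row_dot<0, Cols>::eval(A[R], v);
        mat_vec_rows<R + 1, Rows, Cols>::eval(A, v, w);
    }
};

// End of the matrix: all rows are done.
template <int Rows, int Cols>
struct mat_vec_rows<Rows, Rows, Cols>
{
    template <typename T>
    static void eval(const T (&)[Rows][Cols], const T (&)[Cols], T (&)[Rows]) {}
};

// Entry point: w = A * v with both loops unrolled at compile time.
template <typename T, int Rows, int Cols>
void unrolled_mat_vec(const T (&A)[Rows][Cols], const T (&v)[Cols], T (&w)[Rows])
{
    mat_vec_rows<0, Rows, Cols>::eval(A, v, w);
}
```

For a 2×3 matrix this instantiates two row steps with three multiply-add steps each, so no loop index remains at run time, provided the compiler inlines the calls. As with the non-nested case, this is only likely to pay off for small sizes.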
