03.12.2012 Views

C++ for Scientists - Technische Universität Dresden


CHAPTER 5. META-PROGRAMMING

Using Registers

Another feature of modern processors to keep in mind is cache coherency. Processors are nowadays designed to share memory while maintaining consistency in their caches. As a result, every time we write into a data structure in memory, such as our vector w, a cache invalidation signal is sent on the bus, even if no other processor is present. Unfortunately, in our experience this slows down the computation perceivably.

Fortunately, this can often be avoided in a rather simple way: we introduce a temporary in the function, which resides in one or more registers if its type allows. We can rely on the compiler to choose a reasonable location for temporaries.
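The idea can be illustrated with ordinary run-time loops before turning to the meta-programmed version. The following is a minimal sketch, not code from the text: the function names `mat_vec_direct` and `mat_vec_tmp` and the use of `std::vector` are assumptions made for illustration. The first function updates the output entry in the inner loop; the second accumulates in a local scalar and stores it once per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;
using vector = std::vector<double>;

// Hypothetical baseline: accumulates directly into w[i], so every
// iteration of the inner loop updates an object in memory.
void mat_vec_direct(const matrix& A, const vector& v, vector& w)
{
    assert(A.size() == w.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        assert(A[i].size() == v.size());
        w[i] = 0.0;
        for (std::size_t j = 0; j < v.size(); ++j)
            w[i] += A[i][j] * v[j];
    }
}

// Hypothetical variant with a temporary: the sum is built in a local
// scalar, which the compiler is free to keep in a register, and w[i]
// is written only once per row.
void mat_vec_tmp(const matrix& A, const vector& v, vector& w)
{
    assert(A.size() == w.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        assert(A[i].size() == v.size());
        double tmp = 0.0;
        for (std::size_t j = 0; j < v.size(); ++j)
            tmp += A[i][j] * v[j];
        w[i] = tmp;  // single write-back for this entry
    }
}
```

Both functions compute the same result; they differ only in where the running sum lives. Whether the temporary actually ends up in a register, and how much this helps, depends on the compiler and the hardware.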

This implementation requires two classes: one for the outer loop and one for the inner loop. Let us start with the outer loop:

1  template <int Rows, int Cols>
2  struct fsize_mat_vec_mult_reg
3  {
4      template <typename Matrix, typename VecIn, typename VecOut>
5      void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
6      {
7          fsize_mat_vec_mult_reg<Rows-1, Cols>()(A, v_in, v_out);  // preceding rows
8
9          typename VecOut::value_type tmp;                          // hopefully in a register
10         fsize_mat_vec_mult_aux<Rows, Cols>()(A, v_in, tmp);       // sum over this row
11         v_out[Rows]= tmp;                                         // single write-back
12     }
13 };

We assume that fsize_mat_vec_mult_aux is defined or declared before this class. The first statement, in line 7, calls the computations on the preceding rows. A temporary is defined in line 9 in the hope that it will be located in a register. Then, in line 10, we call the computation within this row. The temporary is passed by reference to an inline function so that the summation will be performed in a register. In line 11 we write the result back to v_out. This still causes the invalidation signal on the bus but only once for each entry.

The functor must be specialized for row 0 to terminate the recursion; otherwise the compiler would keep instantiating the template for ever smaller row indices:

template <int Cols>
struct fsize_mat_vec_mult_reg<0, Cols>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        // Row 0: no preceding rows, so there is no recursive call
        typename VecOut::value_type tmp;
        fsize_mat_vec_mult_aux<0, Cols>()(A, v_in, tmp);
        v_out[0]= tmp;
    }
};

Within each row we iterate over the columns and add to the temporary (which hopefully resides in a register):

template <int Rows, int Cols>
struct fsize_mat_vec_mult_aux
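One way the inner-loop functor could look is sketched here, together with the outer functor so that the example is self-contained. This is a minimal sketch under stated assumptions, not necessarily the author's implementation: the `int` template parameters, the `ScalOut` parameter name, the recursion over columns with a specialization for column 0, and the convenience wrapper `mat_vec_mult` are illustrative choices that are not given in the text. As in the outer loop, `Rows` and `Cols` denote the largest index, not the number of rows or columns.

```cpp
#include <array>
#include <cstddef>

// Inner loop (assumed form): adds columns 0..Cols of row Rows to tmp.
template <int Rows, int Cols>
struct fsize_mat_vec_mult_aux
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp) const
    {
        fsize_mat_vec_mult_aux<Rows, Cols-1>()(A, v_in, tmp);  // preceding columns
        tmp+= A[Rows][Cols] * v_in[Cols];
    }
};

// Column 0 ends the recursion and initializes the temporary.
template <int Rows>
struct fsize_mat_vec_mult_aux<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp) const
    {
        tmp= A[Rows][0] * v_in[0];
    }
};

// Outer loop: one temporary and one write to v_out per row.
template <int Rows, int Cols>
struct fsize_mat_vec_mult_reg
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out) const
    {
        fsize_mat_vec_mult_reg<Rows-1, Cols>()(A, v_in, v_out);
        typename VecOut::value_type tmp;
        fsize_mat_vec_mult_aux<Rows, Cols>()(A, v_in, tmp);
        v_out[Rows]= tmp;
    }
};

// Row 0 ends the outer recursion.
template <int Cols>
struct fsize_mat_vec_mult_reg<0, Cols>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out) const
    {
        typename VecOut::value_type tmp;
        fsize_mat_vec_mult_aux<0, Cols>()(A, v_in, tmp);
        v_out[0]= tmp;
    }
};

// Hypothetical wrapper (not from the text): starts both recursions at the
// largest row and column index of a fixed-size matrix.
template <typename T, std::size_t R, std::size_t C>
void mat_vec_mult(const std::array<std::array<T, C>, R>& A,
                  const std::array<T, C>& v, std::array<T, R>& w)
{
    static_assert(R > 0 && C > 0, "empty matrices are not handled");
    fsize_mat_vec_mult_reg<static_cast<int>(R) - 1,
                           static_cast<int>(C) - 1>()(A, v, w);
}
```

With a 2 by 3 matrix `A`, a vector `v` of size 3, and an output vector `w` of size 2, the call `mat_vec_mult(A, v, w)` instantiates the functors for every row and column index at compile time. Initializing `tmp` in the column-0 specialization avoids a separate zeroing step, which is a design choice of this sketch.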
