03.12.2012

C++ for Scientists - Technische Universität Dresden


5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION

This log shows that C[1][0] and C[1][1] are computed alternately, so that the two computations can be performed in parallel on a super-scalar computer. One can also verify that

$$c_{ik} = \sum_{j=0}^{3} a_{ij} b_{jk}.$$

Printing C will also show the same result as for the canonical matrix multiplication.
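For comparison, the canonical product that the unrolled version must reproduce can be sketched as three nested loops. The names `small_matrix` and `canonical_mult` and the use of `std::array` are assumptions made for this illustration; the matrix type used in the chapter is not shown in this section.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size matrix type, for illustration only.
template <std::size_t Rows, std::size_t Cols>
using small_matrix = std::array<std::array<double, Cols>, Rows>;

// Canonical product: c_ik is the sum over j of a_ij * b_jk.
template <std::size_t M, std::size_t N, std::size_t P>
small_matrix<M, P> canonical_mult(const small_matrix<M, N>& a, const small_matrix<N, P>& b)
{
    small_matrix<M, P> c{};                     // all entries start at zero
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < P; ++k)
            for (std::size_t j = 0; j < N; ++j)
                c[i][k] += a[i][j] * b[j][k];   // one term of the sum
    return c;
}
```

Comparing the entries of this result with those of the unrolled product is one simple way to check the meta-programmed version.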

The implementation above can be simplified. The first functor specialization differs from the general functor only in how the indices are incremented. We can factor this out with an additional loop class:

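    // Parameter lists reconstructed; Max0 and Max1 (inclusive upper bounds) are assumed names.
    template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
    struct loop2
    {
        // General case: stay in the same row and advance the inner index.
        static const unsigned next_index0= Index0, next_index1= Index1 + 1;
    };

    template <unsigned Index0, unsigned Max0, unsigned Max1>
    struct loop2<Index0, Max0, Max1, Max1>
    {
        // Inner index reached its maximum: advance the outer index and reset the inner one.
        static const unsigned next_index0= Index0 + 1, next_index1= 0;
    };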
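How the two templates advance a pair of indices can be checked at compile time. The sketch below repeats `loop2` so that it is self-contained, with `Max0` and `Max1` as assumed names for the inclusive upper bounds, and steps through a 2×2 block.

```cpp
// Assumed parameter list: current indices plus inclusive upper bounds.
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct loop2
{
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct loop2<Index0, Max0, Max1, Max1>
{
    static const unsigned next_index0= Index0 + 1, next_index1= 0;
};

// Stepping through a 2x2 block (Max0 == Max1 == 1).
// (0,0) -> (0,1): the inner index advances.
static_assert(loop2<0, 1, 0, 1>::next_index0 == 0, "outer index unchanged");
static_assert(loop2<0, 1, 0, 1>::next_index1 == 1, "inner index incremented");
// (0,1) -> (1,0): the inner index wraps and the outer index advances.
static_assert(loop2<0, 1, 1, 1>::next_index0 == 1, "outer index incremented");
static_assert(loop2<0, 1, 1, 1>::next_index1 == 0, "inner index reset");
```

Because the successor is computed purely from template arguments, the same class can drive any two-level compile-time iteration, not only the matrix product.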

Such a general class has high potential for reuse. With this class, we can fuse the functor template and the first specialization:

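    #include <iostream>   // needed for the std::cout logging below

    // Parameter list reconstructed; Max0 and Max1 (inclusive upper bounds) are assumed names.
    template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
    struct mult_block
    {
        typedef loop2<Index0, Max0, Index1, Max1>                      l;
        typedef mult_block<l::next_index0, Max0, l::next_index1, Max1> next;

        // Accumulate one product into this block entry's temporary, then recurse.
        template <typename Tmp, typename Matrix>
        void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
        {
            std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                      << k + Index1 << "]\n";
            tmp.value+= A(i + Index0, j) * B(j, k + Index1);
            next()(tmp.sub, A, B, i, j, k);
        }

        // Write the accumulated temporary back to C, then recurse.
        template <typename Tmp, typename Matrix>
        void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
        {
            std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
            C(i + Index0, k + Index1)= tmp.value;
            next().update(tmp.sub, C, i, k);
        }
    };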

The other specialization remains unaltered.<br />
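To see the pieces work together, the following is a minimal, self-contained sketch with the logging omitted. Several parts are assumptions, since they are not shown in this section: the parameter names `Max0` and `Max1`, the layout of `multi_tmp`, the body of the terminal specialization, and the hypothetical helpers `square_matrix` and `blocked_mult`.

```cpp
#include <cassert>

// Hypothetical square matrix with bounds-checked access, for illustration only.
template <unsigned N>
struct square_matrix
{
    double data[N][N] = {};
    double& operator()(unsigned r, unsigned c) { assert(r < N && c < N); return data[r][c]; }
    const double& operator()(unsigned r, unsigned c) const { assert(r < N && c < N); return data[r][c]; }
};

// Index stepping as in the text (Max0/Max1 are assumed inclusive upper bounds).
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct loop2
{
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct loop2<Index0, Max0, Max1, Max1>
{
    static const unsigned next_index0= Index0 + 1, next_index1= 0;
};

// Assumed layout: a chain of nested temporaries, one per block entry.
template <unsigned BSize, typename Value>
struct multi_tmp
{
    typedef multi_tmp<BSize - 1, Value> sub_type;
    explicit multi_tmp(const Value& v) : value(v), sub(v) {}
    Value    value;
    sub_type sub;
};

template <typename Value>
struct multi_tmp<0, Value>
{
    explicit multi_tmp(const Value&) {}
};

// Fused functor: handles every block entry except the last one.
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef loop2<Index0, Max0, Index1, Max1>                      l;
    typedef mult_block<l::next_index0, Max0, l::next_index1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

// Assumed terminal specialization: last block entry, no further recursion.
template <unsigned Max0, unsigned Max1>
struct mult_block<Max0, Max0, Max1, Max1>
{
    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        tmp.value+= A(i + Max0, j) * B(j, k + Max1);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        C(i + Max0, k + Max1)= tmp.value;
    }
};

// Hypothetical driver: C = A * B in Size0 x Size1 blocks; N must be divisible by both.
template <unsigned Size0, unsigned Size1, unsigned N>
void blocked_mult(const square_matrix<N>& A, const square_matrix<N>& B, square_matrix<N>& C)
{
    mult_block<0, Size0 - 1, 0, Size1 - 1> block;
    for (unsigned i= 0; i < N; i+= Size0)
        for (unsigned k= 0; k < N; k+= Size1) {
            multi_tmp<Size0 * Size1, double> tmp(0.0);
            for (unsigned j= 0; j < N; j++)
                block(tmp, A, B, i, j, k);
            block.update(tmp, C, i, k);
        }
}
```

A call such as `blocked_mult<2, 2>(A, B, C)` on 4×4 matrices deduces `N` from the arguments and unrolls each 2×2 block into four independent accumulations.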

Last but not least, we would like to see the impact of our not-so-simple matrix product. The benchmark yielded on our test machine:
