C++ for Scientists - Technische Universität Dresden
180 CHAPTER 5. META-PROGRAMMING

    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();
    for (unsigned i= 0; i < s; i++)
        for (unsigned k= 0; k < s; k++) {
            value_type tmp(0);
            for (unsigned j= 0; j < s; j++)
                tmp+= A(i, j) * B(j, k);
            C(i, k)= tmp;
        }
}

For this implementation, we write a benchmark function:

#include <iostream>
#include <boost/timer.hpp>

template <typename Matrix>
void bench(const Matrix& A, const Matrix& B, Matrix& C, const unsigned rep)
{
    boost::timer t1;
    for (unsigned j= 0; j < rep; j++)
        mult(A, B, C);
    double t= t1.elapsed() / double(rep);   // seconds per multiplication
    unsigned s= A.num_rows();

    // s * s entries of C, each needing s multiplications and s - 1 additions;
    // the count is computed in double to avoid unsigned overflow for large s
    std::cout << "Compute time mult(A, B, C) is " << 1000000.0 * t << " µs. This is "
              << double(s) * s * (2*s - 1) / t / 1000000.0 << " MFlops.\n";
}

With 128 × 128 matrices, our canonical implementation achieves the following run time and performance:

Compute time mult(A, B, C) is 5290 µs. This is 789.777 MFlops.

This implementation serves as our reference for both performance and results. To develop the unrolled implementation, we return to 4 × 4 matrices. In contrast to Section 5.4.6, we do not unroll a single reduction but perform multiple reductions in parallel. Of the three loops, we therefore unroll the two outer ones and replace the body of the inner loop with multiple operations. As usual, we achieve the latter with a functor.

As in the canonical implementation, the reductions are not performed on elements of C but on temporaries. For this purpose, we use the class multi_tmp from § 5.4.6. For the sake of simplicity, we limit ourselves to matrix sizes that are multiples of the unroll parameters.³⁰

³⁰ A full implementation for arbitrary matrix sizes is realized in MTL4.

An unrolled matrix multiplication is shown in the following code:

template <unsigned Size0, unsigned Size1, typename Matrix>
inline void mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_rows() == B.num_rows()); // ...
    assert(A.num_rows() % Size0 == 0);   // we omitted cleanup here
    assert(A.num_cols() % Size1 == 0);   // we omitted cleanup here
    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();
5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 181
    mult_block<0, Size0-1, 0, Size1-1> block;
    for (unsigned i= 0; i < s; i+= Size0)
        for (unsigned k= 0; k < s; k+= Size1) {
            multi_tmp<Size0 * Size1, value_type> tmp(value_type(0));
            for (unsigned j= 0; j < s; j++)
                block(tmp, A, B, i, j, k);
            block.update(tmp, C, i, k);
        }
}
We still owe the reader the implementation of the functor mult_block. The techniques are the
same as for the vector operations, but we have to deal with more indices and their respective limits:
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef mult_block<Index0, Max0, Index1+1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                  << k + Index1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};
template <unsigned Index0, unsigned Max0, unsigned Max1>
struct mult_block<Index0, Max0, Max1, Max1>
{
    typedef mult_block<Index0+1, Max0, 0, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                  << k + Max1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Max1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";