C++ for Scientists - Technische Universität Dresden
160 CHAPTER 5. META-PROGRAMMING

assign entry 0
assign entry 1
assign entry 2
assign entry 3

In this implementation, we replaced the loop by a recursion, counting on the compiler to inline the operations (otherwise it would be even slower than the loop), and made sure that no loop index is incremented and tested for termination. This is only beneficial for small loops that run in the L1 cache. Larger loops are dominated by loading the data from memory, and the loop overhead is irrelevant. On the contrary, entirely unrolling operations on very large vectors will probably decrease the performance, because many instructions need to be loaded, which decreases the bandwidth available for the data. As mentioned before, compilers can unroll such operations by themselves (and hopefully know when it is better not to), and sometimes this automatic unrolling is even slightly faster than the explicit implementation.

5.4.2 Nested Unrolling

In our experience, compilers usually do not unroll nested loops. Even a good compiler that can handle certain nested loops will not be able to optimize every program kernel, in particular heavily templatized programs instantiated with user-defined types. We will demonstrate here how to unroll nested loops at compile time, taking matrix-vector multiplication as an example. For this purpose, we introduce a simplistic fixed-size matrix type:

```cpp
template <typename T, int Rows, int Cols>
class fsize_matrix
{
    typedef fsize_matrix self;
  public:
    typedef T value_type;
    BOOST_STATIC_ASSERT((Rows * Cols > 0));
    const static int my_rows= Rows, my_cols= Cols;

    fsize_matrix()
    {
        for (int i= 0; i < my_rows; ++i)
            for (int j= 0; j < my_cols; ++j)
                data[i][j]= T(0);
    }
    fsize_matrix( const self& that ) { ... }

    // cannot check column index
    const T* operator[](int r) const { return data[r]; }
    T* operator[](int r) { return data[r]; }

    mat_vec_et<self, fsize_vector<T, Cols> > operator*(const fsize_vector<T, Cols>& v) const
    {
        return mat_vec_et<self, fsize_vector<T, Cols> >(*this, v);
    }
  private:
    T data[Rows][Cols];
};
```

5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 161

The bracket operator returns a pointer for the sake of simplicity, but a good implementation should return a proxy that allows for checking the column index. The multiplication with a vector is realized by means of an expression template in order not to copy the result vector. The vector assignment then needs a specialization for the expression template [17]:

```cpp
template <typename T, int Size>
class fsize_vector
{
    // ...
    template <typename Matrix, typename Vector>
    self& operator=( const mat_vec_et<Matrix, Vector>& that )
    {
        typedef mat_vec_et<Matrix, Vector> et;
        fsize_mat_vec_mult<et::my_rows-1, et::my_cols-1>()(that.A, that.v, *this);
        return *this;
    }
};
```

The functor fsize_mat_vec_mult must now compute the matrix-vector product on the three arguments. The general implementation of the functor reads:

```cpp
template <int Rows, int Cols>
struct fsize_mat_vec_mult
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows, Cols-1>()(A, v_in, v_out);
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
    }
};
```

Again, the functor is only templatized on the sizes, and the container types are deduced. The operator assumes that all smaller column indices are already handled, so we can increment v_out[Rows] by A[Rows][Cols] * v_in[Cols]. In particular, we assume that the first operation on v_out[Rows] initializes it. Thus we need a (partial) specialization for Cols = 0:

```cpp
template <int Rows>
struct fsize_mat_vec_mult<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows-1, Matrix::my_cols-1>()(A, v_in, v_out);
        v_out[Rows]= A[Rows][0] * v_in[0];
    }
};
```

The careful reader will have noticed the substitution of += by =. We also notice that we have to call the computation for the preceding row with all columns, and inductively for all smaller rows.

[17] A better solution would be to implement all assignments with a functor and to specialize that functor, because partial template specialization of functions does not always work as expected.