C++ for Scientists - Technische Universität Dresden
162 CHAPTER 5. META-PROGRAMMING

The number of columns in the matrix is taken from an internal definition in the matrix type for the sake of simplicity. Passing it as an extra template argument, or reading it from a type trait, would have been more general, because we are now limited to types where my_cols is defined in the class.

We still need a (full) specialization to terminate the recursion:

    template <>
    struct fsize_mat_vec_mult<0, 0>
    {
        template <typename Matrix, typename VecIn, typename VecOut>
        void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
        {
            v_out[0]= A[0][0] * v_in[0];
        }
    };

With the inlining, our program will execute the operation w= A * v for vectors of size 4 as:

    w[0]= A[0][0] * v[0];
    w[0]+= A[0][1] * v[1];
    w[0]+= A[0][2] * v[2];
    w[0]+= A[0][3] * v[3];
    w[1]= A[1][0] * v[0];
    w[1]+= A[1][1] * v[1];
    w[1]+= A[1][2] * v[2];
    w[1]+= A[1][3] * v[3];
    w[2]= A[2][0] * v[0];
    w[2]+= A[2][1] * v[1];
    w[2]+= A[2][2] * v[2];
    w[2]+= A[2][3] * v[3];
    w[3]= A[3][0] * v[0];
    w[3]+= A[3][1] * v[1];
    w[3]+= A[3][2] * v[2];
    w[3]+= A[3][3] * v[3];

Our tests have shown that such an implementation is really faster than the compiler's loop optimization.[18]

[18] TODO: Give numbers

Increasing Concurrency

A disadvantage of the preceding implementation is that all operations on one entry of the target vector are performed in a single sweep. Therefore, the second operation must wait for the first, the third for the second, and so on. The fifth operation can be done in parallel with the fourth, and the ninth with the eighth, but this is not satisfying. We would like more concurrency in our program, to enable the parallel pipelines of superscalar processors. Again, we can twiddle our thumbs and hope that the compiler will reorder the statements, or we can take it into our own hands. More concurrency is provided by the following operation sequence:

    w[0]= A[0][0] * v[0];
    w[1]= A[1][0] * v[0];
    w[2]= A[2][0] * v[0];
    w[3]= A[3][0] * v[0];
    w[0]+= A[0][1] * v[1];
5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 163

    w[1]+= A[1][1] * v[1];
    w[2]+= A[2][1] * v[1];
    w[3]+= A[3][1] * v[1];
    w[0]+= A[0][2] * v[2];
    w[1]+= A[1][2] * v[2];
    w[2]+= A[2][2] * v[2];
    w[3]+= A[3][2] * v[2];
    w[0]+= A[0][3] * v[3];
    w[1]+= A[1][3] * v[3];
    w[2]+= A[2][3] * v[3];
    w[3]+= A[3][3] * v[3];

We only need to reorganize our functor. The general template now reads:

    template <unsigned Rows, unsigned Cols>
    struct fsize_mat_vec_mult_cm
    {
        template <typename Matrix, typename VecIn, typename VecOut>
        void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
        {
            fsize_mat_vec_mult_cm<Rows-1, Cols>()(A, v_in, v_out);
            v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
        }
    };

Now we need a partial specialization for row 0 that goes to the next column:

    template <unsigned Cols>
    struct fsize_mat_vec_mult_cm<0, Cols>
    {
        template <typename Matrix, typename VecIn, typename VecOut>
        void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
        {
            fsize_mat_vec_mult_cm<Matrix::my_rows-1, Cols-1>()(A, v_in, v_out);
            v_out[0]+= A[0][Cols] * v_in[Cols];
        }
    };

A partial specialization for column 0 is also needed, to initialize the entries of the output vector:

    template <unsigned Rows>
    struct fsize_mat_vec_mult_cm<Rows, 0>
    {
        template <typename Matrix, typename VecIn, typename VecOut>
        void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
        {
            fsize_mat_vec_mult_cm<Rows-1, 0>()(A, v_in, v_out);
            v_out[Rows]= A[Rows][0] * v_in[0];
        }
    };

Finally, we still need a specialization for row and column 0 to terminate the recursion. This can be reused from the previous functor:

    template <>
    struct fsize_mat_vec_mult_cm<0, 0>
      : fsize_mat_vec_mult<0, 0> {};