C++ for Scientists - Technische Universität Dresden
172 CHAPTER 5. META-PROGRAMMING
        for (unsigned i= 0; i < sb; i+= BSize)
            assign<0, BSize>()(ref, that, i);
        for (unsigned i= sb; i < s; i++)
            ref[i]= that[i];
        return ref;
    }
  private:
    V& ref;
};
Evaluating the considered vector expression for several block sizes yields:
Compute time unroll(u)= v + v + w is 1.72 µs.
Compute time unroll(u)= v + v + w is 1.52 µs.
Compute time unroll(u)= v + v + w is 1.36 µs.
Compute time unroll(u)= v + v + w is 1.37 µs.
Compute time unroll(u)= v + v + w is 1.4 µs.
These few benchmarks are consistent with the previous results: unrolling with block size 1 performs like the canonical implementation, and unrolling with a suitable block size is as fast as the hard-wired unrolling.
5.4.6 Tuning Reduction Operations
Reducing on a Single Variable
⇒ reduction_unroll_example.cpp
In the preceding vector operations, the i-th entry of each vector was handled independently of all other entries. In reduction operations, the entries are coupled through one or more temporary variables, and these temporaries can become a serious bottleneck.
First, we test whether a reduction operation, say the discrete L1 norm (also known as the Manhattan norm), can be accelerated by the techniques from Section 5.4.4. We implement the one_norm function in terms of a functor for the iteration block:
template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum(0);
    unsigned s= size(v), sb= s / BSize * BSize;

    // full blocks of BSize entries, unrolled by the functor
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(sum, v, i);
    // remaining s - sb entries
    for (unsigned i= sb; i < s; i++)
        sum+= abs(v[i]);
    return sum;
}
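The two loops in one_norm split the index range into full blocks and a remainder. Writing B for BSize and s_b for sb (as computed in the code), the function evaluates the L1 norm as follows:

```latex
\|v\|_1 = \sum_{i=0}^{s-1} |v_i|
        = \underbrace{\sum_{k=0}^{s_b/B-1} \sum_{j=0}^{B-1} |v_{kB+j}|}_{\text{unrolled blocks}}
        + \underbrace{\sum_{i=s_b}^{s-1} |v_i|}_{\text{remainder}},
\qquad s_b = \left\lfloor \frac{s}{B} \right\rfloor B
```

For example, s = 10 and B = 4 give s_b = 8: the first loop handles two unrolled blocks starting at i = 0 and i = 4, and the second loop handles the remaining entries i = 8 and i = 9.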
5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 173
The functor is also implemented in the same manner as before:
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);   // next entry of the block
    }
};

// recursion ends when Offset reaches Max
template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i) {}
};
The measured run-time behavior is:
Compute time one_norm(v) is 7.42 µs.
Compute time one_norm(v) is 3.64 µs.
Compute time one_norm(v) is 1.9 µs.
Compute time one_norm(v) is 1.25 µs.
Compute time one_norm(v) is 1.03 µs.
This is already a good improvement, but maybe we can do better. 23
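For experimentation, the one_norm function and the one_norm_ftor functor can be assembled into one self-contained translation unit. This is a minimal sketch under our own assumptions: the vector type is std::vector, and v.size() stands in for the free function size(v) used in the text, which is not defined in this excerpt.

```cpp
#include <cmath>
#include <cstdlib>

// Adds |v[i+Offset]|, ..., |v[i+Max-1]| to sum by compile-time recursion.
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);
    }
};

// Recursion end: Offset == Max, nothing left to add.
template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S&, const V&, unsigned) {}
};

template <unsigned BSize, typename Vector>
typename Vector::value_type one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum(0);
    // assumption: v.size() replaces the size(v) helper from the text
    unsigned s= static_cast<unsigned>(v.size()), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)      // full blocks, unrolled
        one_norm_ftor<0, BSize>()(sum, v, i);
    for (unsigned i= sb; i < s; i++)            // remainder
        sum+= abs(v[i]);
    return sum;
}
```

With this sketch, one_norm<4>(v) adds four entries per block through the recursive functor, while one_norm<1>(v) reduces to the plain loop; both return 28 for the vector {3, -4, 1, -2, 5, -6, 7}.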
Reducing on an Array
⇒ reduction_unroll_array_example.cpp
Looking at the previous computation, we see that each iteration uses a different entry of v, but every iteration updates the same temporary variable sum. Each addition therefore has to wait for the previous one to finish, and this limits concurrency. To provide more concurrency, we can use multiple temporaries 24, for instance in an array. The modified function then reads:
template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum[BSize];
    for (unsigned i= 0; i < BSize; i++)
        sum[i]= 0;
23 TODO: Test it with gcc 3.4 and MSVC. Speed up in table
24 Strictly speaking, this is not true for every possible scalar type we can think of. Because we change the evaluation order, the sum type with its addition must form a commutative monoid. This holds for the intrinsic integer types and, up to rounding errors, for the floating-point types, as well as for almost all user-defined arithmetic types. But one is free to define an addition that is not commutative or not monoidal; in this case our transformation would be wrong. To deal with such exceptions we need semantic concepts, which will hopefully become part of C++ in the coming years.
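The array version of one_norm continues beyond this excerpt. The following is a self-contained sketch of the same idea under our own assumptions, not the author's completion: the names one_norm_array and one_norm_array_ftor are hypothetical, std::vector is assumed, each offset within a block accumulates into its own element of sum, and the partial sums are combined at the end.

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical helper: adds |v[i+Offset]| to its own accumulator sum[Offset]
// for Offset = 0, ..., Max-1, so consecutive additions do not depend on each other.
template <unsigned Offset, unsigned Max>
struct one_norm_array_ftor
{
    template <typename S, typename V>
    void operator()(S* sum, const V& v, unsigned i)
    {
        using std::abs;
        sum[Offset]+= abs(v[i+Offset]);
        one_norm_array_ftor<Offset+1, Max>()(sum, v, i);
    }
};

// Recursion end: Offset == Max.
template <unsigned Max>
struct one_norm_array_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S*, const V&, unsigned) {}
};

template <unsigned BSize, typename Vector>
typename Vector::value_type one_norm_array(const Vector& v)
{
    using std::abs;
    using value_type = typename Vector::value_type;
    value_type sum[BSize];                       // BSize independent accumulators
    for (unsigned i= 0; i < BSize; i++)
        sum[i]= 0;

    unsigned s= static_cast<unsigned>(v.size()), sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)       // full blocks, unrolled
        one_norm_array_ftor<0, BSize>()(sum, v, i);
    for (unsigned i= sb; i < s; i++)             // remainder goes to sum[0]
        sum[0]+= abs(v[i]);

    // Combine the partial sums; here the evaluation order differs
    // from the single-variable version.
    value_type result(0);
    for (unsigned i= 0; i < BSize; i++)
        result+= sum[i];
    return result;
}
```

For example, one_norm_array<4>(v) on the vector {3, -4, 1, -2, 5, -6, 7} also returns 28. Since the partial sums are combined in a different order than in the single-variable version, floating-point results can differ in the last bits: in double precision, (1e20 + 1.0) - 1e20 evaluates to 0.0, whereas (1e20 - 1e20) + 1.0 evaluates to 1.0.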