C++ for Scientists - Technische Universität Dresden

        }
    };

    template <unsigned Max>
    struct one_norm_ftor<Max, Max>
    {
        // End of the recursion: nothing left to add.
        template <typename S, typename V>
        void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7,
                        const V& v, unsigned i) {}
    };

The corresponding one_norm function based on this functor is straightforward:

    template <unsigned BSize, typename Vector>
    typename Vector::value_type
    inline one_norm(const Vector& v)
    {
        using std::abs;
        typename Vector::value_type s0(0), s1(0), s2(0), s3(0), s4(0), s5(0), s6(0), s7(0);
        unsigned s= size(v), sb= s / BSize * BSize;

        // Blocked part: BSize entries per iteration
        for (unsigned i= 0; i < sb; i+= BSize)
            one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
        s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;

        // Remaining entries that do not fill a whole block
        for (unsigned i= sb; i < s; i++)
            s0+= abs(v[i]);
        return s0;
    }

A slight disadvantage is that all registers must be summed after the first loop, no matter how small BSize is or how short the vector is. A great advantage of the rotation is that BSize is not limited to the number of temporary variables in such accumulations. If BSize is larger than the number of temporaries, some or all variables are used multiple times without corrupting the result. The number of temporaries is nonetheless a limiting factor for the concurrency.

On the test machine, this implementation takes:

    Compute time one_norm(v) is 6.77 µs.
    Compute time one_norm(v) is 1.13 µs.
    Compute time one_norm(v) is 0.71 µs.
    Compute time one_norm(v) is 0.75 µs.
    Compute time one_norm(v) is 1.07 µs.

This is comparable to the nested-class implementation (in this environment).

Résumé on Reduction Tuning

The goal of this section was not to determine the ultimately tuned reduction implementation for superscalar processors. [27] The main ambition of this section, in fact of the whole book, is to demonstrate the diversity of implementation opportunities. With the enormous expressiveness of C++ one can use (or abuse) the compiler to generate the most efficient version without rewriting the program sources, as one would need to in C or Fortran. The power of internal code generation with the C++ compiler alone makes external code generation as in ATLAS [28] unnecessary. In ATLAS, functions are written in a domain-specific language; C programs [29] in slight variations are generated with a tool and compared regarding performance. The techniques presented here empower us to generate binaries equivalent to those variations by just using a C++ compiler. Thus, we can tune our programs by changing template arguments or constants (that might be set platform-dependently).

[27] In the presence of the new GPU cards with hundreds of cores and millions of threads, the fight for this little concurrency is not so impressive. Nonetheless, we will still need performance tuning on single-core and "few-core" machines at least for some years, since not everybody has a GPU card for numerics and not every algorithm has been successfully ported yet (e.g. incomplete LU on arbitrary sparse matrices). At the time of this writing there is not even support for std::complex.
[28] http://math-atlas.sourceforge.net/
[29] In some cases the C programs contain assembler snippets for a given platform in order to achieve performance close to peak.

5.4.7 Tuning Nested Loops

⇒ matrix_unroll_example.cpp

The most used (and abused) example in performance discussions is dense matrix multiplication. We do not claim to compete with hand-tuned assembler codes, but we show the power of meta-programming to generate code variations from a single implementation. As a starting point we use a templatized implementation of the matrix class from Section 3.7.4. We begin our implementation with a simple test case:

    int main()
    {
        const unsigned s= 4; // s= 4 for testing and 128 for timing
        matrix A(s, s), B(s, s), C(s, s);

        for (unsigned i= 0; i < s; i++)
            for (unsigned j= 0; j < s; j++) {
                A(i, j)= 100.0 * i + j;
                B(i, j)= 200.0 * i + j;
            }
        mult(A, B, C);
        std::cout << "C is " << C << '\n';
    }

A matrix multiplication is easily implemented with three nested loops. One of the 6 possible nestings is a dot-product-like calculation of each entry of C: $c_{ik} = A_i \cdot B^k$, where $A_i$ is the i-th row of A and $B^k$ the k-th column of B. We use a temporary in the innermost loop to decrease the cache-invalidation overhead of writing to C's elements in each operation:

    template <typename Matrix>
    void inline mult(const Matrix& A, const Matrix& B, Matrix& C)
    {
        assert(A.num_rows() == B.num_rows());
        // ...
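The listing of mult breaks off after its first assertion. As a minimal sketch of the dot-product-like nesting described above, the three loops with a temporary in the innermost one could look as follows. The matrix class here is a hypothetical stand-in (the actual class from Section 3.7.4 is not shown on this page) that provides only num_rows(), num_cols(), operator() and value_type. The conformance checks are likewise an assumption and are more general than the assertion in the original listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the book's matrix class; row-major storage.
template <typename Value>
class matrix
{
  public:
    using value_type = Value;

    matrix(unsigned nr, unsigned nc)
      : nrows(nr), ncols(nc), data(std::size_t(nr) * nc) {}

    unsigned num_rows() const { return nrows; }
    unsigned num_cols() const { return ncols; }

    Value& operator()(unsigned r, unsigned c)
    { return data[std::size_t(r) * ncols + c]; }
    const Value& operator()(unsigned r, unsigned c) const
    { return data[std::size_t(r) * ncols + c]; }

  private:
    unsigned           nrows, ncols;
    std::vector<Value> data;
};

// c_ik = (row i of A) . (column k of B), accumulated in a temporary so that
// C(i, k) is written once per entry instead of once per multiplication.
template <typename Matrix>
inline void mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_cols() == B.num_rows());
    assert(C.num_rows() == A.num_rows() && C.num_cols() == B.num_cols());

    using value_type = typename Matrix::value_type;
    for (unsigned i= 0; i < A.num_rows(); i++)
        for (unsigned k= 0; k < B.num_cols(); k++) {
            value_type tmp(0);
            for (unsigned j= 0; j < A.num_cols(); j++)
                tmp+= A(i, j) * B(j, k);
            C(i, k)= tmp;
        }
}
```

With the 4 × 4 test matrices from the test case, A(i, j)= 100 i + j and B(i, j)= 200 i + j, the first entry is C(0, 0) = 200 · (0 + 1 + 4 + 9) = 2800, which is a convenient value for checking unrolled variants against this plain version.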
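Coming back to the reduction discussed before the résumé: only the terminating specialization of one_norm_ftor is visible in this part of the text. The following sketch shows how the rotation of the eight accumulators might work end to end. The primary template with the parameters Offset and Max is an assumption reconstructed from the description of the rotation, and v.size() is used instead of the free function size(v) so that the example works with std::vector.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed primary template: add one entry to the first accumulator, then
// recurse with the accumulators rotated by one position.
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7,
                    const V& v, unsigned i)
    {
        using std::abs;
        s0+= abs(v[i + Offset]);
        one_norm_ftor<Offset + 1, Max>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
    }
};

// Terminating specialization: the whole block has been processed.
template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S&, S&, S&, S&, S&, S&, S&, S&, const V&, unsigned) {}
};

template <unsigned BSize, typename Vector>
typename Vector::value_type one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type s0(0), s1(0), s2(0), s3(0),
                                s4(0), s5(0), s6(0), s7(0);
    const unsigned s= static_cast<unsigned>(v.size()), sb= s / BSize * BSize;

    // Blocked part: BSize entries per iteration, spread over s0..s7
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
    s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;

    // Remaining entries that do not fill a whole block
    for (unsigned i= sb; i < s; i++)
        s0+= abs(v[i]);
    return s0;
}
```

Because each recursion step passes the accumulators shifted by one, a block size of 16 simply visits every accumulator twice per block, which is why BSize may exceed the number of temporaries.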