Introduction to CUDA C
Race Conditions

- Terminology: a race condition occurs when program behavior depends upon the relative timing of two (or more) event sequences.
- What actually takes place to execute the line in question, *c += sum;
  - Read value at address c
  - Add sum to value
  - Write result to address c
- Terminology: this read-add-write sequence is a read-modify-write.
- What if two threads are trying to do this at the same time?
  - Thread 0, Block 0: read value at address c, add sum to value, write result to address c
  - Thread 0, Block 1: read value at address c, add sum to value, write result to address c
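The deck's later pages resolve this race with an atomic operation. A minimal sketch of the multiblock dot-product kernel, reconstructed in the deck's style (the exact body here is an illustration, not the deck's verbatim code), shows where the unsafe line sits and how atomicAdd replaces it:

```cuda
#define THREADS_PER_BLOCK 512

__global__ void dot(int *a, int *b, int *c) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();  // every product for this block is now in temp[]

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        // *c += sum;       // read-modify-write: races with other blocks
        atomicAdd(c, sum);  // same update, performed as one indivisible operation
    }
}
```

atomicAdd guarantees that no other thread can interleave its own read or write of *c between the read and the write of this update, which is exactly the window the race exploits.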
Global Memory Contention

Suppose Block 0 computes sum = 3 and Block 1 computes sum = 4, and both execute *c += sum; with c starting at 0. In the ordering shown on the slide, the two read-modify-write sequences happen to serialize:

1. Block 0 reads c (0)
2. Block 0 computes 0 + 3 = 3 and writes 3 (c is now 3)
3. Block 1 reads c (3)
4. Block 1 computes 3 + 4 = 7 and writes 7 (c is now 7)

Nothing enforces this ordering; if the two sequences interleave instead, one block's update is lost.