Introduction to CUDA C



Parallel Programming in CUDA C

- With add() running in parallel, let's do vector addition
- Terminology: each parallel invocation of add() is referred to as a block
- The kernel can refer to its block's index with the built-in variable blockIdx.x
- Each block adds one value from a[] and b[], storing the result in c[]:

    __global__ void add( int *a, int *b, int *c ) {
        c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
    }

- By using blockIdx.x to index the arrays, each block handles a different index
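The slide shows only the kernel; to actually run it, the host must allocate device memory, copy the inputs over, and launch one block per element. A minimal host-side sketch along these lines (the array contents and N = 4 are illustrative) would look like:

```cuda
#include <stdio.h>

#define N 4

__global__ void add( int *a, int *b, int *c ) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

int main( void ) {
    int a[N] = { 1, 2, 3, 4 }, b[N] = { 10, 20, 30, 40 }, c[N];
    int *dev_a, *dev_b, *dev_c;

    // allocate device copies of a, b, c
    cudaMalloc( (void**)&dev_a, N * sizeof(int) );
    cudaMalloc( (void**)&dev_b, N * sizeof(int) );
    cudaMalloc( (void**)&dev_c, N * sizeof(int) );

    // copy inputs to the device
    cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice );

    // launch add() with N blocks of one thread each
    add<<< N, 1 >>>( dev_a, dev_b, dev_c );

    // copy the result back and print it
    cudaMemcpy( c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost );
    for (int i = 0; i < N; i++)
        printf( "c[%d] = %d\n", i, c[i] );

    cudaFree( dev_a ); cudaFree( dev_b ); cudaFree( dev_c );
    return 0;
}
```

The first parameter inside the <<< >>> brackets is the number of blocks, so <<< N, 1 >>> gives the kernel N parallel invocations, one per array element.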

Parallel Programming in CUDA C

- We write this code:

    __global__ void add( int *a, int *b, int *c ) {
        c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
    }

- This is what runs in parallel on the device:

    Block 0: c[0] = a[0] + b[0];
    Block 1: c[1] = a[1] + b[1];
    Block 2: c[2] = a[2] + b[2];
    Block 3: c[3] = a[3] + b[3];

