10.02.2013 Views

Instruction Throughput - Nvidia


Serialization: Analysis with Modified Code

© NVIDIA Corporation 2011

• Modify kernel code to assess performance improvement if serialization were removed

• Helps decide whether optimizations are worth pursuing

• Shared memory bank conflicts:

• Change indexing to be either broadcasts or just threadIdx.x

• Should also declare smem variables as volatile

- Prevents compiler from “caching” values in registers
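The shared memory experiment above might be sketched as follows. This is a minimal illustration, not code from the presentation: the kernel `smem_read`, the helper `time_smem_read`, the launch configuration, and the stride of 32 (assuming 32 banks of 4-byte words) are all hypothetical choices. The `STRIDE = 1` variant stands in for the "just threadIdx.x" indexing, and the two variants are compared only for speed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative launch configuration (assumed, not from the slides).
constexpr int kBlocks  = 64;
constexpr int kThreads = 256;
constexpr int kIters   = 1000;  // smem reads per thread, so the cost is measurable

// STRIDE = 32: threads of a warp read different words that map to the same bank
//              (conflicts, assuming 32 banks of 4-byte words).
// STRIDE = 1:  indexing is just threadIdx.x, so the reads are conflict-free.
template <int STRIDE>
__global__ void smem_read(float *out)
{
    // volatile forces every read to go to shared memory instead of a register.
    __shared__ volatile float tile[kThreads * STRIDE];

    const int idx = threadIdx.x * STRIDE;
    tile[idx] = 1.0f;
    __syncthreads();

    float acc = 0.0f;
    for (int i = 0; i < kIters; ++i)
        acc += tile[idx];

    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Times one launch of the chosen variant with CUDA events (after a warm-up launch).
template <int STRIDE>
float time_smem_read(float *d_out)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    smem_read<STRIDE><<<kBlocks, kThreads>>>(d_out);  // warm-up launch
    cudaEventRecord(start);
    smem_read<STRIDE><<<kBlocks, kThreads>>>(d_out);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, kBlocks * kThreads * sizeof(float));

    printf("strided indexing (conflicts):    %.3f ms\n", time_smem_read<32>(d_out));
    printf("threadIdx.x indexing (modified): %.3f ms\n", time_smem_read<1>(d_out));

    cudaFree(d_out);
    return 0;
}
```

The gap between the two timings gives a rough upper bound on what removing the bank conflicts could buy. In a real kernel the modified indexing usually produces wrong results, which is acceptable here because only the timing is of interest.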

• Warp divergence:

• Change the if-condition to have all threads take the same path

• Time both paths to see what each costs
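The warp divergence experiment above could look roughly like this. It is a sketch under assumptions: the kernel `branchy`, the two device functions `path_a` and `path_b`, the `Path` enum, and the even/odd condition are hypothetical stand-ins for whatever branch the real kernel contains. The `ORIGINAL` variant diverges inside every warp, while `ALL_A` and `ALL_B` force all threads down one path so each path can be timed on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

enum Path { ORIGINAL, ALL_A, ALL_B };

// Two hypothetical branch bodies with some arithmetic work each.
__device__ float path_a(float x)
{
    for (int i = 0; i < 256; ++i) x = x * 1.0001f + 0.5f;
    return x;
}

__device__ float path_b(float x)
{
    for (int i = 0; i < 256; ++i) x = x * 0.9999f - 0.5f;
    return x;
}

template <Path P>
__global__ void branchy(float *data, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    bool take_a;
    if (P == ORIGINAL)
        take_a = (threadIdx.x % 2 == 0);  // even/odd split: every warp diverges
    else
        take_a = (P == ALL_A);            // modified: all threads take the same path

    if (take_a)
        data[i] = path_a(data[i]);
    else
        data[i] = path_b(data[i]);
}

// Times one launch of the chosen variant with CUDA events (after a warm-up launch).
template <Path P>
float time_branchy(float *d_data, int n)
{
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    branchy<P><<<blocks, threads>>>(d_data, n);  // warm-up launch
    cudaEventRecord(start);
    branchy<P><<<blocks, threads>>>(d_data, n);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;  // illustrative problem size
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    printf("original (divergent): %.3f ms\n", time_branchy<ORIGINAL>(d_data, n));
    printf("all threads path A:   %.3f ms\n", time_branchy<ALL_A>(d_data, n));
    printf("all threads path B:   %.3f ms\n", time_branchy<ALL_B>(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

Comparing the divergent time against the two single-path times indicates how much of the kernel's cost comes from executing both paths per warp, and therefore whether restructuring the branch is worth pursuing.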
