10.02.2013 Views

Instruction Throughput - Nvidia

Instruction Throughput - Nvidia

Instruction Throughput - Nvidia

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Serialization: Optimization<br />

� Shared memory bank conflicts:<br />

© NVIDIA Corporation 2011<br />

� Pad SMEM arrays<br />

- For example, when a warp accesses a 2D array‟s column<br />

- See CUDA Best Practices Guide, Transpose SDK whitepaper<br />

� Rearrange data in SMEM<br />

� Warp serialization:<br />

� Try grouping threads that take the same path into same warp<br />

- Rearrange the data, pre-process the data<br />

- Rearrange how threads index data (may affect memory perf)<br />

14

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!