Introduction to the DirectX 9 Shader Models - Nvidia
Introduction to the DirectX 9 Shader Models - Nvidia Introduction to the DirectX 9 Shader Models - Nvidia
Why use fp16 precision? • SPEED – On some HW, using fewer registers leads to faster performance – fp16 takes half the register space of fp32, so can be 2x faster • That said, the first rule of optimization is : DON’T • If your shaders are fast enough at full precision, great – But make sure you test on low-end DirectX9 cards, too • Otherwise, here is how to optimize your shaders for precision
How to use fp16 • In HLSL or Cg, use the half keyword instead of float • Because the spec requires high-precision when mixing high & low precision, you may have to use extra casts or temporaries at times • Assuming s0 maps to a 32bit fp texture : – Original : float3 a = tex2d( s0, tex0 ) – Optimized : half3 a = tex2d( s0, (half3)tex0 )
- Page 21 and 22: HLSL and Conditionals • Initial r
- Page 23 and 24: Original HLSL Compiler Results vs_2
- Page 25 and 26: Begin Sim VS Sim
- Page 27 and 28: Caps for vs_2_x Sim • New D3DVSHA
- Page 29 and 30: Vertex Shader Predication - HLSL Si
- Page 31 and 32: Vertex Shader Predication Details S
- Page 33 and 34: Nested Static Flow Control Sim •
- Page 35 and 36: Dynamic Flow Control - HLSL Sim for
- Page 37 and 38: End Sim VS Sim
- Page 39 and 40: 2.0 Pixel Shader Instruction Set
- Page 41 and 42: Argument Swizzles •.r, .rrrr, .xx
- Page 43 and 44: ps.2.0 Review - Comparison with ps.
- Page 45 and 46: Caps for Pixel Shader 2.x D3DCAPS9
- Page 47 and 48: Pixel Shader 2.x • 512 instructio
- Page 49 and 50: Single Pass Lighting? • Sometimes
- Page 51 and 52: Single-Pass Lighting ? • Detailed
- Page 53 and 54: Single Pass Lighting? • Putting m
- Page 55 and 56: Single Pass Lighting? • It doesn
- Page 57 and 58: Lighting Render Loop • Per Light
- Page 59 and 60: Lighting Render Loop • Per-Object
- Page 61 and 62: Lighting Render Loop Summary • Go
- Page 63 and 64: Predication • Essentially a desti
- Page 65 and 66: Caveats for Conditional Instruction
- Page 67 and 68: Texture fetch with gradients • Gr
- Page 69 and 70: Arbitrary swizzling • Extremely u
- Page 71: Pixel Shader Precision • Low Prec
- Page 75 and 76: Precision Pitfalls • An IEEE-like
- Page 77 and 78: Precision Pitfalls • Most precisi
- Page 79 and 80: Fully Fragment Shading? • Doing t
- Page 81 and 82: Precision Pitfalls • Avoid precis
- Page 83 and 84: Precision Summary • If high-preci
- Page 85 and 86: vs_3_0 • More flow-control • In
- Page 87 and 88: vs_3_0 inputs
- Page 89 and 90: vs_3_0 Output example vs_3_0 dcl_co
- Page 91 and 92: ps_3_0
- Page 93: Lunch Break We will start back up a
Why use fp16 precision?<br />
• SPEED<br />
– On some HW, using fewer registers leads <strong>to</strong> faster<br />
performance<br />
– fp16 takes half <strong>the</strong> register space of fp32, so can be 2x<br />
faster<br />
• That said, <strong>the</strong> first rule of optimization is : DON’T<br />
• If your shaders are fast enough at full precision,<br />
great<br />
– But make sure you test on low-end <strong>DirectX</strong>9 cards, <strong>to</strong>o<br />
• O<strong>the</strong>rwise, here is how <strong>to</strong> optimize your shaders<br />
for precision