11.07.2015 Views

Eliminating Texture Waste: Borderless Ptex - NVIDIA Developer Zone



Eliminating Texture Waste: Borderless Ptex
John McDonald, NVIDIA Corporation


Memory Consumption
  • Modern games consume a lot of memory
  • The largest class of memory usage is textures
  • But lots of texture is wasted!
  • Waste costs both memory and increased load times
  [Chart labels: Back/Front, Gbuffer, Textures, VB/IB, Simulation]
NVIDIA Corporation © 2013


Wasted?!
  • Two sources of texture waste:
      • Unmapped texture storage (major)
      • Duplicated texels to help alleviate visible seams (minor)
          • This cannot eliminate seams.
  [Figure: face diffuse texture with its unmapped regions labeled "Waste"; source: http://www.boogotti.com/root/images/face/dffuse_texture.jpg]



How much waste are we talking?
  • Nearly 60% of memory usage in a modern game* is texture usage
  • And up to 30% of that is waste.
  • That’s 18% of your total application footprint.
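The 18% figure is just the product of the two shares above. A minimal sketch of that arithmetic follows; the function names and the idea of feeding in a total budget are illustrative assumptions, not part of the talk.

```cpp
#include <cassert>

// Share of the total footprint lost to texture waste:
// (texture share of memory) x (wasted share of texture memory).
double wastedFootprintShare(double textureShare, double wasteShare) {
    assert(textureShare >= 0.0 && textureShare <= 1.0);
    assert(wasteShare >= 0.0 && wasteShare <= 1.0);
    return textureShare * wasteShare;
}

// Wasted amount for a given total budget (the budget is a made-up input;
// the result is in the same unit as totalBudget).
double wastedAmount(double totalBudget, double textureShare, double wasteShare) {
    return totalBudget * wastedFootprintShare(textureShare, wasteShare);
}
```

With the slide's numbers, `wastedFootprintShare(0.60, 0.30)` is 0.18, so a hypothetical 1024 MB budget would carry roughly 184 MB of waste.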


Memory Waste
  • 18% of your memory is useless.
  • 18% of your load time is wasted.


Enter Ptex (a quick recap)
  • The soul of Ptex:
      • Model with Quads instead of Triangles
          • You’re doing this for your next-gen engine anyways, right?
      • Every Quad gets its own entire texture UV-space
      • UV orientation is implicit in surface definition
      • No explicit UV parameterization
      • Resolution of each face is independent of neighbors.


Ptex (cont’d)
  • Invented by Brent Burley at Walt Disney Animation Studios
  • Used in every animated film at Disney since 2007
      • 6 features and all shorts, plus everything in production now and for the foreseeable future
      • Used on ~100% of surfaces
  • Rapid adoption in DCC tools
  • Widespread usage throughout the film industry


Ptex benefits
  • No UV unwraps
  • Allow artists to work at any resolution they want
      • Perform an offline pass on assets to decide what to ship for each platform based on capabilities
      • Ship a texture pack later for tail revenue
  • Reduce your load times. And your memory footprint. Improve your visual fidelity.
  • Reduce the cost of production’s long pole: art.


Demo
  • Demo is running on a Titan.
      • Sorry, it’s what we have at the show. I’ve run on 430-680; perf scales linearly with Texture/FB.
  • Could run on any Dx11 capable GPU.
  • Could also run on Dx10 capable GPUs with small adaptations.
  • OpenGL 4, no vendor-specific extensions.


Roadmap: Realtime Ptex v1
  [Flowchart; phases: Preprocess, Draw Time]
  • Stages shown: Load Model, Bucket and Sort, Reorder Index Buffer, Generate Mipmaps, Pack Patch Constants, Render, Fill Borders, Pack Texture Arrays
  • Color key:
      • Red: Vertex and Index data
      • Green: Patch Constant information
      • Blue: Texel data
      • Orange: Adjacency information


Roadmap: Realtime Ptex v2
  [Flowchart; phases: Preprocess, Draw Time]
  • Stages shown: Load Model, Pack Patch Constants, Render, Pack Texture Arrays
  • Color key:
      • Red: Vertex and Index data
      • Green: Patch Constant information
      • Blue: Texel data
      • Orange: Adjacency information


Realtime Ptex v2
  • Instead of copying texels into a border region, just go look at them.
  • Use clamp-to-border addressing, with a border color of (0,0,0,0)
      • This makes those lookups fast.
      • Also lets you know how close to the edge you are
  • We’ll need to transform our UVs into their UV space
  • And accumulate the results
  • Waste factor? 0*.


Example Model
  VB: …
  IB:


Load Model
  • Vertex Data
      • Any geometry arranged as a quad-based mesh
      • Example: Wavefront OBJ
  • Patch Texture
      • Power-of-two texture images
  • Adjacency Information
      • 4 Neighbors of each quad patch
  • Easily load texture and adjacency with OSS library available from http://ptex.us/


Texture Arrays
  • Like 3D / Volume Textures, except:
      • No filtering between 2D slices
      • Only X and Y decrease with mipmap level (Z doesn’t)
      • Z indexed by integer index, not [0,1]
          • E.g. (0.5, 0.5, 4) would be (0.5, 0.5) from the 5th slice
  • API Support
      • Direct3D 10+: Texture2DArray
      • OpenGL 3.0+: GL_TEXTURE_2D_ARRAY
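To make the "only X and Y decrease" point concrete, here is a small CPU-side sketch of how the dimensions of a texture array change across its mipmap chain. `ArrayDims`, `mipDims` and `mipCount` are hypothetical helper names, not API calls from Direct3D or OpenGL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct ArrayDims {
    uint32_t width;
    uint32_t height;
    uint32_t slices;
};

// Dimensions of mip `level`: width and height halve per level (never
// below 1); the slice count is the same at every level.
ArrayDims mipDims(ArrayDims base, uint32_t level) {
    assert(level < 32);
    return { std::max<uint32_t>(base.width >> level, 1u),
             std::max<uint32_t>(base.height >> level, 1u),
             base.slices };
}

// Number of levels in a complete mipmap chain down to 1x1.
uint32_t mipCount(ArrayDims base) {
    uint32_t levels = 1;
    for (uint32_t size = std::max(base.width, base.height); size > 1; size >>= 1) {
        ++levels;
    }
    return levels;
}
```

A 512x256 array with 7 slices is 256x128 with 7 slices at level 1 and 1x1 with 7 slices at level 9, for a chain of 10 levels. A volume texture would also halve its depth at each level.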


Arrays of Texture Arrays
  • Both GLSL and HLSL* support arrays of TextureArrays.
  • This allows for stupidly powerful abuse of texturing.

    Texture2DArray albedo[32]; // D3D
    uniform sampler2DArray albedo[32]; // OpenGL

  * HLSL support requires a little codegen, but it’s entirely a compile-time exercise, no runtime impact.


Pack Texture Arrays
  • One Texture2DArray per top-mipmap level (that is, per base resolution)
  • Store each with a complete mipmap chain
  • Don’t forget to set border color to black (with 0 alpha).


Packed Arrays
  [Figure]
  • Texture Array (TA) 0: Slice 0, Slice 1, Slice 2
  • TA 1: Slice 0
  • TA 2: Slice 0
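The packing step can be sketched on the CPU as a simple bucketing pass: every distinct base resolution gets its own array, and each patch takes the next free slice in that array. This is a minimal sketch under that assumption. `PatchRes`, `Placement` and `packPatches` are hypothetical names, and API limits on the number of slices per array are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct PatchRes  { uint16_t width, height; };      // base (top mip) resolution
struct Placement { uint16_t texIndex, texSlice; }; // which array, which slice

// Assign each patch to the array that holds its resolution, creating a
// new array the first time a resolution is seen.
std::vector<Placement> packPatches(const std::vector<PatchRes>& patches) {
    std::map<std::pair<uint16_t, uint16_t>, uint16_t> arrayOf; // resolution -> array index
    std::vector<uint16_t> sliceCount;                          // slices used per array
    std::vector<Placement> out;
    out.reserve(patches.size());
    for (const PatchRes& p : patches) {
        assert(p.width > 0 && p.height > 0);
        const std::pair<uint16_t, uint16_t> key(p.width, p.height);
        auto it = arrayOf.find(key);
        if (it == arrayOf.end()) {
            it = arrayOf.emplace(key, static_cast<uint16_t>(sliceCount.size())).first;
            sliceCount.push_back(0);
        }
        const uint16_t index = it->second;
        const uint16_t slice = sliceCount[index]++;
        out.push_back(Placement{index, slice});
    }
    return out;
}
```

Feeding in three 512x512 patches, one 256x256 and one 128x128 gives the layout in the figure: three slices in TA 0 and one slice each in TA 1 and TA 2.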


Pack Patch Constants
  • Create a constant-buffer indexed by PrimitiveID. Each entry contains:
      • Your Array Index and Slice in the Texture2DArrays
      • Your four neighbors across the edges
      • Each neighbor’s UV orientation
  • (Again, can be prepared at baking time)

    struct PTexParameters {
        ushort usNgbrIndex[4];  // the four neighbors across the edges
        ushort usNgbrXform[4];  // each neighbor's UV orientation
        ushort usTexIndex;      // array index
        ushort usTexSlice;      // slice within that array
    };

    uniform ptxDiffuseUBO {
        PTexParameters ptxDiffuse[PRIMS];
    };

  • If rendering too many primitives to fit into a constant buffer, you can use Structured Buffers / SSBO for storage.
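A CPU-side mirror of this record makes the size budget easy to check. The sketch below assumes `ushort` maps to `uint16_t` and that the fields are tightly packed; `maxPrimitives` is a hypothetical helper. The stride the shader actually sees depends on the API's buffer layout rules, so it is passed in rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the slide's PTexParameters (ushort -> uint16_t).
struct PTexParameters {
    uint16_t usNgbrIndex[4];
    uint16_t usNgbrXform[4];
    uint16_t usTexIndex;
    uint16_t usTexSlice;
};
static_assert(sizeof(PTexParameters) == 20, "ten tightly packed 16-bit fields");

// How many per-primitive entries fit in a buffer of bufferBytes, given
// the per-entry stride that the target layout actually uses.
std::size_t maxPrimitives(std::size_t bufferBytes, std::size_t strideBytes) {
    assert(strideBytes > 0);
    return bufferBytes / strideBytes;
}
```

For example, a 64 KB constant buffer (the Direct3D 11 limit of 4096 16-byte constants) holds at most 3276 tightly packed entries, and fewer if the layout pads each entry. Beyond that, the structured buffer / SSBO route mentioned above applies.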


Rendering Time (CPU)
  • Bind Texture2DArrays
      • (If you’re in GL, consider Bindless)
  • Select Shader
  • Setup Constants


Rendering Time (DS)
  • In the domain shader, we need to generate our UVs.
  • Use SV_DomainLocation.
  • Exact mapping is dependent on DCC tool used to generate the mesh
  [Figure caption: Incorrect surface orientation]


Rendering Time (PS)
  • Conceptually, a ptex lookup is:
      • Sample our surface (use SV_PrimitiveID to determine our data).
      • For each neighbor:
          • Transform our UV into their UV space
          • Perform a lookup in that surface with transformed UVs
      • Accumulate the result, correct for base-level differences and return
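The steps above can be sketched on the CPU. This is not the shader from the talk: it assumes constant-color faces, point sampling with a (0,0,0,0) border, and neighbors that all share our orientation, so each transform is a pure translation. `Face`, `sampleFace`, `toNeighbor` and `ptexLookup` are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Float2 { float u, v; };
struct Float4 { float r, g, b, a; };

// Illustration only: a face with one constant color.
struct Face { Float4 color; };

// Point sample with clamp-to-border: outside [0,1]^2 returns (0,0,0,0).
Float4 sampleFace(const Face& f, Float2 uv) {
    const bool inside = uv.u >= 0.0f && uv.u <= 1.0f && uv.v >= 0.0f && uv.v <= 1.0f;
    if (!inside) {
        return Float4{0.0f, 0.0f, 0.0f, 0.0f};
    }
    return f.color;
}

// Assumed mapping: same orientation everywhere, so only a translation.
// Edge order (also an assumption): 0 = -u, 1 = +u, 2 = -v, 3 = +v.
Float2 toNeighbor(int edge, Float2 uv) {
    assert(edge >= 0 && edge < 4);
    switch (edge) {
        case 0:  return Float2{uv.u + 1.0f, uv.v};
        case 1:  return Float2{uv.u - 1.0f, uv.v};
        case 2:  return Float2{uv.u, uv.v + 1.0f};
        default: return Float2{uv.u, uv.v - 1.0f};
    }
}

Float4 ptexLookup(const Face& self, const std::array<Face, 4>& neighbors, Float2 uv) {
    Float4 sum = sampleFace(self, uv);                  // sample our own surface
    for (int e = 0; e < 4; ++e) {                       // then each neighbor
        const Float4 s = sampleFace(neighbors[e], toNeighbor(e, uv));
        sum.r += s.r; sum.g += s.g; sum.b += s.b; sum.a += s.a;
    }
    if (sum.a > 0.0f) {                                 // renormalize by total weight
        const float inv = 1.0f / sum.a;
        sum.r *= inv; sum.g *= inv; sum.b *= inv; sum.a = 1.0f;
    }
    return sum;
}
```

In the interior only our own face contributes, so its color comes back unchanged. On a shared edge both faces contribute and the divide by accumulated alpha averages them.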


Mapping Space
  • There are 16 cases that map our UV space to our neighbors, as shown.


Transforming Space
  • Conveniently these map to simple 3x2 texture transforms
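A 3x2 transform here is a 2x2 rotation plus a translation. The sketch below applies one to a UV pair. The four example matrices are not taken from the slides' figure: they are reconstructed so that they reproduce the neighbor coordinates listed on the "Putting it all together" slides, assuming each neighbor differs from F by a pure rotation. All names are hypothetical.

```cpp
#include <cassert>

struct Float2 { float u, v; };

// Row-major 3x2 affine transform:
//   u' = a*u + b*v + tu
//   v' = c*u + d*v + tv
struct Xform3x2 { float a, b, tu, c, d, tv; };

Float2 apply(const Xform3x2& x, Float2 uv) {
    return Float2{ x.a * uv.u + x.b * uv.v + x.tu,
                   x.c * uv.u + x.d * uv.v + x.tv };
}

enum ExampleNeighbor { kR = 0, kU = 1, kL = 2, kD = 3 };

// Reconstructed examples (assumptions, see lead-in).
const Xform3x2 kExampleXforms[4] = {
    {  0.0f, -1.0f, 1.0f,   1.0f,  0.0f, -1.0f },  // R: 90 degree rotation
    { -1.0f,  0.0f, 1.0f,   0.0f, -1.0f,  2.0f },  // U: 180 degree rotation
    {  1.0f,  0.0f, 1.0f,   0.0f,  1.0f,  0.0f },  // L: translation only
    { -1.0f,  0.0f, 1.0f,   0.0f, -1.0f,  0.0f },  // D: 180 degree rotation
};

// A per-edge code such as usNgbrXform could index a table like this.
const Xform3x2& exampleXform(int which) {
    assert(which >= 0 && which < 4);
    return kExampleXforms[which];
}
```

For F.(u, v) = (0.5, 0.5) these give R = (0.5, -0.5), U = (0.5, 1.5), L = (1.5, 0.5) and D = (0.5, -0.5), matching the first worked example later in the deck.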


All your base
  • Base level differences, wah?
  • When a 512x512 neighbors a 256x256, their base levels are different.
  • This is an issue because samples are constant-sized in texel (integer) space, not UV (float) space
  [Figure caption: Bad seaming]


Renormalization
  • With unused alpha channel, code is simply:

    return result / result.a;

  • If you need alpha, see appendix
  [Figure captions: Bad seaming; Fixed!]


0% Waste?
  • Okay, not quite 0.
  • Need a global set of textures that match ptex resolutions used.
      • “Standard Candles”
  • But they are one-channel, and can be massively compressed (4 bits per pixel)


A brief interlude on the expense of retrieving texels from textured surfaces
  • Texture lookups by themselves are not expensive.
  • There are fundamentally two types of lookups:
      • Independent reads
      • Dependent reads
  • Independent reads can be pipelined.
      • The first lookup “costs” ~150 clocks
      • The second costs ~5 clocks.
  • Dependent reads must wait for previous results
      • The first lookup costs ~150 clocks
      • The second costs ~150 clocks.
  • Try to have no more than 2-3 “levels” of dependent reads in a single shader
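The slide's rough figures can be turned into a toy latency model. This is only an illustration of the pipelining argument using the approximate clock counts quoted above, not a measurement or a description of any particular GPU. The function names are hypothetical.

```cpp
#include <cassert>

// Approximate figures quoted on the slide.
const int kFirstReadClocks     = 150;
const int kPipelinedReadClocks = 5;

// n mutually independent reads: the first pays full latency, the rest
// are pipelined behind it.
int independentReadClocks(int n) {
    assert(n >= 0);
    if (n == 0) return 0;
    return kFirstReadClocks + kPipelinedReadClocks * (n - 1);
}

// A chain of dependent reads: each level waits for the previous result.
int dependentReadClocks(int levels) {
    assert(levels >= 0);
    return kFirstReadClocks * levels;
}
```

Under this model, five independent lookups (say, our own face plus four neighbors, assuming none of them depends on another's result) cost about 170 clocks, while a chain of just three dependent lookups already costs about 450.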


Performance Impact
  • In this demo, Ptex adds < 30% cost versus no texturing at all
  • < 20% versus repeat texturing.
  • ~15% versus a UV-unwrapped mesh


Putting it all together
  [Figure: face F with neighbors U (above), L (left), R (right) and D (below)]
  • F.(u, v) = ( 0.5, 0.5 )
  • R.(u, v) = ( 0.5, -0.5 )
  • U.(u, v) = ( 0.5, 1.5 )
  • L.(u, v) = ( 1.5, 0.5 )
  • D.(u, v) = ( 0.5, -0.5 )
  • In this situation, texture lookups in R, U, L and D will return the border color (0, 0, 0, 0)
  • F lookup will return alpha of 1, so the weight will be exactly 1.


Putting it all together
  [Figure: face F with neighbors U (above), L (left), R (right) and D (below)]
  • F.(u, v) = ( 1.0, 0.5 )
  • R.(u, v) = ( 0.5, 0.0 )
  • U.(u, v) = ( 0.0, 1.5 )
  • L.(u, v) = ( 2.0, 0.5 )
  • D.(u, v) = ( 0.0, -0.5 )
  • In this situation, texture lookups in U, L and D will return the border color (0, 0, 0, 0)
  • If R and F are the same resolution, they will each return an alpha of 0.5.
  • If R and F are not the same resolution, alpha will not be 1.0; renormalization will be necessary.
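Where the alpha of 0.5 comes from can be seen with a one-dimensional model of bilinear filtering against a (0,0,0,0) border. The sketch assumes a fully opaque face, texel centers at half-integer texel coordinates, and no mipmapping or anisotropic filtering. `edgeAlpha1D` is a hypothetical helper, not a hardware specification.

```cpp
#include <cassert>
#include <cmath>

// Filtered alpha along one axis for a fully opaque face of n texels,
// sampled bilinearly with a border alpha of 0.
float edgeAlpha1D(float u, int n) {
    assert(n > 0);
    const float x  = u * static_cast<float>(n) - 0.5f;  // texel space, centers at integers
    const float x0 = std::floor(x);                     // left texel of the filter pair
    const float f  = x - x0;                            // weight of the right texel
    auto alphaAt = [n](float i) {
        return (i >= 0.0f && i < static_cast<float>(n)) ? 1.0f : 0.0f;
    };
    return (1.0f - f) * alphaAt(x0) + f * alphaAt(x0 + 1.0f);
}
```

In this model a coordinate of 0.5 returns alpha 1, a coordinate exactly on an edge (0.0 or 1.0) returns 0.5, and 1.5 returns 0, in line with the two worked examples. Just inside the edge the resolutions start to matter: at 1/1024 from the edge, a 512-texel face and a 512-texel neighbor sum to 1.0, but a 512-texel face and a 256-texel neighbor sum to 1.25, which is the kind of mismatch that dividing by the accumulated alpha corrects.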


Questions?
  • jmcdonald at nvidia dot com
  • Demo Thanks: Johnny Costello and Timothy Lottes!


In the demo
  • Ptex
  • AA
  • Vignetting
  • Lighting
  • Spectral Simulation (7 data points)
  • Volumetric Caustics (128 taps per pixel)
