
Real-Time GPU Silhouette Refinement using adaptively blended ...



the added complexity balances out the performance of the two approaches to some extent.

We have tested against two methods of uniform refinement. The first method is to render the entire refined mesh as a static VBO stored in graphics memory. The rendering of such a mesh is fast, as there is no transfer of geometry across the graphics bus. However, the mesh is static and the VBO consumes a significant amount of graphics memory. The second approach is the method of Boubekeur and Schlick [3], where each triangle triggers the rendering of a pre-tessellated patch stored as triangle strips in a static VBO in graphics memory.

Figure 9(b) shows these two methods against our adaptive method. It is clear from the graph that using static VBOs is extremely fast and outperforms the other methods for meshes up to 20k triangles. At around 80k triangles, the VBO grows too big for graphics memory and is stored in host memory, with a dramatic drop in performance. The method of [3] has a linear performance degradation, but the added cost of triggering the rendering of many small VBOs means it is outperformed by our adaptive method at around 1k triangles. The performance of our method also degrades linearly, but at a slower rate than uniform refinement. Using our method, we are able to adaptively refine meshes of up to 60k triangles for dynamic meshes, and 100k triangles for static meshes, at 24 FPS, which is significantly better than the other methods. The other GPUs show the same performance profile as the 7800 in Figure 9(b), just shifted downward as expected by the number of pipelines and lower clock speed.

Finally, to get an idea of the performance impact of various parts of our algorithm, we ran the same tests with various features enabled or disabled. We found that with a uniformly distributed random refinement level for each edge (to avoid the silhouetteness test), the performance is 30–50% faster than uniform refinement.
This is as expected since the vertex shader is only marginally more complex, and the total number of vertices processed is reduced. In a real world scenario, where there is often a high degree of frame coherency, this can be utilized by not calculating the silhouetteness for every frame. Further, if we disable blending of consecutive refinement levels (which can lead to some popping, but no cracking), we remove half of the texture lookups in the vertex shader for refined geometry and gain a 10% performance increase.

8 Conclusion and future work

We have proposed a technique for performing adaptive refinement of triangle meshes using graphics hardware, requiring just a small amount of preprocessing, and with no changes to the way the underlying geometry is stored. Our criterion for adaptive refinement is based on improving the visual appearance of the silhouettes of the mesh. However, our method is general in the sense that it can easily be adapted to other refinement criteria, as shown in Section 6.5.

We execute the silhouetteness computations on a GPU. Our performance analysis shows that our implementation using histogram pyramid extraction outperforms other silhouette extraction algorithms as the mesh size increases.

Our technique for adaptive level of detail automatically avoids cracking between adjacent patches with arbitrary refinement levels. Thus, there is no need to “grow” refinement levels from patch to patch, making sure two adjacent patches differ only by one level of detail. Our rendering technique is applicable to dynamic and static meshes and creates continuous level of detail for both uniform and adaptive refinement algorithms.
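
The blending of consecutive refinement levels mentioned in the performance discussion can be illustrated with a minimal Python sketch. It assumes a linear blend weighted by the fractional part of the refinement level, and it assumes that disabling the blend simply uses the upper level; neither detail is stated in this section. The names `refined_position`, `eval_at`, and `toy_eval` are hypothetical and stand in for work that would be done in a vertex shader, with each `eval_at` call playing the role of one texture lookup.

```python
import math


def refined_position(level, eval_at, blend=True):
    """Return a vertex position for a fractional refinement level.

    level   -- fractional refinement level (hypothetical input, e.g. 2.25)
    eval_at -- hypothetical callback mapping an integer level to (x, y, z);
               stands in for one texture lookup in the vertex shader
    blend   -- if False, skip blending and use the upper level only
               (an assumption; this may cause popping)
    """
    lo = math.floor(level)
    hi = math.ceil(level)
    if not blend or lo == hi:
        return eval_at(hi)            # single lookup
    t = level - lo                    # blend weight, strictly between 0 and 1
    p_lo = eval_at(lo)                # first lookup
    p_hi = eval_at(hi)                # second lookup
    return tuple((1.0 - t) * a + t * b for a, b in zip(p_lo, p_hi))


# Toy stand-in for the patch evaluation: x simply equals the level.
def toy_eval(k):
    return (float(k), 0.0, 0.0)


print(refined_position(2.25, toy_eval))               # blended position
print(refined_position(2.25, toy_eval, blend=False))  # upper level only
```

With blending enabled the sketch performs two lookups per refined vertex, and with blending disabled only one, which matches the observation that disabling the blend removes half of the texture lookups for refined geometry.
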
The rendering technique is transparent to fragment-level techniques such as texturing, advanced lighting calculations, and normal mapping, and it can be augmented with vertex-level techniques such as displacement mapping.

Our performance analysis shows that our technique gives interactive frame-rates for meshes with up to 100k triangles. We believe this makes the method attractive since it allows complex scenes with a high number of coarse meshes to be rendered with smooth silhouettes. The analysis also indicates that the performance of the technique is limited by the bandwidth between host and graphics memory. Since the CPU is available for other computations while waiting for results from the GPU, the technique is particularly suited for CPU-bound applications. This also shows that if one could somehow eliminate the read-back of silhouetteness and trigger the refinement directly on the graphics hardware, the performance is likely to increase significantly. To our knowledge there are no such methods using current versions of the OpenGL and Direct3D APIs. However, considering the recent evolution of both APIs, we expect such functionality in the near future.

A major contribution of this work is an extension of
