Depth + Texture Representation - International Institute of ...
Depth + Texture Representation:
Study and Support
in OpenSceneGraph
(Final Year Project Report)

Aditi Goswami
200201058
Abstract

Image Based Rendering (IBR) holds a lot of promise for navigating through a real world scene without modeling it manually. Different representations have been proposed for IBR in the literature. One technique uses depth maps and texture images captured from a number of viewpoints; this is a rich and viable representation for IBR, but it is computationally intensive. Our GPU based approach gives results in real time with quality output.

We also provide support for our system on a widely used, open source graphics API called OpenSceneGraph (osg). osg represents a complex 3D scene as a hierarchical, object-oriented model called a scene graph; the actual geometry is stored in the leaf nodes of the scene graph. We add our IBR technique (Depth + Texture Representation) to osg as an independent class, which allows the user to develop hybrid scenes: the geometry coming partially from actual 3D models and partially from our IBR technique.

Keywords: Image based modeling and rendering, Splatting, Triangulation, Blending, Graphics Processing Unit, Vertex Shader, Pixel Shader, Scene Graphs
TABLE OF CONTENTS

1. Introduction
   1.1 Image Based Rendering
   1.2 GPU
   1.3 Cg
   1.4 Scene Graphs
   1.5 Related Background
2. Depth + TR System
   2.1 Representation
   2.2 Rendering
       2.2.1 Rendering one D+TR
       2.2.2 Rendering multiple D+TR
   2.3 Blending on CPU
   2.4 GPU Algorithm
   2.5 Two Pass Algorithm
       2.5.1 Pass 1
       2.5.2 Pass 2
   2.6 Blending on GPU
   2.7 Pipeline of Accelerated D+TR
   2.8 Some Results
3. D+TR & OpenSceneGraph
   3.1 OpenSceneGraph
       3.1.1 "What is a Scene Graph?"
       3.1.2 Nodes in OSG
       3.1.3 Structure of Scene Graph
       3.1.4 Windowing System in OSG
       3.1.5 Skeleton OSG Code
       3.1.6 Callbacks
       3.1.7 osgGA::GUIEventHandler
   3.2 D+TR in OSG
       3.2.1 Representation
       3.2.2 Rendering
           3.2.2.1 Rendering in OSG
       3.2.3 Discussion
   3.3 Conclusions and Results
References
Appendix
Chapter 1
Introduction

The potential of Image Based Modeling and Rendering (IBMR) is to produce new views of a real scene with a realism impossible to achieve by other means, which makes it very appealing. IBMR aims to capture an environment using a number of (carefully placed) cameras; any view of the environment can subsequently be generated from these views. To render the new views generated by the D+TR (Depth + Texture Representation) technique, algorithms such as splatting and triangulation are used, and implementing them on the GPU makes the system run in real time.
1.1 Image Based Rendering

The key features of any Image Based Rendering algorithm are constructing and rendering models. Construction of models is achieved using vision techniques such as stereo, 3D co-ordinate calculation and calibration, while rendering is done using graphics algorithms such as splatting, triangulation and interpolation, as well as algorithms implemented on the GPU. Image Based Rendering is thus a unique example where vision and graphics are applied hand in hand.

IBR techniques can be classified into two broad categories, shown in figure 1 below. The D+TR technique discussed in this report falls under 'representation with geometry'. A representation without geometry renders the model, whose location in the world is fixed, using graphics algorithms such as culling, without computing any 3D co-ordinates. A representation with geometry, in contrast, generates a novel view based on the 3D co-ordinates of the scene.
Figure 1: Classification of Image Based Rendering (with geometry vs. without geometry)
Most IBR techniques studied and implemented in the past suffer either in quality or in speed; achieving both together remains a big challenge. The quality of the novel views rendered using the D+TR technique is very high, but it lacks the speed to make it real time. Our attempt is to implement GPU based algorithms to accelerate the current version of D+TR. The concept of generating the novel view image is the same as in the original D+TR, but the algorithms are adapted to suit the limitations and features of the GPU. These algorithms and limitations are discussed in the chapters that follow.

The figure below shows the basic structure of D+TR, along with the parts of the structure where GPU algorithms are implemented.
Figure 2: Basic structure of the D+TR technique of IBR. The depth maps, calibration parameters and the novel view camera feed into the estimation of valid views for the novel view; the valid views are rendered from the novel viewpoint, the novel view depth and texture are stored for each view, the 3D texture is estimated, and the results are blended into the novel view. The GPU is used to implement and accelerate the four major blocks (shown within dotted lines) of the original D+TR.
As shown in the figure above, the rendering, 3D co-ordinate estimation and blending modules are key parts of the D+TR technique of IBR and are also costly; hence the GPU is used to accelerate them. The GPU algorithms for these modules are explained later.
1.2 GPU (Graphics Processing Unit)

The power and flexibility of GPUs make them an attractive platform for general purpose computation. Modern graphics processing units are deeply programmable and support high precision. GPUs have various advantages as well as certain limitations, discussed below. The figure below shows the modern GPU processor pipeline.
GPUs are also termed fast co-processors; their speed has been increasing at a rate often described as "Moore's law cubed", and they process data in parallel. GPUs are used in many applications: simulation of physical processes such as fluid flow, n-body systems and molecular dynamics, real time visualization of complex phenomena, and rendering of complex scenes, among others.

The kernel of the GPU is the vertex/fragment program, which runs on each element of an input stream and generates an output stream. The input stream is a stream of fragments, vertices or texture data; the output stream is a frame buffer, pixel buffer or texture. Hence the GPU is also termed a parallel stream processor.
Figure 3: Modern GPU processor pipeline. The application on the CPU supplies graphics state and 3D vertices; the vertex shader transforms them to 2D vertices, the rasterizer produces fragments, and the pixel shader shades them (with access to texture memory) before the result is rendered to the frame buffer or to a texture.
Vertex Shader: A vertex shader allows custom processing of per-vertex information on the GPU. The position, color and other related data can be modified by the vertex shader. Each vertex is processed in a sequence of steps before the geometry is rasterized, and the same program is executed for every vertex. There is no interdependence between the vertices, hence shaders can run in parallel. A vertex shader cannot delete or add a vertex. Vertex shaders are used for unique graphical effects such as mesh deformation, procedural geometry, blending, texture generation, interpolation, displacement mapping and motion blur, among others.
Pixel Shader: A pixel shader programs the pixel. It allows custom processing of fragment information on the GPU: the color of each fragment can be computed rather than simply being interpolated by the rasterizer. Fragment or pixel shaders are responsible for texturing, shading and blending, and they too run in parallel in the graphics hardware. Pixel shaders have limited or no knowledge of neighboring pixels. The output of a fragment shader can be color or depth.
Limitations of GPU: Besides providing such a powerful approach to computation, GPUs also impose certain constraints on their use. The graphics processor avoids read backs and does not allow arbitrary writes to arbitrary locations; each operation is performed on a chunk of data. Code written for the CPU cannot simply be ported to the GPU; it requires some initialization and binding procedure.
1.3 Cg

Historically, graphics hardware has been programmed at a very low level. Fixed-function pipelines were configured by setting states such as the texture-combining modes. More recently, programmers configured programmable pipelines by using programming interfaces at the assembly language level. In theory, these low-level programming interfaces provided great flexibility. In practice, they were painful to use and presented a serious barrier to the effective use of hardware.
Using a high-level programming language, rather than the low-level languages of the past, provides several advantages. The compiler optimizes code automatically and performs low-level tasks, such as register allocation, that are tedious and prone to error. Shading code written in a high-level language is much easier to read and understand. It also allows new shaders to be easily created by modifying previously written shaders. Shaders written in a high-level language are portable to a wider range of hardware platforms than shaders written in assembly code.
This chapter introduces Cg, a high-level language tailored for programming GPUs. Cg offers all the advantages just described, allowing programmers to finally combine the inherent power of the GPU with a language that makes GPU programming easy. Cg is very similar to C, which is why it is called C for graphics (Cg). It is a high-level shading language that is easy to use with OpenGL. It has a powerful swizzle operator and built-in vector and matrix types; it supports basic types, structures, arrays and type conversion as in C, and also supports a large number of mathematical, derivative and geometric functions.
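As a small, hypothetical illustration of these features (this is not the project's shader from the Appendix; the sampler name and constants are our own), a Cg fragment program using swizzles and built-in vector operations might look like:

```
// Hypothetical Cg fragment program (illustrative only).
float4 main(float2 uv : TEXCOORD0,
            uniform sampler2D tex) : COLOR
{
    float4 t = tex2D(tex, uv);                        // built-in texture lookup
    float3 swapped = t.bgr;                           // swizzle: reorder r/g/b in one step
    float  luma = dot(t.rgb, float3(0.299, 0.587, 0.114));
    return float4(lerp(swapped, luma.xxx, 0.5), t.a); // built-in lerp, scalar smear
}
```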
The Cg language allows you to write programs for both the vertex processor and the fragment processor. We refer to these programs as vertex programs and fragment programs, respectively. (Fragment programs are also known as pixel programs or pixel shaders, and we use these terms interchangeably in this document.) Cg code can be compiled into GPU assembly code, either on demand at run time or beforehand. Cg makes it easy to combine a Cg fragment program with a handwritten vertex program, or even with the non-programmable OpenGL or DirectX vertex pipeline. Likewise, a Cg vertex program can be combined with a handwritten fragment program, or with the non-programmable OpenGL or DirectX fragment pipeline.

Figure 4: Pipeline of Cg code and compiler
1.4 Scene Graph

A scene graph is a data structure which captures the logical and spatial representation of a graphical scene. A scene graph consists of various kinds of nodes, each representing a particular type of information such as transformations or lighting.

Open Inventor, OpenGL Performer, OpenSceneGraph and Java 3D are some of the common APIs which use a scene graph. The key reasons that many graphics developers use scene graphs are performance, productivity, portability and scalability:
1) Performance

Scene graphs provide an excellent framework for maximizing graphics performance. A good scene graph employs two key techniques: culling of the objects that won't be seen on screen, and state sorting of properties such as textures and materials, so that all similar objects are drawn together. The hierarchical structure of the scene graph makes this culling process very efficient; for instance, a whole city can be culled with just a few operations.
2) Productivity

Scene graphs take away much of the hard work required to develop high performance graphics applications. Furthermore, one of the most powerful concepts in object oriented programming is object composition, enshrined in the Composite design pattern, which fits the scene graph tree structure perfectly and makes it a highly flexible and reusable design.

Scene graphs also often come with additional utility libraries, ranging from helping users set up and manage graphics windows to importing 3D models and images. A dozen lines of code can be enough to load our data and create an interactive viewer.
3) Portability

Scene graphs encapsulate much of the lower level work of rendering graphics and reading and writing data, reducing or even eradicating the platform specific coding required in your own application. If the underlying scene graph is portable, then moving from platform to platform can be as simple as recompiling your source code.
4) Scalability

Along with dynamically managing the complexity of scenes to account for differences in graphics performance across a range of machines, scene graphs also make it much easier to manage complex hardware configurations, such as clusters of graphics machines or multiprocessor/multipipe systems. A good scene graph allows the developer to concentrate on their own application while the rendering framework handles the different underlying hardware configurations.
1.5 Related Background

Image Based Rendering (IBR) has the potential to produce new views of a real scene with a realism impossible to achieve by other means. It aims to capture an environment using a number of cameras that recover the geometric and photometric structure of the scene. The scene can thereafter be rendered from any viewpoint using the internal representations. These representations fall into two broad categories: those without any geometric model and those with a geometric model of some kind. Early IBR efforts produced new views of scenes given two or more images of them [5, 19]; point-to-point correspondences contained all the structural information about the scene used by such methods. Many later techniques also used only the images for novel view generation [15, 11, 8, 20]. They require a large number of input views -- often running into thousands -- for modeling a scene satisfactorily, which makes them practically unusable other than for static scenes. The representation was also bulky and needed sophisticated compression schemes.

The availability of even approximate geometry can reduce the requirements on the number of views drastically. The use of approximate geometry for view generation was a significant contribution of Lumigraph rendering [8] and of view-dependent texture mapping [6]. The Unstructured Lumigraph [4] extends this idea to rendering using an unstructured collection of views and approximate models.
The Depth Image (DI) representation is suitable for IBR as it can be computed from the real world using cameras and can be used for new view generation. A Depth Image consists of a pair of aligned maps: the image or texture map I, which gives the colour of all visible points, and a depth map D, which gives the distance to each visible point. In practice the image and depth are computed with respect to a real camera, though this does not have to be the case. The calibration matrix C of the camera is also included in the representation, giving the triplet (D, I, C). This is a popular representation for image-based modeling, as cameras are cheap and methods like shape-from-X are mature enough to capture dense depth information. It has been used in different contexts [14, 16, 23, 13, 24]. The Virtualized Reality system captured dynamic scenes and modeled them for subsequent rendering using a studio with a few dozen cameras [16]. Many similar systems have been built in recent years for modeling, immersion, videoconferencing, etc. [22, 3]. Recently, a layered representation with full geometry recovery for modeling and rendering dynamic scenes has been reported by Zitnick et al [23]. Special scanners such as those made by CyberWare have also been used to capture such representations of objects and cultural assets, as in the Digital Michelangelo project [2, 1].
Depth Images have been used for IBR in the past. McMillan used them for warping [14], and Mark used an on-the-fly Depth Image for fast rendering of subsequent frames [13]. The Virtualized Reality project computed them using multibaseline stereo and used them to render new views using warping and hole-filling [16]. Zitnick et al [24] use them in a similar way with an additional blending step to smooth discontinuities. Waschbusch et al [24] extended this representation to sparsely placed cameras and presented probabilistic rendering with a view-independent point-based representation of the depth information.

The general framework of rendering Depth Images with blending was presented in [17]. [25] provides a GPU based algorithm for real-time rendering of a representation consisting of multiple Depth Images; it presents a study of the locality properties of Depth Image based rendering, with results on representative synthetic data sets demonstrating the utility of the representation and the effectiveness of the algorithms.
Chapter 2
Depth + TR System: Representation and Rendering

2.1 Representation

The basic representation consists of an image, its depth map and the calibration parameters. The depth map is a two dimensional array of real values, with location (i,j) storing the depth distance to the point that projects to pixel (i,j) in the image. Figure 5 below shows the depth maps and images for a synthetic and a real scene; closer points are shown brighter.
Figure 5: (a) Synthetic scene; (b) Real scene
Depth and texture are stored as images on disk. The depth map contains real values whose range depends upon the resolution of the structure recovery method; images with 16 bits per pixel can store depths up to about 65 meters at millimeter resolution. Further sections explain how the maps are constructed and why we use Depth + Texture for IBR.
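As an illustration of this storage scheme (a sketch; the millimeter step and the function names are our assumptions, not the project's actual encoding), metric depths can be quantized into a 16-bit image and recovered as follows:

```python
import numpy as np

def depth_to_uint16(depth_m, step_m=0.001):
    """Quantize metric depths to 16-bit values (1 mm steps by default).

    With a 1 mm step, a 16-bit image covers 65535 * 0.001 m, about 65 meters.
    """
    q = np.round(np.asarray(depth_m, dtype=np.float64) / step_m)
    return np.clip(q, 0, 65535).astype(np.uint16)

def uint16_to_depth(depth_q, step_m=0.001):
    """Recover metric depth from the quantized 16-bit map."""
    return depth_q.astype(np.float64) * step_m
```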
Construction of D+TR: The D+TR can be created using any suitable 3D structure recovery method, including stereo, range sensors, shape-from-shading, etc. Multicamera stereo remains the most viable option as cameras are inexpensive and nonintrusive. Depth and texture need to be captured only from a few points of view, since geometry can be interpolated. A calibrated, instrumented setup consisting of a dozen or so cameras can capture static or dynamic events as they happen, and a depth map can be computed for each camera using the other cameras in its neighborhood and a suitable stereo program.
There are various reasons for selecting depth and texture for IBR. Rendering a scene using depth maps is an active research area; a depth map gives a visibility-limited model of the scene that can be rendered easily using graphics algorithms, while texture mapping ensures photorealism.
2.2 Rendering

The rendering aspect of the D+TR system is detailed in [17]. In this section we describe the rendering approaches and related issues: first rendering using one depth map, and then using multiple depth maps.
2.2.1 Rendering one D+TR

For rendering one D+TR, two approaches have been implemented and tested.

Splatting: The point cloud can be splatted as point features. Splatting techniques broaden the individual 3D points to fill the space between points; the color of each splatted point is obtained from the corresponding image pixel. Splatting has been used as the method for fast rendering, as point features are quick to render. The disadvantage of splatting is that holes can show up where data is missing if we zoom in too much. Figure 6(a) shows one D+TR rendered using splatting.
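The splatting step can be sketched on the CPU as follows (a sketch; the pinhole camera model, the square splats and all names here are our assumptions, not the project's code): each 3D point is projected into the novel view, widened to a small square of pixels, and resolved against a z-buffer.

```python
import numpy as np

def splat(points, colors, K, R, t, width, height, radius=1):
    """Render a colored 3D point cloud by splatting.

    points: (N, 3) world points; colors: (N, 3); K, R, t: novel view camera.
    Each projected point is widened to a (2*radius+1)^2 square; a z-buffer
    keeps the nearest point per pixel. Returns (color image, depth image).
    """
    color_img = np.zeros((height, width, 3))
    depth_img = np.full((height, width), np.inf)
    cam = (R @ points.T).T + t                      # world -> camera frame
    for p, c in zip(cam, colors):
        if p[2] <= 0:                               # behind the camera
            continue
        uvw = K @ p
        u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
        for dv in range(-radius, radius + 1):       # broaden the point feature
            for du in range(-radius, radius + 1):
                x, y = u + du, v + dv
                if 0 <= x < width and 0 <= y < height and p[2] < depth_img[y, x]:
                    depth_img[y, x] = p[2]          # z-buffer test
                    color_img[y, x] = c
    return color_img, depth_img
```

Holes appear exactly where no splat covers a pixel (the depth stays infinite), which is why a second D+TR is needed to fill them.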
Implied Triangulation: Every 2x2 section of the depth map is converted to two triangles by drawing the diagonal. Depth discontinuities are handled by breaking all edges with a large difference in the z-coordinate between their end points and removing the corresponding triangles from the model. Triangulation results in the interpolation of the interior points of the triangles, filling holes created due to the lack of resolution. The interpolation can produce low quality images if there is a considerable gap between the resolutions of the captured and rendered views, such as when zooming in; this is a fundamental problem in image based rendering. Figure 6(b) shows the D+TR rendered using triangulation.
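The implied triangulation can be sketched as follows (a sketch; the discontinuity threshold and the per-triangle test are our assumptions about how the edge breaking is realized): each 2x2 cell of the depth map contributes two triangles, and any triangle spanning a large jump in z is dropped.

```python
import numpy as np

def implied_triangulation(depth, max_dz=0.5):
    """Triangulate a depth map; returns triangles as index triples into the
    flattened (row-major) pixel grid.

    Each 2x2 cell is split along its diagonal into two triangles; a triangle
    is removed if any of its edges spans a depth jump larger than max_dz.
    """
    h, w = depth.shape
    flat = lambda i, j: i * w + j
    triangles = []
    for i in range(h - 1):
        for j in range(w - 1):
            tl, tr = (i, j), (i, j + 1)
            bl, br = (i + 1, j), (i + 1, j + 1)
            for tri in ((tl, tr, bl), (tr, br, bl)):    # split along the diagonal
                zs = [depth[p] for p in tri]
                if max(zs) - min(zs) <= max_dz:         # drop discontinuity spans
                    triangles.append(tuple(flat(*p) for p in tri))
    return triangles
```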
Figure 6: (a) D+TR rendered using splatting; (b) D+TR rendered using triangulation
2.2.2 Rendering Multiple D+TR

The view generated using one D+TR will have holes or gaps corresponding to parts of the occluded scene exposed in the new view position. These holes can be filled using another D+TR that sees those regions. Parts of the scene can be visible to multiple cameras, and the views generated by multiple D+TRs have to be blended in such cases. Hole filling and blending of the views to generate the novel view are explained in the next section. Figure 7 shows the result of rendering multiple D+TRs.
Figure 7: Multiple D+TRs rendered using (a) splatting and (b) triangulation
2.3 Blending on CPU

The views used for novel view generation are blended to provide high quality output. Each valid input view is blended based on its depth and texture.

Blend Function: The angular distance between an input view's camera center and the novel view camera center is used as the criterion for blending. This angular distance is used to calculate the weight given to each view for each pixel; cosine weights are used in the D+TR technique. Figure 8 below is used to explain the blending process.
Figure 8: Blending is based on angular distance

As shown in the figure, t_1 and t_2 are the angular distances of the views c_1 and c_2 from D (the novel view). The corresponding pixel weights for the two views are given by:

w_c1 = cos^n(t_1)  (for view c_1)
w_c2 = cos^n(t_2)  (for view c_2)

n > 2 gives suitable values for blending.
Pixel weights can also be calculated using an exponential function. Exponential blending computes the weights as w_i = e^(-c*t_i), where i is the view index, t_i is the angular distance of view i, and w_i is the weight for that view at the pixel. The constant c controls the fall off as the angular distance increases.
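Both weighting schemes can be written in a few lines (a sketch; measuring the angle at the 3D point between the directions to the two camera centers, and the sign of the exponent, are our reading of the text):

```python
import math

def angular_distance(point, novel_center, view_center):
    """Angle at the 3D point between the directions to the two camera centers."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    a = unit([c - p for c, p in zip(novel_center, point)])
    b = unit([c - p for c, p in zip(view_center, point)])
    d = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(d)

def cosine_weight(t, n=4):
    """w = cos^n(t); n > 2 gives a suitable fall-off (clamped past 90 degrees)."""
    return max(0.0, math.cos(t)) ** n

def exponential_weight(t, c=5.0):
    """w = e^(-c*t); the constant c controls the fall-off with angular distance."""
    return math.exp(-c * t)
```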
Pixel Blending: The blend function should result in smooth changes in the generated view as the viewpoint changes. Thus, views that are close to the new view should be emphasized and views that are far from it should be deemphasized. In addition, z-buffering is used so that the proper pixels are used to render the novel view. An example is explained below using figure 9.

Figure 9: Process of pixel blending when one pixel is seen by multiple views

As shown in the figure above, views 1, 2, 3 and 4 are used in generating the novel view. Consider a ray from the novel view that corresponds to both point A and point B in the scene. D+TR takes care of all the complexities when rendering this pixel in the novel view, as explained below.

Views 1 and 2 are used for point A although views 3 and 4 are also valid views for rendering. This is simply because point A as seen by views 3 and 4 lies behind point B as seen by views 1 and 2 along the same ray from the novel view. A red line shows that a view is discarded, and a green line indicates that the view is considered for blending.

View 3 sees both points A and B along the ray from the novel view, but it is the pixel in view 3 corresponding to point B that is used in rendering the novel view pixel.
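The per-pixel selection and blending described above can be sketched as follows (a sketch; the epsilon tolerance and the data layout are our assumptions): among all valid views projecting onto a novel view pixel, only those whose depth along the ray is within a small tolerance of the nearest depth survive the z-test, and they are blended with normalized weights.

```python
def blend_pixel(candidates, eps=0.05):
    """Blend one novel-view pixel.

    candidates: list of (depth_along_novel_ray, weight, (r, g, b)) tuples,
    one per valid input view. Contributions farther than the nearest depth
    by more than eps are discarded (occluded, like views 3 and 4 for point A
    in figure 9); the rest are blended with normalized weights.
    """
    if not candidates:
        return None                              # hole: no view sees this pixel
    z_min = min(d for d, _, _ in candidates)     # nearest surface along the ray
    kept = [(w, c) for d, w, c in candidates if d - z_min <= eps]
    w_sum = sum(w for w, _ in kept)
    return tuple(sum(w * c[k] for w, c in kept) / w_sum for k in range(3))
```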
2.4 GPU Algorithm

We devised a 2-pass algorithm to render multiple DIs with per-pixel blending. The first pass determines, for each pixel, which views need to be blended; the second pass actually blends them. The property of each pixel blending a different set of DIs is maintained by the new algorithm. The pseudo code of the algorithm, as presented in [25], is given below:
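A sketch of the algorithm, reconstructed from the description of the two passes in section 2.5 (our reconstruction, not the original listing of [25]):

```
// Pass 1: build the shifted novel-view depth buffer (vertex shader)
for each valid input view V:
    for each pixel of V:
        back-project to a 3D point using V's depth map and calibration
        project the point into the novel view; let D be its depth
        write depth (D - Delta), keeping the nearest value

// Pass 2: per-pixel blending (pixel shader)
initialize accumulated color C_d and alpha a_d to 0 for every pixel
for each valid input view V:
    for each fragment of V that lies within Delta of the stored depth:
        a_s = blend weight of V at this pixel (cosine/exponential)
        C_d = (a_s * C_s + a_d * C_d) / (a_s + a_d)
        a_d = a_s + a_d
the accumulated C_d after the last view is the novel view color
```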
2.5 Two Pass Algorithm

The D+TR technique of IBR has a blending module to provide high quality novel views. The read back of the frame buffer is the time consuming operation in the above algorithm. Modern GPUs have a lot of computational power and memory; if the read back is avoided and the blending is done on the GPU, the frame rate can reach interactive rates. The much needed speed, with the original quality, is achieved with the 2 pass algorithm: pass 1 deals with the vertex shader, and pass 2 deals with the pixel shader, which implements the blending process.
2.5.1 Pass One

This pass generates the novel view depth. Each pixel is rendered with a shift of 'Delta' in the depth value: if 'D' is the depth for a pixel, it is rendered at 'D - Delta'. In pass 2, all pixels within the range of Delta are blended and the others are rejected by the depth test. Pass 1 can be explained with the block diagram below.
Figure 10: Process of pass 1. The calibration parameters and the depth map are input to the vertex shader; the 3D co-ordinates (x, y, z) are computed, z is shifted to z - Delta, and the result is rendered to give the novel view depth.
As shown in the block diagram, the calibration parameters and the depth values are passed to the vertex shader, and the rendered surface lies a little behind the actual surface. Each valid input view for the current novel view position is rendered one by one to get the novel view depth.
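The per-vertex work of pass 1 can be sketched on the CPU as follows (a sketch; the pinhole model and all names are our assumptions): a depth pixel is back-projected to 3D with the input view's calibration, transformed into the novel view, and its depth shifted by Delta.

```python
import numpy as np

def pass1_vertex(u, v, d, K_in, R_in, t_in, K_out, R_out, t_out, delta=0.05):
    """Back-project pixel (u, v) at depth d from an input view, then project
    it into the novel view with its depth pulled by Delta (the pass 1 shift).

    Cameras follow x_cam = R x_world + t. Returns (u', v', shifted depth).
    """
    ray = np.linalg.inv(K_in) @ np.array([u, v, 1.0])
    p_cam = ray * d                                   # point in input camera frame
    p_world = R_in.T @ (p_cam - t_in)                 # input camera -> world
    p_novel = R_out @ p_world + t_out                 # world -> novel camera
    uvw = K_out @ p_novel
    return uvw[0] / uvw[2], uvw[1] / uvw[2], p_novel[2] - delta
```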
2.5.2 Pass Two

Blending is done in the pixel shader; the process of blending is explained in the next section. The pixel shader takes the camera centers of the novel view and the current view, along with the textures and the 3D location of the current pixel, to implement this pass. The output for each view from the pixel shader is sent with the next view back to the pixel shader so that the blending is performed in a normalized way, which also increases the quality of the output. Pass 2 is also explained using the block diagram in figure 11. The pixel shader code, along with the vertex shader code, is given in the Appendix at the end of this report.
Figure 11: Implementation of pass 2. Each valid input view's texture is sent one by one to the pixel shader; the shader's output is copied back to the original texture, and for the last input view the output is sent to the frame buffer.
Pass 2 takes more time than pass 1 because the blending process is implemented in this pass and the per-pixel weight calculation is done in the pixel shader. The number of parameters passed to the pixel shader is also larger, which further increases the execution time: the pixel shader takes two textures and two camera centers (the novel view and input view cameras) along with the 3D location of the pixel as input, whereas the vertex shader takes just the 3D location and the modelview and projection matrices.
2.6 Blending on GPU

The concept of blending is the same on the GPU as on the CPU, but here the normalization is done with each incoming input view, whereas on the CPU the normalized weights of all the views are available before the blending process starts.
Blend Function: The blend function used is

    C_f = (a_s C_s + a_d C_d) / a_f
    a_f = a_s + a_d

where C_s is the color of the source pixel (incoming view) and C_d is the color of the destination pixel (the views rendered before this one); a_s and a_d are the corresponding alpha values. With each incoming view, the alpha value of the rendered view is normalized.

C_f and a_f are the color and alpha values that get rendered. If the input view is not the last view, they come back as C_d and a_d together with the next valid input view; if the current input view is the last one, the rendered C_f and a_f are the color and alpha of this pixel location in the finally rendered novel view. The source and destination alpha values are added so that the division in the next pass with another input view keeps the values normalized.
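This incremental blend can be sketched in plain C++ (the actual implementation is a Cg pixel shader; `Pixel`, `blendStep` and `blendViews` are illustrative names). Feeding the views through one at a time yields exactly the normalized weighted average, so colors never exceed their range:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One pixel's contribution from one view: its color and its weight (alpha).
struct Pixel { double color; double alpha; };

// One blending step: src is the incoming view's contribution, dst holds the
// accumulated result of the views rendered before it (C_d, a_d).
//   C_f = (a_s*C_s + a_d*C_d) / a_f,  with  a_f = a_s + a_d
Pixel blendStep(const Pixel& src, const Pixel& dst) {
    double af = src.alpha + dst.alpha;
    double cf = (src.alpha * src.color + dst.alpha * dst.color) / af;
    return Pixel{cf, af};
}

// Feed the valid views through one by one, as the pixel shader does per pixel.
Pixel blendViews(const std::vector<Pixel>& views) {
    Pixel acc = views[0];
    for (std::size_t i = 1; i < views.size(); ++i)
        acc = blendStep(views[i], acc);
    return acc;
}
```

After the last view, `acc.color` equals the sum of weighted colors divided by the sum of weights, which is why no post-processing division is needed.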
The alpha value of the source pixel, which becomes its weight, is calculated in the pixel shader. The block diagram below explains how this alpha value is computed.
FIGURE 12: Process of calculation of the alpha value a_s of the source pixel in the pixel shader, from the center of the novel view, the center of the current view and the 3D location of the pixel.
Every pixel of the neighboring valid views used for novel view generation takes part in the blending, so a weight is calculated for each of them. The accumulated alpha value is normalized with each new view arriving at the pixel shader for blending, by dividing by the total accumulated alpha value.
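The weight computation of Figure 12 might be sketched as follows in plain C++. The exact weighting function is an assumption here (a clamped cosine of the angle subtended at the pixel's 3D location by the two camera centers); the names are illustrative:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}
static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
static double len(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Alpha of the source pixel: the angle between the rays from the pixel's 3D
// location to the novel and current camera centers is computed, and its
// cosine (clamped at zero) is used as the weight, so a view looking along
// the novel ray gets the largest alpha.
double sourceAlpha(const Vec3& novelCenter, const Vec3& viewCenter,
                   const Vec3& point3D) {
    Vec3 a = sub(novelCenter, point3D);
    Vec3 b = sub(viewCenter, point3D);
    double cosAngle = dot(a, b) / (len(a) * len(b));
    return cosAngle > 0.0 ? cosAngle : 0.0;
}
```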
2.7 Pipeline of Accelerated D+TR

This section discusses the entire pipeline of the Accelerated D+TR. From figure 13 we can see that the inputs to the system are the depth maps, the calibration parameters and the input view textures. The depth map is used in the pre-processing step and in pass 1; the calibration parameters in the pre-processing step and in pass 2; and the input view texture only in pass 2, where the blending is performed.
FIGURE 13: Pipeline of accelerated D+TR

The figure above shows that the pre-processing step generates the 3D locations of the pixels, pass 1 shifts the depth to the novel view for each valid input view, and pass 2 blends the pixels based on the rendered depth and the valid input view textures.
The input to the system is the calibration parameters, depth maps and texture images from a certain number of views. Based on the novel view camera, the valid views surrounding it are first estimated; these valid views are then used to generate the novel view. To determine the validity of a view, the average of the 3D coordinates of the scene is taken (the origin (0, 0, 0) in our case), and validity is decided from the angle subtended at this average center point by the two camera centers. Views beyond 90 degrees are not considered, as they see the other part of the scene. When fewer views are used, the closest views are preferred.
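The validity test above can be sketched like this (plain C++, illustrative names; since both centers are taken relative to the scene center at the origin, the 90-degree test reduces to the sign of a dot product):

```cpp
#include <cassert>
#include <vector>

struct Cam { double x, y, z; };

// A view is valid when the angle subtended at the average scene center
// (the origin here) between its camera center and the novel camera center
// is below 90 degrees, i.e. when the cosine of that angle is positive.
std::vector<int> validViews(const std::vector<Cam>& cams, const Cam& novel) {
    std::vector<int> valid;
    for (std::size_t i = 0; i < cams.size(); ++i) {
        // sign of the dot product = sign of cos(angle at the origin)
        double d = cams[i].x * novel.x + cams[i].y * novel.y
                 + cams[i].z * novel.z;
        if (d > 0.0)
            valid.push_back(static_cast<int>(i));
    }
    return valid;
}
```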
Once the valid views are known, the 3D location of each valid view is generated from its calibration parameters and depth map and passed as a texture to the vertex shader for the generation of the novel view depth. The vertex shader simply shifts the z coordinate a little backwards, so that pixels falling within the shifted range can be blended in the second pass. In some approaches the vertex shader also performs the 3D coordinate estimation, but this is not the optimal solution. After generating the shifted coordinates, pass 1 passes this information on to pass 2.
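The pre-processing step that turns a depth map entry into a 3D location might look like this; a simple pinhole model is assumed here (focal length f, principal point (cx, cy)), whereas the real system uses the full calibration parameters of each input camera:

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };

// Back-project pixel (i, j) with stored depth into camera coordinates under
// a pinhole model: the offset from the principal point is scaled by depth/f,
// and z is the stored depth itself.
Point3 backProject(double i, double j, double depth,
                   double f, double cx, double cy) {
    return Point3{(i - cx) * depth / f,
                  (j - cy) * depth / f,
                  depth};
}
```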
Pass 2 performs on-the-fly blending, using a pixel shader that runs on the GPU. For each pixel, the shader has access to the novel view and DI parameters and to the results of the previous rendering through a Frame Buffer Object (FBO). Depending on which DIs have values near the minimum z for each pixel, a different combination of DIs can be blended at each pixel. The color and alpha values are always kept correct, so no post-processing step depending on the number of blended DIs is needed. The algorithm also ensures that the maximum range of color values is never exceeded, which could happen if the summing were done in the loop followed by a single division at the end.
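The per-pixel DI selection described above can be sketched as follows (plain C++, not the actual shader; the threshold value is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Of the z-values that the candidate DIs project onto one pixel, only those
// within a small threshold of the minimum z take part in the blend; the rest
// belong to occluded surfaces and are discarded.
std::vector<int> disToBlend(const std::vector<double>& z, double threshold) {
    double zMin = *std::min_element(z.begin(), z.end());
    std::vector<int> keep;
    for (std::size_t i = 0; i < z.size(); ++i)
        if (z[i] - zMin <= threshold)
            keep.push_back(static_cast<int>(i));
    return keep;
}
```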
2.8 Results

The FPS of the system was measured while varying the resolution and the number of input DIs; the tables below summarize the results. The experiment was conducted on an AMD64 processor with 1 GB RAM and an nVidia 6600GT graphics card with 128 MB RAM.
Number of Input Views = 18

    Number of views    FPS (Resolution=2)    FPS (Resolution=1)
    2 to 3             75                    21
    3 to 5             46                    13
    4 to 6             38                    10.5
    7 to 8             21.3                  6
    8 to 9             20                    5
    10 to 12           14                    3.2
    13 to 14           11                    2.7

Number of Input Views = 9

    Number of views    FPS (Resolution=2)    FPS (Resolution=1)
    2 to 3             210                   78
    3 to 5             130                   46.5
    4 to 6             109                   40
    7 to 8             40                    14
    8 to 9             38                    13.3
    9                  32                    13
As is clear from the tables above, our system is capable of producing novel views in real time.
Chapter 3

D+TR & OpenSceneGraph

3.1 OpenSceneGraph

OpenSceneGraph, commonly known as OSG, is an open source, cross-platform graphics toolkit for the development of high-performance graphics applications such as flight simulators, games, virtual reality and scientific visualization. OSG is based around the concept of a scene graph: it provides an object-oriented framework on top of OpenGL that frees the developer from implementing and optimizing low-level graphics calls, and it provides many additional utilities for the rapid development of graphics applications.
It is a 3D graphics library for C++ programmers. A scene graph library lets us represent the objects in a scene with a graph data structure, so that related objects sharing some properties can be grouped together and common properties specified for the whole group in one place. OSG can then automatically manage things like level of detail, culling and bounding shapes needed to draw the scene faithfully but without unnecessary detail (which would slow down the graphics hardware drawing the scene).
The OpenSceneGraph project was started in 1998 by Don Burns as a means of porting a hang-gliding simulator written on top of the Performer scene graph. The source code was open sourced in 1999, and the porting of the scene graph element to Windows was carried on by Robert Osfield. The project was made scalable in 2003, and the OpenSceneGraph 1.0 release, the culmination of six years of work by the lead developers and the open source community that has grown up around the project, happened in 2006. The OSG we know now is a cross-platform, scalable, real-time, open source scene graph with over 1000 active developers worldwide and users such as NASA, the European Space Agency, Boeing, Magic Earth, the US Army and many others. Besides enabling the rapid development of custom visualization programs, OSG is also the power behind projects such as osgVolume and Present3D.
Unfortunately, there are currently no real reference manuals or programmer's guides for OpenSceneGraph; the recommendation on the OSG web site is to "Use the Source". Having only the source assumes you can readily understand how it all works, or deduce it from the code; this is not true for many who are new to OSG or to the simulation world, as can be seen from many of the questions on the mailing lists. While OSG has documents generated from the headers and source, much of the material found there lacks context and can thus be difficult to assimilate; a good programmer's guide and reference manual would give the required context.
3.1.1 "What is a scene graph?"

As the name suggests, a scene graph is a data structure used to organize a scene in a computer graphics application. The basic idea is that a scene is usually decomposed into several different parts, and these parts somehow have to be tied together; a scene graph is a graph in which every node represents one of the parts into which a scene can be divided. More strictly, a scene graph is a directed acyclic graph, so it establishes a hierarchical relationship among the nodes.
In this section, we describe a simple scene graph and introduce some basic OSG node types. Suppose we want to render a scene consisting of a road and a truck. A scene graph representing this scene is depicted in Figure 14.

Figure 14: A scene graph, consisting of a road and a truck.

If we render this scene just like this, the truck will not appear at the place we want; we have to translate it to its right position. Fortunately, scene graph nodes do not always represent geometry. In this case, we can add a node representing a translation, yielding the scene graph shown in Figure 15.
Figure 15: A scene graph, consisting of a road and a translated truck.

Let's add two boxes to the scene, one on the truck and the other on the road. Both boxes will have translation nodes above them, so that they can be placed at their proper locations. Furthermore, the box on the truck will also be translated by the truck translation, so that if we move the truck, the box moves too. Since both boxes look exactly the same, we don't have to create a node for each one of them: one node "referenced" twice does the trick, as Figure 16 illustrates. During rendering, the "Box" node will be visited (and rendered) twice, but some memory is spared because the model is loaded just once. This is one of the reasons a scene graph is a "graph" and not a "tree".

Figure 16: A scene graph, consisting of a road, a truck and a pair of boxes.
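The effect of sharing can be illustrated with a minimal, OSG-independent sketch (the `Node` struct and `traverse` function are illustrative, not OSG classes): the single shared node below has two parents, so a traversal visits it twice while only one copy of it exists in memory.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// A bare-bones scene graph node: a list of children plus a visit counter.
struct Node {
    std::vector<std::shared_ptr<Node>> children;
    int visits = 0;
};

// Depth-first traversal, analogous to a render traversal of the scene graph.
void traverse(const std::shared_ptr<Node>& n) {
    ++n->visits;
    for (const auto& c : n->children)
        traverse(c);
}
```

Building the graph of Figure 16 with one `box` node attached under two translation nodes, a traversal from the root increments `box->visits` to 2 even though only one `Node` object was ever allocated.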
Up to this point, the discussion has been about "generic" scene graphs. From now on, we will use exclusively OSG scene graphs; that is, instead of a generic "Translation" node, we will use an instance of a real class defined in the OSG hierarchy.

A node in OSG is represented by the osg::Node class. Renderable things are represented by instances of the osg::Drawable class. But osg::Drawables are not nodes, so we cannot attach them directly to a scene graph; it is necessary to use a "geometry node", osg::Geode, instead. Not every node in an OSG scene graph can have other nodes attached to it as children: in fact, we can only add children to nodes that are instances of osg::Group or one of its subclasses.
Using osg::Geodes and an osg::Group, it is possible to recreate the scene graph from Figure 14 using real OSG classes. The result is shown in Figure 17.

Figure 17: An OSG scene graph, consisting of a road and a truck. Instances of OSG classes derived from osg::Node are drawn as rounded boxes with the class name inside; osg::Drawables are represented as rectangles.

That is not the only way to translate the scene graph from Figure 14 into a real OSG scene graph. More than one osg::Drawable can be attached to a single osg::Geode, so the scene graph depicted in Figure 18 is also an OSG version of Figure 14.

Figure 18: An alternative OSG scene graph representing the same scene as the one in Figure 17.

The scene graphs of Figures 17 and 18 have the same problem as the one in Figure 14: the truck will probably be at the wrong position. The solution is the same as before: translating the truck. In OSG, probably the simplest way to translate a node is to add an osg::PositionAttitudeTransform node above it. An osg::PositionAttitudeTransform has associated with it not only a translation but also an attitude and a scale. Although not exactly the same thing, it can be thought of as the OSG equivalent of the OpenGL calls glTranslate(), glRotate() and glScale(). Figure 19 is the OSGfied version of Figure 15.

Figure 19: An OSG scene graph, consisting of a road and a translated truck. For compactness, osg::PositionAttitudeTransform is written as osg::PAT.
3.1.2 Nodes in OSG

As stated in the previous section, OSG comprises various kinds of nodes for representing specific information of a complex 3D scene. Notable among these are:

1) osg::Node – The base class of all internal node classes. It is rarely used in a scene graph directly, but it has important members such as the bounding sphere, the parent list and the NodeCallbacks (for the update, event and cull traversals; more on callbacks later).
2) osg::Group – An example of the Composite design pattern, this class is derived from osg::Node. It provides functionality for adding child nodes and maintains a list of children.

3) osg::Transform – A group node whose children are all transformed by a 4x4 matrix. It is often used for positioning objects within a scene, for trackball functionality or for animation. Transform itself does not provide set/get functions, only the interface for defining what the 4x4 transformation is. Subclasses such as MatrixTransform and PositionAttitudeTransform support the use of an osg::Matrix or an osg::Vec3/osg::Quat respectively:

osg::MatrixTransform – uses a 4x4 matrix for the transform.
osg::PositionAttitudeTransform – uses a Vec3 position, a Quat rotation for the attitude, and a Vec3 for a pivot point.
4) osg::Geode – A Geode is a "geometry node", that is, a leaf node of the scene graph that can have "renderable things" attached to it. In OSG, renderable things are represented by objects of the Drawable class, so a Geode is a Node whose purpose is grouping Drawables; it maintains a list of Drawables.

5) osg::Drawable – A pure virtual class (with six concrete derived classes) which provides all the important draw*() methods. In OSG, everything that can be rendered is implemented as a class derived from Drawable. The Drawable class contains no drawing primitives, since these are provided by subclasses such as osg::Geometry. Note also that a Drawable is not a Node, and therefore cannot be added directly to a scene graph; instead, Drawables are attached to Geodes, which are scene graph nodes.

This class contains a StateSet and a list of parents, along with cull and draw callbacks. The OpenGL state that must be used when rendering a Drawable is represented by a StateSet. StateSets can be shared between Drawables, which is a good way to improve performance, since it allows OSG to reduce the number of expensive changes to the OpenGL state. Like StateSets, Drawables can also be shared between different Geodes, so that the same geometry (loaded into memory just once) can be used in different parts of the scene graph.
FIGURE 20: Inheritance diagram for the osg::Drawable class.

The major classes derived from this base class are:

osg::Geometry – This class adds real geometry to the scene graph. It can have vertices (and vertex data) associated with it directly, or any number of 'PrimitiveSet' instances. Vertex and vertex attribute data (colors, normals, texture coordinates) are stored in arrays. Since more than one vertex may share the same color, normal or texture coordinate, an array of indices can be used to map the vertex arrays to the color, normal or texture coordinate arrays.

osg::ShapeDrawable – Adds the ability to render shape primitives with reduced effort. The shape primitives include Box, Cone, Cylinder, Sphere, Triangle Mesh etc. ShapeDrawable currently doesn't render InfinitePlanes.
6) osg::StateSet – Stores a set of modes and attributes which represent a set of OpenGL state. Note that a StateSet contains just a subset of the whole OpenGL state. In OSG, each Drawable and each Node has a reference to a StateSet, and these StateSets can be shared between different Drawables and Nodes (that is, several Drawables and Nodes can reference the same StateSet). Indeed, this practice is recommended whenever possible, as it minimizes expensive state changes in the graphics pipeline. This state includes the textureModeList, textureAttributeList, attributeList, modeList etc., along with the updateCallback and eventCallback.

All the nodes described above are part of the core OSG module, called osg. There are various other modules, such as osgDB (the plugin support library for managing the dynamic plugins, both loaders and NodeKits), osgGA (the GUI adapter library, to assist the development of viewers), osgGLUT (the GLUT viewer base class) and osgPlugins (28 plugins for reading and writing images and 3D databases). Some of these will be discussed later in this report.
3.1.3 Structure of a Scene Graph

Having described the major node types in OSG, let us discuss a typical scene hierarchy. The graph has an osg::Group at the top (representing the whole scene); osg::Groups, LODs, Transforms and Switches in the middle (dividing the scene into various logical units); and osg::Geodes (geometry nodes containing osg::Drawables and osg::StateSets) as the leaf nodes.
3.1.4 Windowing System in OSG

Just like OpenGL, the core of OSG is independent of the windowing system. The integration between OSG and a particular windowing system is delegated to other, non-core parts of OSG (users are also free to integrate OSG with any exotic windowing system they happen to use). Viewer implements the integration between OSG and Producer, AKA Open Producer, thus offering an out-of-the-box, scalable and multi-platform abstraction of the windowing system.
3.1.5 Skeleton OSG Code

FIGURE 21: Inheritance diagram for the osgProducer::Viewer class

This section describes the steps needed to set up a simple OSG program using the nodes and the windowing system discussed in the sections above:

1) Set up the viewer (an osgProducer::Viewer instance).
2) Create the scene graph for the scene (using nodes such as Geode, Geometry etc.).
3) Attach the viewer and the graph using the setSceneData() method.
4) Start the simulation loop which generates the scene:

    while (!viewer.done()) {
        /* wait for all cull and draw threads to complete */
        viewer.sync();
        /* update the scene by traversing it with the update visitor,
           which calls all node update callbacks and animations */
        viewer.update();
        /* fire off the cull and draw traversals of the scene */
        viewer.frame();
    }
3.1.6 Callbacks

Users can interact with a scene graph using callbacks. Callbacks can be thought of as user-defined functions that are automatically executed depending on the type of traversal (update, cull, draw) being performed. They can be associated with individual nodes or with selected types (or subtypes) of nodes. During each traversal of a scene graph, if a node with a user-defined callback is encountered, that callback is executed.

FIGURE 22: Callback Mechanism

Code that takes advantage of callbacks can also be more efficient when a multithreaded processing mode is used. The code associated with update callbacks runs once per frame, before the cull traversal. An alternative would be to insert the code in the main simulation loop between the viewer.update() and viewer.frame() calls, but callbacks provide an interface that is easier to update and maintain.
3.1.7 osgGA::GUIEventHandler

The GUIEventHandler class provides developers with an interface to the windowing system's GUI events. The event handler receives updates in the form of GUIEventAdapter instances, and can send requests for the GUI system to perform operations using GUIActionAdapter instances.

The information in a GUIEventAdapter instance includes the type of the event (PUSH, RELEASE, DOUBLECLICK, DRAG, MOVE, KEYDOWN, KEYUP, FRAME, RESIZE, SCROLLUP, SCROLLDOWN, SCROLLLEFT). Depending on the type of event, the instance may carry additional information.

The GUIEventHandler uses GUIActionAdapters to request actions of the GUI system. It interacts with the GUI primarily through the 'handle' method, which has two arguments: a GUIEventAdapter instance for receiving updates from the GUI, and a GUIActionAdapter for requesting actions of the GUI. The handle method can examine the type and values of the GUIEventAdapter, perform the required operations, and make a request of the GUI system using the GUIActionAdapter. It returns a boolean set to true if the event has been 'handled', false otherwise.
3.2 D+TR in OSG

In this section, we describe the specification of our D+TR system as ported to OSG.

3.2.1 Representation

The basic representation in OSG consists of a special kind of node called DepthTR, derived from the Geode class. This DepthTR node can store the geometry of the scene and has a special class called dtrDrawable (inherited from osg::Drawable) attached to it (more on dtrDrawable later in this section). Another class, InputView, contains the Depth Images (DIs) and calibration parameters that are the inputs to our D+TR system. A depth map is a two-dimensional array of real values, with location (i, j) storing the depth distance to the point that projects to pixel (i, j) in the image; closer points appear brighter. Depth and texture are stored as images on disk. The depth map contains real values whose range depends upon the resolution of the structure recovery method; images with 16 bits per pixel can store information up to 65 meters. The DepthTR class contains a pointer to the array of InputViews.

The DepthTR class has several important functions, such as load() (loads all input textures and depth maps), projectView() (projects an input view to the novel view orientation), setNovelView() (sets the validity flag for each input view) and getNovelView() (returns the novel view generated from the input textures and depth maps). The dtrDrawable class has a function called drawImplementation(), which is used for rendering the DepthTR node according to the D+TR algorithm presented in the previous chapters. The InputView class has functions for loading a depth map, calculating the 3D locations from the image using get3D(), and projecting the view in the novel view direction.
FIGURE 23: Class diagrams of D+TR in OSG, showing DepthTR (with its InputView* array, view count and load(), projectView(), getNovelView() functions), dtrDrawable (with a DepthTR* member and drawImplementation(osg::State&), cosAngleBlending(...)) and InputView (with its calibration, camera center, image and depth files, and load(), get3D(), projectView(...) functions).
3.2.2 Rendering

We have implemented the implied triangulation approach for rendering in OSG, as described in the previous chapter. In this section, we further describe how the rendering occurs in OSG.

3.2.2.1 Rendering in OSG

OSG provides an excellent framework for maximizing graphics performance. Any scene graph employs three key phases while generating a 3D scene: the App, Cull and Draw phases. In the App phase, the graphics application sets up the scene graph and the parameters necessary for rendering. In the Cull phase, objects that will not appear on the screen are culled; the hierarchical structure of the scene graph enables efficient culling. Finally, in the Draw phase, the scene is actually drawn on the screen. For further optimization, these three phases are carried out by different threads simultaneously, as shown in Figure 24 below.

FIGURE 24: Parallelism in OSG
During the Cull phase, the whole scene graph is traversed by a NodeVisitor and the visibility of each node is determined. The scene graph is constructed in such a way that the geometry nodes (Geodes) lie at the bottom of the graph as leaf nodes; each Geode contains a list of Drawables (Geometry, ShapeDrawable, Text etc.) which can be drawn. During the Draw phase, the scene graph is traversed again by a NodeVisitor, which calls the virtual function drawImplementation(osg::State&) while rendering the Drawables. As the OSG source documentation states, drawImplementation(State&) is a pure virtual method for the actual implementation of the OpenGL drawing calls, such as vertex arrays and primitives, and must be implemented in concrete subclasses of the Drawable base class; examples include osg::Geometry and osg::ShapeDrawable. drawImplementation(State&) is called from the draw(State&) method, with draw() handling the management of OpenGL display lists and drawImplementation(State&) handling the actual drawing itself.

As mentioned earlier, our DepthTR class is derived from the Geode class. We add a dtrDrawable to it for drawing our IBR scene, and override the drawImplementation(osg::State&) function of the dtrDrawable class to render the depth maps using the D+TR algorithm.
3.2.3 Discussion

This section describes the rendering algorithm implemented in the drawImplementation method of the dtrDrawable class, from both the OSG and the D+TR points of view.

During each drawImplementation call to dtrDrawable, all InputViews are projected to the novel view using the projectView() function. This is followed by reading the projected images using glReadPixels() and blending the views that are set to be valid (by the setNovelView() function). Then the angular blending is carried out, and the final novel image is drawn to the framebuffer using glDrawPixels().

Any keyboard event invokes the GUIEventHandler function, which can be used to set the novel view parameters by calling the NextView() function (an auxiliary function in our system that sets the novel view direction). This is followed by a call to drawImplementation, which renders the novel image onto the window.

To summarize the whole process: given n depth images and a particular novel viewpoint, we first find the depth images on the same side as the novel viewpoint. Depth images are considered to be on the same side when the angle subtended at the center of the scene by the depth image's camera center and the novel view camera center is less than a particular threshold angle. Novel views are generated using these depth images. Then, for each pixel p in the novel view, we compare the z-values across all these novel views and keep the nearest z-values within a threshold. Weights are computed by blending for each pixel p as described earlier. The complete rendering algorithm was given above; a flow chart for it is shown in Figure 25 below.

FIGURE 25: Flow chart for complete rendering
3.3 Conclusions and Results

In this chapter, we have described the basic OSG concepts and how the D+TR system is implemented in OSG. We have also described the class structure and the rendering process of our system in detail.

Some snapshots of our system are shown below. The results include the FPS figures for the table data in our system, summarized in the table below:

    Application        Time in seconds per frame
    D+TR on OSG
References:
[1] http://graphics.stanford.edu/projects/mich/.
[2] http://www.cyberware.com.
[3] H. Baker, D. Tanguay, I. Sobel, D. Gelb, M. E. Goss, W. B. Culbertson, and T.
Malzbender. The Coliseum Immersive Teleconferencing System. In International Workshop
on Immersive Telepresence (ITP2002), 2002.
[4] C. Buehler, M. Bosse, L. McMillan, S. J. Gortler, and M. F. Cohen. Unstructured
Lumigraph Rendering. In SIGGRAPH, 2001.
[5] S. Chen and L. Williams. View Interpolation for Image Synthesis. In SIGGRAPH, 1993.
[6] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and Rendering Architecture from
Photographs: A Hybrid Geometry- and Image-Based Approach. In SIGGRAPH, 1996.
[7] B. Girod, C.-L. Chang, P. Ramanathan, and X. Zhu. Light Field Compression Using
Disparity-Compensated Lifting. In ICASSP, 2003.
[8] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The Lumigraph. In
SIGGRAPH, 1996.
[9] I. Ihm, S. Park, and R. K. Lee. Rendering of Spherical Light Fields. In Pacific Graphics,
1997.
[10] R. Krishnamurthy, B.-B. Chai, H. Tao, and S. Sethuraman. Compression and
Transmission of Depth Maps for Image-Based Rendering. In International Conference on
Image Processing, 2001.
[11] M. Levoy and P. Hanrahan. Light Field Rendering. In SIGGRAPH, 1996.
[12] M. Magnor, P. Eisert, and B. Girod. Multi-View Image Coding with Depth Maps and
3-D Geometry for Prediction. In Proc. SPIE Visual Communication and Image Processing
(VCIP-2001), San Jose, USA, pages 263-271, Jan. 2001.
[13] W. R. Mark. Post-Rendering 3D Image Warping: Visibility, Reconstruction, and
Performance for Depth-Image Warping. PhD thesis, University of North Carolina, 1999.
[14] L. McMillan. An Image-Based Approach to Three-Dimensional Computer Graphics.
PhD thesis, University of North Carolina, 1997.
[15] L. McMillan and G. Bishop. Plenoptic Modeling: An Image-Based Rendering
System. In SIGGRAPH, 1995.
[16] P. J. Narayanan, P. W. Rander, and T. Kanade. Constructing Virtual Worlds Using
Dense Stereo. In Proc. of the International Conference on Computer Vision, Jan. 1998.
[17] P. J. Narayanan, Sashi Kumar P, and Sireesh Reddy K. Depth+Texture Representation
for Image Based Rendering. In ICVGIP, 2004.
[18] Sashi Kumar Penta and P. J. Narayanan. Compression of Multiple Depth-Maps for
IBR. In Pacific Graphics, 2005.
[19] S. M. Seitz and C. R. Dyer. View Morphing. In SIGGRAPH, 1996.
[20] H.-Y. Shum and L.-W. He. Rendering with Concentric Mosaics. In SIGGRAPH, 1999.
[21] X. Tong and R. M. Gray. Coding of Multi-View Images for Immersive Viewing. In
ICASSP, 2000.
[22] H. Towles, W.-C. Chen, R. Yang, S.-U. Kum, and H. Fuchs. 3D Tele-Collaboration
Over Internet2. In International Workshop on Immersive Telepresence (ITP2002), 2002.
[23] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-Quality
Video View Interpolation Using a Layered Representation. In SIGGRAPH, 2004.
[24] Michael Waschbusch, S. Wurmlin, D. Cotting, F. Sadlo, and M. Gross. Scalable 3D
Video of Dynamic Scenes. In The Visual Computer, 2005.
[25] Pooja Verlani, Aditi Goswami, P. J. Narayanan, Shekhar Dwivedi, and Sashi Kumar
Penta. Depth Images: Representations and Real-time Rendering. In Third International
Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), 2006.
Appendix

I. Features and Advantages of OpenSceneGraph

The stated goal of OpenSceneGraph is to make the benefits of scene graph
technology freely available to all users, both commercial and non-commercial.
OSG is written entirely in Standard C++ and OpenGL; it makes full use of the
STL and design patterns, and leverages the open source development model to
provide a development library that is legacy-free and focused on the needs of
end users.

The stated key strengths of OpenSceneGraph are its performance, scalability,
portability, and the productivity gains associated with using a fully featured
scene graph. In more detail:
Performance

OSG supports view frustum culling, occlusion culling, small feature culling,
Level of Detail (LOD) nodes, state sorting, vertex arrays and display lists as
part of the core scene graph. Together these make OpenSceneGraph one of the
highest-performance scene graphs available.

OpenSceneGraph also supports easy customization of the drawing process, such
as the implementation of Continuous Level of Detail (CLOD) meshes on top of
the scene graph (see the Virtual Terrain Project and Demeter).
Productivity

The core scene graph encapsulates the majority of OpenGL functionality,
including the latest extensions, provides rendering optimizations such as
culling and sorting, and offers a whole set of add-on libraries that make it
possible to develop high-performance graphics applications very rapidly. The
application developer is freed to concentrate on content, and on how that
content is controlled, rather than on low-level coding.

Format support: OSG states that it includes 45 separate plugins for loading
various 3D database and image formats. 3D database loaders include OpenFlight
(.flt), TerraPage (.txp) including multi-threaded paging support, LightWave
(.lwo), Alias Wavefront (.obj), Carbon Graphics GEO (.geo), 3D Studio MAX
(.3ds), Performer (.pfb), Quake Character Models (.md2), DirectX (.x),
Inventor ASCII 2.0 (.iv), VRML 1.0 (.wrl), Designer Workshop (.dw), AC3D
(.ac), and the native OSG ASCII and binary formats (.osg).

Image loaders include .rgb, .gif, .jpg, .png, .tiff, .pic, .bmp, .dds, .tga,
QuickTime (under OS X), and fonts (via the freetype plugin).
Node Kits

OSG also has a set of Node Kits, separate libraries that can be compiled into
your applications or loaded at runtime, which add support for particle systems
(osgParticle), high-quality anti-aliased text (osgText), a special effects
framework (osgFX), OpenGL shading language support (osgGL2), large-scale
geospatial terrain database generation (osgTerrain), navigational light points
(osgSim), osgNV (support for NVIDIA's vertex, fragment, combiner and Cg
shaders), Demeter (CLOD terrain integrated with OSG), osgCal (which integrates
Cal3D and OSG), and osgVortex (which integrates the CM-Labs Vortex physics
engine with OSG).
Portability

The core scene graph has been designed to have minimal dependency on any
specific platform, requiring little more than Standard C++ and OpenGL. This
has allowed the scene graph to be rapidly ported to a wide range of platforms:
originally developed on IRIX, it has since been ported to Linux, Windows and
FreeBSD.

Window Systems

The core OSG library is completely windowing-system independent, which makes
it easy for users to add their own window-specific libraries and applications
on top. The distribution already includes the osgProducer library, which
integrates with Open Producer, and in the Community/Applications section of
the OSG website one can find examples of applications and libraries written on
top of GLUT, Qt, MFC, wxWindows and SDL. Users have also integrated it with
Motif and X.
Scalability

OSG runs not only on portables but all the way up to Onyx InfiniteReality
monsters, and it also supports the multiple graphics subsystems found on
machines like a multi-pipe Onyx.
II. Vertex Shader and Pixel Shader Code

Vertex Shader:
struct appdata
{
    float4 position : POSITION;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

struct vs2ps
{
    float4 currpos2 : POSITION;
    float4 currpos  : TEXCOORD1;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

vs2ps main(appdata IN, uniform float4x4 modelMatProj)
{
    vs2ps OUT;
    OUT.currpos2 = OUT.currpos = mul(modelMatProj, IN.position);
    OUT.texpos   = IN.texpos;
    OUT.pointpos = IN.pointpos;
    return OUT;
}
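The pixel shader below performs the incremental angular blend. As a plain-C++ restatement of that per-pixel step (names are illustrative; the shader keeps the accumulated weight in the buffer's alpha channel):

```cpp
#include <cmath>

struct RGBA { float r, g, b, a; };

// One blending step: the incoming sample is weighted by the cosine of the
// angle between the input-view and novel-view rays, sharpened by pow(.,8),
// and averaged against the colour already in the buffer. The output alpha
// carries the accumulated weight, which can grow beyond the usual [0,1]
// range, as the shader's own comment notes.
RGBA blendStep(RGBA incoming, float cosAngle, RGBA buffered)
{
    float w = std::pow(cosAngle, 8.0f);   // color.a = pow(dot(v1, v2), 8)
    RGBA out;
    out.r = (incoming.r * w + buffered.r * buffered.a) / (w + buffered.a);
    out.g = (incoming.g * w + buffered.g * buffered.a) / (w + buffered.a);
    out.b = (incoming.b * w + buffered.b * buffered.a) / (w + buffered.a);
    out.a = w + buffered.a;
    return out;
}
```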
Pixel Shader:

struct vpixel_out
{
    float4 color : COLOR;
};

struct vs2ps
{
    float4 currpos2 : POSITION;
    float4 currpos  : TEXCOORD1;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

vpixel_out main(vs2ps IN,
                uniform float3 c,
                uniform float3 n,
                uniform sampler2D texture,
                uniform samplerRECT buffer)
{
    vpixel_out OUT;
    float4 color;
    float4 color_old;
    float3 v1;
    float3 v2;
    IN.currpos /= 512.0;
    v1 = normalize(IN.pointpos - c);
    v2 = normalize(IN.pointpos - n);
    color.rgba     = tex2D(texture, IN.texpos.xy).rgba;
    color_old.rgba = texRECT(buffer, IN.currpos.xy).rgba;
    color.a = dot(v1, v2);
    color.a = pow(color.a, 8);
    OUT.color.rgb = (color.rgb * color.a + color_old.rgb * color_old.a)
                    / (color.a + color_old.a);
    OUT.color.a = color.a + color_old.a;  // alpha can get out of range
    return OUT;
}

III. Class Definitions
#include <osg/Geode>
#include <osg/Drawable>
#include <osg/Notify>
#include <osg/LineWidth>
#include <osg/BoundingBox>
#include <GL/gl.h>
#include "matrix/matrix.h"
#include <cmath>

#define FILE_NAME_SIZE 200
#define MAX_DEPTH -100000000.00
#define THRESHOLD 3.50
#define ANGLE_THRESH (M_PI/3)
#define TRIANGULATION 1
#define SPLATTING 2

extern float radius, theta, phi, ex, ey, ez;
extern int mode, k, blendType, novelWidth, novelHeight;
extern int type, resolution;
extern float pointSize;
extern int vn;
extern bool finished;
extern float m[3][4], calib[3][3];
class InputView
{
public:
    char imageFile[FILE_NAME_SIZE];
    char depthFile[FILE_NAME_SIZE];
    int width, height;
    CMatrix modelMatrix;
    CMatrix calibration;
    CMatrix cameraCenter;
    float * depth;
    float * X;
    float * Y;
    float * Z;
    float * weights;
    // tells whether this camera is valid for this novel view.
    bool valid;
    // projected values
    GLubyte * projectedImage;
    float * projectedDepth;
    // tells whether a particular pixel is visible in projectedImage.
    bool * holes;

    InputView();
    // loads depth and texture
    void load(char * imFile, char * depFile);
    void get3D();
    // projects this view into the novel view orientation
    void projectView(float depth_threshold, int type, int resolution,
                     float pointSize);
};
class DepthTR : public osg::Geode
{
public:
    int numberOfInputViews;
    InputView * inputViews;
    // tells whether a particular pixel in the novel view is valid or not.
    bool * valid;
    // temporary vars used to read GL buffers
    GLfloat * projectedZbuffer;
    // novel view params
    int width;
    int height;
    // this threshold is used to decide whether three nearby vertices
    // can form a triangle or not
    float depth_threshold;
    // this threshold is used to eliminate views straight away if they
    // are not near to the novel view camera
    float angle_threshold;
    GLubyte * novelImage;
    float * novelViewDepth;
    float * novelViewX;
    float * novelViewY;
    float * novelViewZ;
    CMatrix R, t;
    CMatrix cameraCenter;
    CMatrix calibration;
    CMatrix modelMatrix;
    bool save;

    DepthTR() : osg::Geode()
    {
        init();
        depth_threshold = THRESHOLD;
        angle_threshold = ANGLE_THRESH;
    }
    void dtrDrawable_drawImplementation();
    // loads all the input textures and depth maps
    void load(char *);
    void computeWeights(int reqNumber, int blendType, float * angles,
                        float * weights, bool * flags);
    // blending functions
    void angleBlending(float * angles, float * weights, bool * flags,
                       int * positions);
    void exponentialAngleBlending(float * angles, float * weights,
                                  bool * flags, int * positions);
    void cosAngleBlending(float * angles, float * weights, bool * flags,
                          int * positions);
    void newAngleBlending(int k, float * angles, float * weights,
                          bool * flags, int * positions);
    void inverseAngleBlending(int k, float * angles, float * weights,
                              bool * flags, int * positions);
    // resets the validity flags.
    void setNovelView(float model[][4], float calib[][3]);
    // returns the novel view generated from the input textures and depth maps
    GLubyte * getNovelView(int blendType, int reqNumber);
    // projects input view number vn onto the novel view orientation
    void projectView(int vn, int type, int resolution, float pointSize);
    void getProjectedDT(int vn);
    void saveProjectedImage(int vn);

private:
    void init();  // shared constructor code, generates the drawables
    class dtrDrawable;
    friend class dtrDrawable;
    bool dtrDrawable_computeBound(osg::BoundingBox &) const;
};
class DepthTR::dtrDrawable : public osg::Drawable
{
public:
    DepthTR * _ss;

    dtrDrawable(DepthTR * ss) :
        osg::Drawable(), _ss(ss) { init(); }

    dtrDrawable() : _ss(0)
    {
        init();
        osg::notify(osg::WARN)
            << "Warning: unexpected call to "
               "osgSim::SphereSegment::Spoke() copy constructor"
            << std::endl;
    }

    virtual osg::BoundingBox computeBound() const;
    void calculateParams();

    void init()
    {
        getOrCreateStateSet()->setAttributeAndModes(
            new osg::LineWidth(2.0), osg::StateAttribute::OFF);
    }
};

#endif
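As noted in the DepthTR comments above, depth_threshold decides whether three neighbouring samples are close enough in depth to be joined during triangulation. The helper below is an illustrative guess at that test, not the report's code:

```cpp
#include <algorithm>

// Reject triangles that would span a depth discontinuity: three vertices
// may be connected only if their depth spread is below the threshold
// (THRESHOLD = 3.50 in the class definitions above).
bool canFormTriangle(float z0, float z1, float z2, float threshold)
{
    float zmax = std::max(z0, std::max(z1, z2));
    float zmin = std::min(z0, std::min(z1, z2));
    return (zmax - zmin) < threshold;
}
```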