
Depth + Texture Representation:
Study and Support
in OpenSceneGraph

(Final Year Project Report)

Aditi Goswami
200201058


Abstract

Image Based Rendering (IBR) holds a lot of promise for navigating through a real world scene without modeling it manually. Different representations have been proposed for IBR in the literature. One technique uses depth maps and texture images captured from a number of viewpoints; it is a rich and viable representation for IBR, but it is computationally intensive. Our GPU-based approach produces high quality results in real time.

We also provide support for our system in a widely used, open source graphics API called OpenSceneGraph (osg). osg represents a complex 3D scene as a hierarchical, object oriented model called a scene graph. The actual geometry is stored in the leaf nodes of the scene graph. We add our IBR technique (Depth + Texture Representation) to osg as an independent class, which allows the user to develop hybrid scenes: the geometry coming partially from actual 3D models and partially from our IBR technique.

Keywords: Image based modeling and rendering, Splatting, Triangulation, Blending, Graphics Processing Unit, Vertex Shader, Pixel Shader, Scene Graphs


TABLE OF CONTENTS

1. Introduction
   1.1 Image Based Rendering
   1.2 GPU
   1.3 Cg
   1.4 Scene Graphs
   1.5 Related Background
2. Depth + TR System
   2.1 Representation
   2.2 Rendering
       2.2.1 Rendering one D+TR
       2.2.2 Rendering multiple D+TR
   2.3 Blending on CPU
   2.4 GPU Algorithm
   2.5 Two Pass Algorithm
       2.5.1 Pass 1
       2.5.2 Pass 2
   2.6 Blending on GPU
   2.7 Pipeline of Accelerated D+TR
   2.8 Some Results
3. D+TR & OpenSceneGraph
   3.1 OpenSceneGraph
       3.1.1 "What is a Scene Graph?"
       3.1.2 Nodes in OSG
       3.1.3 Structure of Scene Graph
       3.1.4 Windowing System in OSG
       3.1.5 Skeleton OSG Code
       3.1.6 Callbacks
       3.1.7 osgGA::GUIEventHandler
   3.2 D+TR in OSG
       3.2.1 Representation
       3.2.2 Rendering
             3.2.2.1 Rendering in OSG
       3.2.3 Discussion
   3.3 Conclusions and Results
References
Appendix


Chapter 1

Introduction

The potential of Image Based Modeling and Rendering (IBMR) is to produce new views of a real scene with a realism that is impossible to achieve by other means, which makes it very appealing. IBMR aims to capture an environment using a number of (carefully placed) cameras; any view of the environment can subsequently be generated from these views. To render the new views, the D+TR (Depth + Texture Representation) technique uses algorithms such as splatting and triangulation, and implementing them on the GPU accelerates the system to real time rates.

1.1 Image Based Rendering

The key stages of any Image Based Rendering algorithm are constructing and rendering models. Construction of models is achieved using vision techniques such as stereo, 3D coordinate calculation and calibration, while rendering is done using graphics algorithms such as splatting, triangulation and interpolation, as well as algorithms implemented on the GPU. Hence Image Based Rendering is a unique example where vision and graphics are applied hand in hand.

We can classify IBR techniques into two broad categories, shown in Figure 1 below. The D+TR technique discussed in this report falls under 'representation with geometry'. Representation without geometry means that graphics algorithms such as culling are used to render a model whose location in the world is fixed, and no 3D coordinates are computed; a representation with geometry, on the other hand, generates a novel view based on the 3D coordinates of the scene.

Figure 1: Classification of Image Based Rendering into representations with geometry and representations without geometry.

Generally, most of the IBR techniques studied and implemented in the past suffer either in quality or in speed; achieving both together is still a big challenge. The quality of the novel views rendered using the D+TR technique is very high, but it lacks the speed to make it real time. Our attempt is to implement GPU based algorithms to accelerate the current version of D+TR. The concept of generating the novel view image is the same as in the original D+TR, but the algorithms are adapted to suit the limitations and features of the GPU. These algorithms and limitations are discussed in later chapters.

Figure 2 below shows the basic structure of D+TR and the parts of the structure where the GPU algorithms are implemented.

Figure 2: Basic structure of the D+TR technique of IBR. Depth maps, calibration parameters and the novel view camera are the inputs. The system estimates the valid views for the novel view camera, renders the valid views from the novel view (estimating 3D coordinates and storing the novel view depth and texture from each view), and blends them with the textures to produce the novel view. The GPU is used to implement and accelerate the four major blocks (shown within dotted lines) of the original D+TR.

As shown in the figure above, the rendering, 3D coordinate estimation and blending modules are key parts of the D+TR technique of IBR and are also costly, hence the GPU is used to accelerate them. The GPU algorithms for these modules are explained later.

1.2 GPU (Graphics Processing Unit)

The power and flexibility of GPUs make them an attractive platform for general purpose computation. Modern graphics processing units are deeply programmable and support high precision arithmetic. GPUs have various advantages and certain limitations, discussed below. Figure 3 shows the modern GPU processor pipeline.

GPUs are also termed fast co-processors. The speed of GPUs has been increasing at a rate often described as "Moore's law cubed", and they process data in parallel. GPUs are used in many applications, such as simulation of physical processes (fluid flow, n-body systems, molecular dynamics), real time visualization of complex phenomena, and rendering of complex scenes, among others.

The kernel of a GPU is the vertex/fragment program, which runs on each element of the input stream and generates an output stream. The input stream is a stream of vertices, fragments or texture data; the output stream is the frame buffer, a pixel buffer or a texture. Hence the GPU is also termed a parallel stream processor.
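To illustrate the stream model on the CPU (this is not GPU code, just a sketch of the idea): the same small kernel is applied independently to every element of an input stream to produce an output stream, which is exactly what the vertex and fragment units do in parallel. The kernel body here is a placeholder.

    #include <algorithm>
    #include <vector>

    struct Vertex  { float x, y, z; };
    struct Clipped { float x, y, z, w; };

    // A "kernel": runs once per input element, with no dependence on its neighbours.
    Clipped kernel(const Vertex& v)
    {
        return Clipped{ v.x, v.y, v.z, 1.0f };   // stand-in for a real per-vertex transform
    }

    std::vector<Clipped> runStream(const std::vector<Vertex>& input)
    {
        std::vector<Clipped> output(input.size());
        // Every element is processed independently, so a GPU can do this in parallel.
        std::transform(input.begin(), input.end(), output.begin(), kernel);
        return output;
    }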

Figure 3: Modern GPU processor pipeline. The application on the CPU sends the graphics state and 3D vertices to the vertex shader; the transformed 2D vertices go to the rasterizer and then to the pixel shader, which reads textures from memory and renders to the frame buffer or to a texture.

Vertex Shader: A vertex shader allows custom per-vertex processing on the GPU. The position, color and other related attributes can be modified by the vertex shader. Each vertex is processed in a sequence of steps before the geometry is rasterized. The same program is executed for every vertex, and since there is no interdependence between vertices, the shaders can be executed in parallel. A vertex shader cannot add or delete vertices. Vertex shaders are used for graphical effects such as mesh deformation, procedural geometry, blending, texture coordinate generation, interpolation, displacement mapping and motion blur, among others.

Pixel Shader: A pixel (fragment) shader allows custom processing of fragment information on the GPU. The color of each fragment can be computed rather than simply being interpolated by the rasterizer. Fragment shaders are responsible for texturing, shading and blending, and they too can be executed in parallel in graphics hardware. Pixel shaders have limited or no knowledge of neighboring pixels. The output of a fragment shader can be color or depth.

Limitations of the GPU: Besides providing such a powerful approach to computation, GPUs also place certain constraints on their use. The graphics processor avoids read backs and does not allow arbitrary writes to arbitrary locations; each operation is performed on a chunk of data. Code written for the CPU cannot simply be ported to the GPU; it requires initialization and binding procedures.

1.3 Cg

Historically, graphics hardware has been programmed at a very low level. Fixed-function pipelines were configured by setting states such as the texture-combining modes. More recently, programmers configured programmable pipelines by using programming interfaces at the assembly language level. In theory, these low-level programming interfaces provided great flexibility. In practice, they were painful to use and presented a serious barrier to the effective use of the hardware.

Using a high-level programming language, rather than the low-level languages of the past, provides several advantages. The compiler optimizes code automatically and performs low-level tasks, such as register allocation, that are tedious and error prone. Shading code written in a high-level language is much easier to read and understand, and new shaders can easily be created by modifying previously written ones. Shaders written in a high-level language are also portable to a wider range of hardware platforms than shaders written in assembly code.

This section introduces Cg, a high-level language tailored for programming GPUs. Cg offers all the advantages just described, allowing programmers to combine the inherent power of the GPU with a language that makes GPU programming easy. Cg is very similar to C, which is why it is called C for graphics (Cg). It is a high-level shading language that is easy to use with OpenGL. It has a powerful swizzle operator and built-in vector and matrix types, supports basic types, structures, arrays and type conversion as in C, and supports a large number of mathematical, derivative and geometric functions as well.

The Cg language allows you to write programs for both the vertex processor and the fragment processor. We refer to these programs as vertex programs and fragment programs, respectively. (Fragment programs are also known as pixel programs or pixel shaders, and we use these terms interchangeably in this document.) Cg code can be compiled into GPU assembly code, either on demand at run time or beforehand. Cg makes it easy to combine a Cg fragment program with a handwritten vertex program, or even with the non-programmable OpenGL or DirectX vertex pipeline. Likewise, a Cg vertex program can be combined with a handwritten fragment program, or with the non-programmable OpenGL or DirectX fragment pipeline.

Figure 4: Pipeline of Cg code and compiler.

1.4 Scene Graphs

A scene graph is a data structure which captures the logical and spatial representation of a graphical scene. A scene graph consists of various kinds of nodes, each representing a particular type of information such as transformations or lighting.

Open Inventor, OpenGL Performer, OpenSceneGraph and Java 3D are some of the common APIs which use a scene graph. The key reasons that many graphics developers use scene graphs are performance, productivity, portability and scalability:

1) Performance

Scene graphs provide an excellent framework for maximizing graphics performance. A good scene graph employs two key techniques: culling of the objects that won't be seen on screen, and state sorting of properties such as textures and materials, so that all similar objects are drawn together. The hierarchical structure of the scene graph makes the culling process very efficient; for instance, a whole city can be culled with just a few operations.

2) Productivity

Scene graphs take away much of the hard work required to develop high performance graphics applications. Furthermore, one of the most powerful concepts in object oriented programming is object composition, enshrined in the Composite design pattern, which fits the scene graph tree structure perfectly and makes it a highly flexible and reusable design.

Scene graphs also often come with additional utility libraries which range from helping users set up and manage graphics windows to importing 3D models and images. A dozen lines of code can be enough to load data and create an interactive viewer.

3) Portability

Scene graphs encapsulate much of the lower level work of rendering graphics and reading and writing data, reducing or even eradicating the platform specific code required in your own application. If the underlying scene graph is portable, then moving from platform to platform can be as simple as recompiling your source code.

4) Scalability

Along with being able to dynamically manage the complexity of scenes to account for differences in graphics performance across a range of machines, scene graphs also make it much easier to manage complex hardware configurations, such as clusters of graphics machines or multiprocessor/multipipe systems. A good scene graph allows the developer to concentrate on their own application while the rendering framework of the scene graph handles the different underlying hardware configurations.

1.5 Related Background

Image Based Rendering (IBR) has the potential to produce new views of a real scene with a realism impossible to achieve by other means. It aims to capture an environment using a number of cameras that recover the geometric and photometric structure of the scene. The scene can thereafter be rendered from any viewpoint using the internal representations. The representations used fall into two broad categories: those without any geometric model and those with a geometric model of some kind. Early IBR efforts produced new views of a scene given two or more images of it [5, 19]. Point-to-point correspondences contained all the structural information about the scene used by such methods. Many later techniques also used only the images for novel view generation [15, 11, 8, 20]. They require a large number of input views -- often running into thousands -- to model a scene satisfactorily. This makes them practically unusable other than for static scenes. The representation is also bulky and needs sophisticated compression schemes.

The availability of even approximate geometry can reduce the requirement on the number of views drastically. The use of approximate geometry for view generation was a significant contribution of Lumigraph rendering [8] and of view-dependent texture mapping [6]. The Unstructured Lumigraph [4] extends this idea to rendering from an unstructured collection of views and approximate models.

The Depth Image (DI) representation is suitable for IBR as it can be computed from the real world using cameras and can be used for new view generation. A Depth Image consists of a pair of aligned maps: the image or texture map I that gives the colour of all visible points, and a depth map D that gives the distance to each visible point. In practice the image and depth are computed with respect to a real camera, though this does not have to be the case. The calibration matrix C of the camera is also included in the representation, giving the triplet (D, I, C). This is a popular representation for image-based modeling as cameras are cheap and methods like shape-from-X are mature enough to capture dense depth information. It has been used in different contexts [14, 16, 23, 13, 24]. The Virtualized Reality system captured dynamic scenes and modeled them for subsequent rendering using a studio with a few dozen cameras [16]. Many similar systems have been built in recent years for modeling, immersion, videoconferencing, etc. [22, 3]. Recently, a layered representation with full geometry recovery for modeling and rendering dynamic scenes has been reported by Zitnick et al. [23]. Special scanners such as those made by CyberWare have also been used to capture such representations of objects and cultural assets, as in the Digital Michelangelo project [2, 1].

Depth Images have been used for IBR in the past. McMillan used them for warping [14] and Mark used an on-the-fly Depth Image for fast rendering of subsequent frames [13]. The Virtualized Reality project computed them using multibaseline stereo and used them to render new views using warping and hole-filling [16]. Zitnick et al. [24] use them in a similar way with an additional blending step to smooth discontinuities. Waschbusch et al. [24] extended this representation to sparsely placed cameras and presented probabilistic rendering with a view-independent point-based representation of the depth information.

The general framework of rendering Depth Images with blending was presented in [17]. [25] provides a GPU based algorithm for real-time rendering of a representation consisting of multiple Depth Images, and presents a study of the locality properties of Depth Image based rendering. Results on representative synthetic data sets are presented to demonstrate the utility of the representation and the effectiveness of the algorithms.


Chapter 2

Depth + TR System: Representation and Rendering

2.1 Representation

The basic representation consists of an image, its depth map and the calibration parameters. The depth map is a two dimensional array of real values with location (i, j) storing the depth of the point that projects to pixel (i, j) in the image. Figure 5 below shows the depth maps and images for a synthetic and a real scene. Closer points are shown brighter.

Figure 5: Depth maps and images for (a) a synthetic scene and (b) a real scene.

Depth and texture are stored as images on disk. The depth map contains real values whose range depends upon the resolution of the structure recovery method; images with 16 bits per pixel can store depth information up to about 65 meters (assuming millimetre units, since 2^16 = 65536). The following sections explain how the maps are constructed and why we use Depth + Texture for IBR.
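For concreteness, here is a small sketch of how such a 16-bit depth image can be turned back into 3D points, assuming millimetre depth units and a simple pinhole camera with focal lengths fx, fy and principal point (cx, cy); the names are illustrative and not the exact ones used in our implementation.

    #include <cstdint>
    #include <vector>

    struct Point3 { float x, y, z; };

    // Back-project a 16-bit depth map (one value per pixel, in millimetres)
    // into camera-space 3D points using pinhole intrinsics.
    std::vector<Point3> backProject(const std::vector<uint16_t>& depthMM,
                                    int width, int height,
                                    float fx, float fy, float cx, float cy)
    {
        std::vector<Point3> points;
        points.reserve(depthMM.size());
        for (int v = 0; v < height; ++v)
        {
            for (int u = 0; u < width; ++u)
            {
                float z = depthMM[v * width + u] / 1000.0f;    // millimetres -> metres
                points.push_back(Point3{ (u - cx) * z / fx,    // x
                                         (v - cy) * z / fy,    // y
                                         z });                 // depth along the optical axis
            }
        }
        return points;   // the calibration extrinsics would then map these to world space
    }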

Construction of D+TR: The D+TR can be created using a suitable 3D structure recovery method, including stereo, range sensors, shape-from-shading, etc. Multicamera stereo remains the most viable option as cameras are inexpensive and nonintrusive. Depth and texture need to be captured only from a few points of view since the geometry can be interpolated. A calibrated, instrumented setup consisting of a dozen or so cameras can capture static or dynamic events as they happen. A depth map can be computed for each camera using the other cameras in its neighborhood and a suitable stereo program.

There are several reasons for selecting depth and texture for IBR. Rendering a scene using depth maps is an active research area: a depth map gives a visibility-limited model of the scene and can be rendered easily using graphics algorithms, while texture mapping ensures photorealism.

2.2 Rendering

The rendering aspect of the D+TR system is detailed in [17]. We describe the rendering approaches and related issues in this section: first rendering using one depth map, then using multiple depth maps.

2.2.1 Rendering one D+TR

Two approaches have been implemented and tested for rendering one D+TR.

Splatting: The point cloud can be splatted as point features. Splatting techniques broaden the individual 3D points to fill the space between points. The color of each splatted point is obtained from the corresponding image pixel. Splatting has been used as the method for fast rendering, as point features are quick to render. The disadvantage of splatting is that holes can show up where data is missing if we zoom in too much. Figure 6(a) shows one D+TR rendered using splatting.
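A minimal sketch of this splatting step, using legacy OpenGL point rendering (the point size and the data layout are assumptions made for illustration):

    #include <GL/gl.h>
    #include <vector>

    struct Splat { float x, y, z; unsigned char r, g, b; };

    // Draw every 3D point as a screen-aligned point whose colour comes from the
    // corresponding image pixel. Larger point sizes help fill gaps between samples.
    void drawSplats(const std::vector<Splat>& splats, float pointSize)
    {
        glPointSize(pointSize);
        glBegin(GL_POINTS);
        for (const Splat& s : splats)
        {
            glColor3ub(s.r, s.g, s.b);
            glVertex3f(s.x, s.y, s.z);
        }
        glEnd();
    }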

Implied Triangulation: Every 2x2 section of the depth map is converted to two triangles by drawing the diagonal. Depth discontinuities are handled by breaking all edges with a large difference in the z-coordinate between their end points and removing the corresponding triangles from the model. Triangulation results in interpolation of the interior points of the triangles, filling holes created by the lack of resolution. The interpolation can produce low quality images if there is a considerable gap between the resolutions of the captured and rendered views, such as when zooming in; this is a fundamental problem in image based rendering. Figure 6(b) shows the D+TR rendered using triangulation.

Figure 6: One D+TR rendered using (a) splatting and (b) triangulation.
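A sketch of this implied triangulation, assuming the depth map is stored row-major and that a triangle is dropped when any of its edges spans a depth discontinuity larger than a threshold (the names and the threshold are illustrative):

    #include <cmath>
    #include <vector>

    struct Tri { int a, b, c; };   // indices into the width*height grid of vertices

    // Convert every 2x2 cell of the depth map into two triangles along the diagonal,
    // skipping triangles whose edges cross a large depth discontinuity.
    std::vector<Tri> impliedTriangulation(const std::vector<float>& depth,
                                          int width, int height, float zThreshold)
    {
        auto idx    = [width](int i, int j)  { return j * width + i; };
        auto edgeOk = [&](int p, int q)      { return std::fabs(depth[p] - depth[q]) < zThreshold; };

        std::vector<Tri> tris;
        for (int j = 0; j + 1 < height; ++j)
        {
            for (int i = 0; i + 1 < width; ++i)
            {
                int p00 = idx(i, j),     p10 = idx(i + 1, j);
                int p01 = idx(i, j + 1), p11 = idx(i + 1, j + 1);
                // the diagonal p00-p11 splits the cell into two triangles
                if (edgeOk(p00, p10) && edgeOk(p10, p11) && edgeOk(p11, p00))
                    tris.push_back(Tri{ p00, p10, p11 });
                if (edgeOk(p00, p11) && edgeOk(p11, p01) && edgeOk(p01, p00))
                    tris.push_back(Tri{ p00, p11, p01 });
            }
        }
        return tris;
    }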


2.2.2 Rendering Multiple D+TRs

The view generated using one D+TR will have holes or gaps corresponding to parts of the occluded scene that become exposed at the new view position. These holes can be filled using another D+TR that sees those regions. Parts of the scene may be visible to multiple cameras, in which case the views generated by the multiple D+TRs have to be blended. Hole filling and blending of the views to generate the novel view are explained in the next section. Figure 7 shows the result of rendering multiple D+TRs.

Figure 7: Multiple D+TRs rendered using (a) splatting and (b) triangulation.

2.3 Blending on CPU

The views used for novel view generation are blended to provide high quality output. Each valid input view is blended based on its depth and texture.

Blend Function: The angular distance between an input view's camera center and the novel view camera center is used as the criterion for blending. This angular distance is used to calculate the weight given to each view for each pixel. Cosine weights are used in the D+TR technique. Figure 8 below illustrates the blending process.

Figure 8: Blending is based on angular distance.


As shown in the figure, t1 and t2 are the angular distances of views c1 and c2 from D (the novel view). The corresponding pixel weights for the two views are given by:

w_c1 = cos^n(t1)   (for view c1)
w_c2 = cos^n(t2)   (for view c2)

n > 2 gives suitable values for blending.

Pixel weights can also be calculated using an exponential function. Exponential blending computes the weights as w_i = e^(c*t_i), where i is the view index, t_i is the angular distance of view i, and w_i is the weight for that view at the pixel. The constant c controls the fall off as the angular distance increases.
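A small sketch of this cosine weighting, assuming the angular distance is measured at the 3D point between the directions towards the two camera centers (the function names and the default exponent are illustrative):

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3  sub(const Vec3& a, const Vec3& b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
    static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static float len(const Vec3& a)                { return std::sqrt(dot(a, a)); }

    // Angular distance t between an input view and the novel view, as seen from point p.
    float angularDistance(const Vec3& inputCam, const Vec3& novelCam, const Vec3& p)
    {
        Vec3 a = sub(inputCam, p), b = sub(novelCam, p);
        return std::acos(dot(a, b) / (len(a) * len(b)));
    }

    // Cosine blending weight w = cos^n(t); n > 2 gives a suitable falloff
    // (n = 4 is chosen here only for illustration).
    float cosineWeight(float t, int n = 4)
    {
        float c = std::cos(t);
        return (c > 0.0f) ? std::pow(c, static_cast<float>(n)) : 0.0f;
    }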

Pixel Blending: The blend function should result in smooth changes in the generated view as the viewpoint changes. Thus, views that are close to the new view should be emphasized and views that are far from it should be deemphasized. In addition, z-buffering is used so that the proper pixels are used to render the novel view. An example is explained below using Figure 9.

Figure 9: Process of pixel blending when one pixel is seen by multiple views.

As shown in the figure above, views 1, 2, 3 and 4 are used in generating the novel view. Consider a ray from the novel view that corresponds to both points A and B in the scene. To render this pixel in the novel view, D+TR handles the following cases.

Views 1 and 2 are used for point A although views 3 and 4 are also valid views for rendering. This is because the point A seen by views 3 and 4 lies behind the point B seen by views 1 and 2 along the same ray from the novel view. A red line indicates that a view is discarded and a green line indicates that the view is considered for blending.

View 3 sees both points A and B along the ray from the novel view, but only the pixel in view 3 corresponding to point B is used in rendering the pixel in the novel view.

2.4 GPU Algorithm

We devised a two-pass algorithm to render multiple DIs with per-pixel blending. The first pass determines, for each pixel, which views need to be blended; the second pass actually blends them. The property that each pixel blends a different set of DIs is maintained by the new algorithm. The pseudo code of the algorithm, as presented in [25], is given below:

2.5 Two Pass Algorithm

The D+TR technique of IBR has a blending module to provide high quality novel views. The read back of the frame buffer is the time consuming operation in the above algorithm. Modern GPUs have a lot of computational power and memory, so if the read back is avoided and the blending is done on the GPU, the frame rate can reach interactive rates. The required speed, with the original quality, is achieved with the two pass algorithm: pass 1 uses the vertex shader and pass 2 uses the pixel shader, which implements the blending process.

2.5.1 Pass One

This pass generates the novel view depth. Each pixel is rendered with a shift of 'Delta' in the depth value: if 'D' is the depth for a pixel, it is rendered at 'D - Delta'. In pass 2, all the pixels within the range Delta are blended and the others are rejected by the depth test. Pass 1 is explained with the block diagram below.

Figure 10: Process of Pass 1. The depth map and calibration parameters are used to compute the 3D coordinates (x, y, z) of each pixel; rendering shifts the depth (Z -> Z - Delta) to produce the novel view depth. The figure also shows a surface rendered with Pass 1.

As shown in the block diagram, the calibration parameters and the depth values are passed to the vertex shader. The rendered surface lies a little behind the actual surface. Each valid input view for the current novel view position is rendered one by one to obtain the novel view depth.
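A CPU-side sketch of what the Pass 1 vertex program does per vertex (the actual Cg code is given in the Appendix; the matrix layout and the interpretation of Delta in normalized depth units are assumptions for illustration):

    // Pass 1, per vertex: project the reconstructed 3D point with the novel view's
    // modelview-projection matrix and pull its depth back by Delta, so that in
    // Pass 2 every sample within Delta of the nearest surface survives the depth test.
    struct Vec4 { float x, y, z, w; };

    Vec4 pass1Vertex(const float mvp[4][4], const Vec4& p, float delta)
    {
        Vec4 c;   // clip-space position: c = MVP * p
        c.x = mvp[0][0]*p.x + mvp[0][1]*p.y + mvp[0][2]*p.z + mvp[0][3]*p.w;
        c.y = mvp[1][0]*p.x + mvp[1][1]*p.y + mvp[1][2]*p.z + mvp[1][3]*p.w;
        c.z = mvp[2][0]*p.x + mvp[2][1]*p.y + mvp[2][2]*p.z + mvp[2][3]*p.w;
        c.w = mvp[3][0]*p.x + mvp[3][1]*p.y + mvp[3][2]*p.z + mvp[3][3]*p.w;
        c.z -= delta * c.w;   // render at depth D - Delta
        return c;
    }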

2.5.2 Pass Two

Blending is done in the pixel shader; the process of blending is explained in the next section. The pixel shader takes the camera centers of the novel view and the current view, along with the textures and the 3D location of the current pixel, to implement this pass. The output for each view from the pixel shader is sent with the next view back to the pixel shader so that the blending is performed in a normalized way, which also increases the quality of the output. Pass 2 is explained using the block diagram in Figure 11. The pixel shader code is given in the Appendix at the end of this report, along with the vertex shader code.


Figure 11: Implementation of Pass 2. Each valid input view texture is sent to the pixel shader one by one; the output of the pixel shader is copied back to the original texture, and for the last input view the output is sent to the frame buffer.

Pass 2 takes more time than pass 1 because the blending process is implemented in this pass and the per-pixel weight calculation is done in the pixel shader. The number of parameters passed to the pixel shader is also larger, which further increases the execution time: the pixel shader takes two textures, two camera centers (the novel view and input view cameras) and the 3D location of the pixel as input, whereas the vertex shader takes just the 3D location and the modelview and projection matrices.

2.6 Blending on GPU

The concept of blending on the GPU is the same as on the CPU, but here the normalization is done with each incoming input view, whereas on the CPU the normalized weights for each view are available before the blending process starts.

Blend Function: The blend function used is

C_f = (a_s * C_s + a_d * C_d) / a_f,   where a_f = a_s + a_d

Here C_s is the color of the source pixel (incoming view) and C_d is the color of the destination pixel (the views rendered before it); a_s and a_d are the corresponding alpha values. With each incoming view the alpha value of the rendered view is normalized.

C_f and a_f are the color and alpha values that are rendered. If the input view is not the last view, they become C_d and a_d when the next valid input view arrives; if the current input view is the last view, then the rendered C_f and a_f are the final color and alpha values of this pixel in the rendered novel view. The source and destination alpha values are added so that the division in the next pass with another input view keeps the values normalized.
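A CPU sketch of this accumulation, assuming RGB colours in [0, 1] and one call per incoming valid view (the real version runs in the pixel shader, with the running result held in a texture/FBO):

    struct RGBA { float r, g, b, a; };

    // Blend one incoming view sample (Cs, as) into the running result (Cd, ad):
    //   Cf = (as*Cs + ad*Cd) / af,  with af = as + ad.
    // Keeping Cf normalized at every step means no post-processing is needed,
    // however many views end up contributing to the pixel.
    RGBA blendStep(const RGBA& src, const RGBA& dst)
    {
        float af = src.a + dst.a;
        if (af <= 0.0f) return dst;                    // nothing to blend yet
        RGBA out;
        out.r = (src.a * src.r + dst.a * dst.r) / af;
        out.g = (src.a * src.g + dst.a * dst.g) / af;
        out.b = (src.a * src.b + dst.a * dst.b) / af;
        out.a = af;                                    // accumulated weight so far
        return out;
    }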


The alpha value of the source pixel, which becomes its weight, is calculated by the pixel shader. The block diagram below explains the calculation of the alpha value.

Figure 12: Process of calculating the alpha value a_s of the source pixel in the pixel shader. The center of the novel view, the center of the current view and the 3D location of the pixel are used to compute the angle and hence the weight a_s.

Each pixel of the neighboring valid views used for novel view generation takes part in the blending, hence a weight is calculated for each one of them. The accumulated alpha value is normalized with each new view arriving at the pixel shader, by dividing by the total accumulated alpha value.

2.7 Pipeline of Accelerated D+TR

The entire pipeline of the accelerated D+TR is explained in this section. From Figure 13 we can see that the inputs to the system are the depth maps, the calibration parameters and the input view textures. The depth map is an input to the pre-processing step and to pass 1, the calibration parameters are inputs to the pre-processing step and to pass 2, and the input view texture is an input to pass 2 only, where the blending is implemented.
blending is implemented.


Figure 13: Pipeline of the accelerated D+TR.

As the figure shows, the pre-processing step generates the 3D locations of the pixels, pass 1 shifts the depth for the novel view for each valid input view, and pass 2 blends the pixels based on the rendered depth and the valid input view textures.

The inputs to the system are the calibration parameters, depth maps and texture images from a certain number of views. Based on the novel view camera, the valid views surrounding it are estimated first; these valid views are then used for generating the novel view. To determine the validity of a view, the average of the 3D coordinates of the scene is taken (which is the origin (0, 0, 0) in our case) and validity is estimated from the angle between the two cameras about this average center point. Views beyond 90 degrees are not considered, as they see a different part of the scene. When fewer views are used, the closest views are preferred.
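A sketch of this validity test, assuming the scene centre is at the origin as stated above and using the angle between the direction vectors from the centre to each camera (the names are illustrative):

    #include <cmath>
    #include <vector>

    struct Cam { float x, y, z; };

    // Angle (in degrees) between the novel view camera and an input camera,
    // measured about the average scene centre (taken to be the origin here).
    static float angleAboutCentre(const Cam& a, const Cam& b)
    {
        float dotp = a.x*b.x + a.y*b.y + a.z*b.z;
        float la   = std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z);
        float lb   = std::sqrt(b.x*b.x + b.y*b.y + b.z*b.z);
        return std::acos(dotp / (la * lb)) * 180.0f / 3.14159265f;
    }

    // A view is valid for the current novel view if it lies within 90 degrees of it.
    std::vector<int> validViews(const std::vector<Cam>& inputCams, const Cam& novelCam)
    {
        std::vector<int> valid;
        for (size_t i = 0; i < inputCams.size(); ++i)
            if (angleAboutCentre(inputCams[i], novelCam) < 90.0f)
                valid.push_back(static_cast<int>(i));
        return valid;   // when fewer views are wanted, sort these by angle and keep the closest
    }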

Once the valid views are known, the 3D location of each valid view is generated using its calibration parameters and depth map, and is passed as a texture to the vertex shader (or otherwise made available to it) for generation of the novel view depth. The vertex shader simply shifts the z coordinate a little backwards so that pixels within this shift range can be blended in the second pass. In some approaches the vertex shader also performs the 3D coordinate estimation, but this is not the optimal solution. After generating the shifted coordinates, pass 1 passes this information to pass 2.

Pass 2 performs on-the-fly blending. This is done by a pixel shader that runs on the GPU. For each pixel, the shader has access to the novel view and DI parameters and to the results of the previous rendering through a Frame Buffer Object (FBO). Depending on which DIs have values near the minimum z for each pixel, a different combination of DIs can be blended at each pixel. The colour and alpha values are always kept correct, so there is no post-processing step that depends on the number of DIs blended. The algorithm also ensures that the maximum range of colour values is never exceeded, which is a possibility if the summing is done in a loop followed by a division at the end.

2.8 Results

The frame rate (FPS) of the system was measured while varying the resolution and the number of input DIs; the tables below report the results. The experiment was conducted on an AMD64 processor with 1 GB RAM and an nVidia 6600GT graphics card with 128 MB RAM.

Number of Input Views = 18

Number of views    FPS (Resolution = 2)    FPS (Resolution = 1)
2 to 3             75                      21
3 to 5             46                      13
4 to 6             38                      10.5
7 to 8             21.3                    6
8 to 9             20                      5
10 to 12           14                      3.2
13 to 14           11                      2.7

Number of Input Views = 9

Number of views    FPS (Resolution = 2)    FPS (Resolution = 1)
2 to 3             210                     78
3 to 5             130                     46.5
4 to 6             109                     40
7 to 8             40                      14
8 to 9             38                      13.3
9                  32                      13

As is clear from the tables above, our system is capable of producing novel views in real time.


Chapter 3

D+TR & OpenSceneGraph

3.1 OpenSceneGraph

OpenSceneGraph, commonly known as OSG, is an open source, cross platform graphics toolkit for the development of high performance graphics applications such as flight simulators, games, virtual reality and scientific visualization. OSG is based around the concept of a scene graph; it provides an object oriented framework on top of OpenGL, freeing the developer from implementing and optimizing low level graphics calls, and it provides many additional utilities for rapid development of graphics applications.

It is a 3D graphics library for C++ programmers. A scene graph library allows us to represent the objects in a scene with a graph data structure, which lets us group related objects that share some properties so that common properties can be specified for the whole group in one place. OSG can then be used to automatically manage things like level of detail, culling and bounding shapes necessary to draw the scene faithfully but without unnecessary detail (which would slow down the graphics hardware drawing the scene).

The OpenSceneGraph project was started in 1998 by Don Burns as a means of porting a hang gliding simulator written on top of the Performer scene graph. The source code was open sourced in 1999 and the porting of the scene graph to Windows was carried out by Robert Osfield. The project was made scalable in 2003, and the OpenSceneGraph 1.0 release (the culmination of six years of work by the lead developers and the open source community that has grown up around the project) happened in 2006. The OSG we know now is a cross platform, scalable, real time, open source scene graph with over 1000 active developers worldwide and users such as NASA, the European Space Agency, Boeing, Magic Earth and the American Army, among many others. Besides enabling the rapid development of custom visualization programs, OSG is also the power behind projects such as osgVolume and Present3D.

Unfortunately, there are currently no real reference manuals or programmer's guides for OpenSceneGraph. The recommendation on the OSG web site is to "Use the Source". Having the source assumes you readily understand how it all works or can deduce it from the code; this is not true for many who are new to OSG or to the simulation world, as can be seen from many of the questions on the mailing lists. While OSG has documents generated from headers and source, a lot of the material found there lacks context and can thus be difficult to assimilate; a good programmer's guide and reference manual would provide the required context.


3.1.1 "What is a scene graph?"

As the name suggests, a scene graph is a data structure used to organize a scene in a computer graphics application. The basic idea behind a scene graph is that a scene is usually decomposed into several different parts, and somehow these parts have to be tied together. So a scene graph is a graph where every node represents one of the parts into which a scene can be divided. Being a little more strict, a scene graph is a directed acyclic graph, so it establishes a hierarchical relationship among the nodes.

In this section, we describe a simple scene graph and introduce some basic OSG node types. Suppose we want to render a scene consisting of a road and a truck. A scene graph representing this scene is depicted in Figure 14.

Figure 14: A scene graph consisting of a road and a truck.

If we render the scene just like this, the truck will not appear in the place we want; we have to translate it to its correct position. Fortunately, scene graph nodes do not always represent geometry. In this case, we can add a node representing a translation, yielding the scene graph shown in Figure 15.

Figure 15: A scene graph consisting of a road and a translated truck.

Let's add two boxes to the scene, one on the truck and the other one on the road. Both boxes will have translation nodes above them so that they can be placed at their proper locations. Furthermore, the box on the truck will also be translated by the truck translation, so that if we move the truck, the box moves too. Since both boxes look exactly the same, we don't have to create a node for each one of them: one node "referenced" twice does the trick, as Figure 16 illustrates. During rendering, the "Box" node will be visited (and rendered) twice, but some memory is spared because the model is loaded just once. This is one of the reasons a scene graph is a "graph" and not a "tree".

Figure 16: A scene graph consisting of a road, a truck and a pair of boxes.

Up to this point, the discussion was about "generic" scene graphs. From now on, we will use exclusively OSG scene graphs; that is, instead of using a generic "Translation" node, we will use an instance of a real class defined in the OSG hierarchy.

A node in OSG is represented by the osg::Node class. Renderable things in OSG are represented by instances of the osg::Drawable class. But osg::Drawables are not nodes, so we cannot attach them directly to a scene graph; it is necessary to use a "geometry node", osg::Geode, instead. Not every node in an OSG scene graph can have other nodes attached to it as children; in fact, we can only add children to nodes that are instances of osg::Group or one of its subclasses.

Using osg::Geodes and an osg::Group, it is possible to recreate the scene graph from Figure 14 using real classes from OSG. The result is shown in Figure 17.


Figure 17: An OSG scene graph consisting of a road and a truck. Instances of OSG classes derived from osg::Node are drawn as rounded boxes with the class name inside; osg::Drawables are represented as rectangles.

That is not the only way to translate the scene graph from Figure 14 into a real OSG scene graph. More than one osg::Drawable can be attached to a single osg::Geode, so the scene graph depicted in Figure 18 is also an OSG version of Figure 14.

Figure 18: An alternative OSG scene graph representing the same scene as the one in Figure 17.

The scene graphs of Figures 17 and 18 have the same problem as the one in Figure 14: the truck will probably be at the wrong position, and the solution is the same as before: translating the truck. In OSG, probably the simplest way to translate a node is to add an osg::PositionAttitudeTransform node above it. An osg::PositionAttitudeTransform has associated with it not only a translation, but also an attitude and a scale. Although not exactly the same thing, this can be thought of as the OSG equivalent of the OpenGL calls glTranslate(), glRotate() and glScale(). Figure 19 is the OSGfied version of Figure 15.

Figure 19: An OSG scene graph consisting of a road and a translated truck. For compactness, osg::PositionAttitudeTransform is written as osg::PAT.
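The scene graph of Figure 19 could be built along the following lines; this is only a minimal sketch, using osg::ShapeDrawable boxes as stand-ins for the real road and truck models, and the sizes and positions are made up for illustration:

    #include <osg/Group>
    #include <osg/Geode>
    #include <osg/ShapeDrawable>
    #include <osg/PositionAttitudeTransform>

    osg::Group* buildScene()
    {
        // Leaf geometry: stand-ins for the "Road" and "Truck" drawables of Figure 19.
        osg::Geode* road = new osg::Geode;
        road->addDrawable(new osg::ShapeDrawable(new osg::Box(osg::Vec3(0, 0, 0), 20.f, 4.f, 0.1f)));

        osg::Geode* truck = new osg::Geode;
        truck->addDrawable(new osg::ShapeDrawable(new osg::Box(osg::Vec3(0, 0, 0), 2.f, 1.f, 1.f)));

        // Translate the truck onto the road (the osg::PAT node of Figure 19).
        osg::PositionAttitudeTransform* pat = new osg::PositionAttitudeTransform;
        pat->setPosition(osg::Vec3(5.f, 0.f, 0.55f));
        pat->addChild(truck);

        osg::Group* root = new osg::Group;   // the "Scene" group node
        root->addChild(road);
        root->addChild(pat);
        return root;
    }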


3.1.2 Nodes in OSG

As stated in the previous section, OSG comprises various kinds of nodes for representing specific information of a complex 3D scene. Notable among these are:

1) osg::Node - This is the base class of all internal node classes. This class is used very rarely in a scene graph directly, but it has important members such as the bounding sphere, the parent list and the NodeCallbacks (for update, event traversal and cull callbacks; more on callbacks later).

2) osg::Group - An example of the Composite design pattern, this class is derived from osg::Node. It provides functionality for adding child nodes and maintains a list of children.

3) osg::Transform - A group node for which all children are transformed by a 4x4 matrix. It is often used for positioning objects within a scene, producing trackball functionality or for animation. Transform itself does not provide set/get functions, only the interface for defining what the 4x4 transformation is. Subclasses such as MatrixTransform and PositionAttitudeTransform support the use of an osg::Matrix or an osg::Vec3/osg::Quat respectively:

osg::MatrixTransform - uses a 4x4 matrix for the transform.
osg::PositionAttitudeTransform - uses a Vec3 position, a Quat rotation for the attitude, and a Vec3 for the pivot point.

4) osg::Geode - A Geode is a "geometry node", that is, a leaf node of the scene graph that can have "renderable things" attached to it. In OSG, renderable things are represented by objects of the Drawable class, so a Geode is a Node whose purpose is grouping Drawables. It maintains a list of Drawables.

5) osg::Drawable - A pure virtual class (with six concrete derived classes) which provides the important draw*() methods. In OSG, everything that can be rendered is implemented as a class derived from Drawable. The Drawable class contains no drawing primitives, since these are provided by subclasses such as osg::Geometry. Also note that a Drawable is not a Node and therefore cannot be added directly to a scene graph; instead, Drawables are attached to Geodes, which are scene graph nodes.

This class contains a StateSet and a list of parents, along with cull and draw callbacks. The OpenGL state that must be used when rendering a Drawable is represented by a StateSet. StateSets can be shared between Drawables, which is a good way to improve performance since it allows OSG to reduce the number of expensive changes in the OpenGL state. Like StateSets, Drawables can also be shared between different Geodes, so that the same geometry (loaded into memory just once) can be used in different parts of the scene graph.


Figure 20: Inheritance diagram for the osg::Drawable class.

The major classes derived from this base class are:

osg::Geometry: This class adds real geometry to the scene graph and can have vertices (and vertex data) associated with it directly, or can have any number of 'PrimitiveSet' instances associated with it. Vertex and vertex attribute data (color, normals, texture coordinates) are stored in arrays. Since more than one vertex may share the same color, normal or texture coordinate, an array of indices can be used to map vertex arrays to color, normal or texture coordinate arrays.

osg::ShapeDrawable: This adds the ability to render shape primitives with reduced effort. The various shape primitives are Box, Cone, Cylinder, Sphere, Triangle Mesh, etc. ShapeDrawable currently doesn't render InfinitePlanes.

6) osg::StateSet - Stores a set of modes and attributes which represent a set of OpenGL state. Note that a StateSet contains just a subset of the whole OpenGL state. In OSG, each Drawable and each Node has a reference to a StateSet. These StateSets can be shared between different Drawables and Nodes (that is, several Drawables and Nodes can reference the same StateSet); indeed, this practice is recommended whenever possible, as it minimizes expensive state changes in the graphics pipeline. This state includes the textureModeList, textureAttributeList, attributeList and modeList, along with an updateCallback and an eventCallback.

All the nodes described above are part of the core module of OSG, called osg. There are various other modules, such as osgDB (plugin support library for managing the dynamic plugins, both loaders and NodeKits), osgGA (GUI adapter library to assist the development of viewers), osgGLUT (GLUT viewer base class) and osgPlugins (28 plugins for reading and writing images and 3D databases). Some of these will be discussed later in this report.

3.1.3 Structure of a Scene Graph

Having described the major node types in OSG, let us discuss a typical scene hierarchy. The graph has an osg::Group at the top (representing the whole scene); osg::Groups, LODs, Transforms and Switches in the middle (dividing the scene into various logical units); and osg::Geodes (geometry nodes containing osg::Drawables and osg::StateSets) as the leaf nodes.

3.1.4 Windowing System in OSG

Just like OpenGL, the core of OSG is independent of the windowing system. The integration between OSG and a particular windowing system is delegated to other, non-core parts of OSG (users are also allowed to integrate OSG with any exotic windowing system they happen to use). The Viewer implements the integration between OSG and Producer (AKA Open Producer), thus offering an out-of-the-box, scalable and multi-platform abstraction of the windowing system.

3.1.5 Skeleton OSG Code

Figure 21: Inheritance diagram for the osgProducer::Viewer class.

This section describes the steps needed to set up a simple OSG program using the nodes and the windowing system discussed in the sections above:

1) Set up the viewer (an osgProducer::Viewer instance).
2) Create the scene graph for the scene (using nodes such as Geode and Geometry).
3) Attach the viewer and the graph using the setSceneData() method.
4) Start the simulation loop which generates the scene:

    while( !viewer.done() )
    {
        // wait for all cull and draw threads to complete
        viewer.sync();

        // update the scene by traversing it with the update visitor, which
        // calls all node update callbacks and animations
        viewer.update();

        // fire off the cull and draw traversals of the scene
        viewer.frame();
    }
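For reference, a minimal complete skeleton following these four steps might look like this; it is only a sketch against the osgProducer::Viewer interface used above, and buildScene() refers to the illustrative scene construction function sketched in Section 3.1.1:

    #include <osg/Group>
    #include <osgProducer/Viewer>

    extern osg::Group* buildScene();   // e.g. the sketch from Section 3.1.1

    int main()
    {
        osgProducer::Viewer viewer;                                   // step 1: the viewer
        viewer.setUpViewer(osgProducer::Viewer::STANDARD_SETTINGS);   // default window and manipulators

        osg::Group* root = buildScene();                              // step 2: the scene graph
        viewer.setSceneData(root);                                    // step 3: attach graph to viewer

        viewer.realize();                                             // create windows and threads

        while (!viewer.done())                                        // step 4: simulation loop
        {
            viewer.sync();
            viewer.update();
            viewer.frame();
        }
        viewer.sync();                                                // let the threads finish before exit
        return 0;
    }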

3.1.6 Callbacks

Users can interact with a scene graph using callbacks. Callbacks can be thought of as user-defined functions that are automatically executed depending on the type of traversal (update, cull, draw) being performed. Callbacks can be associated with individual nodes or with selected types (or subtypes) of nodes. During each traversal of the scene graph, if a node is encountered that has a user-defined callback associated with it, that callback is executed.

Figure 22: The callback mechanism.

Code that takes advantage of callbacks can also be more efficient when a multithreaded processing mode is used. The code associated with update callbacks runs once per frame, before the cull traversal. An alternative would be to insert the code in the main simulation loop between the viewer.update() and viewer.frame() calls; however, callbacks provide an interface that is easier to update and maintain.
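As a small illustration (the class name and the work done inside it are hypothetical), an update callback is a subclass of osg::NodeCallback whose operator() is invoked on the node every frame during the update traversal:

    #include <osg/Node>
    #include <osg/NodeCallback>
    #include <osg/NodeVisitor>

    // Hypothetical update callback: runs once per frame on the node it is attached to.
    class SpinCallback : public osg::NodeCallback
    {
    public:
        virtual void operator()(osg::Node* node, osg::NodeVisitor* nv)
        {
            // ... modify 'node' here, e.g. update a transform ...
            traverse(node, nv);   // continue the traversal into the children
        }
    };

    // Attaching it to a node:  node->setUpdateCallback(new SpinCallback);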

3.1.7 osgGA::GUIEventHandler

The GUIEventHandler class provides developers with an interface to the windowing system's GUI events. The event handler receives updates in the form of GUIEventAdapter instances, and can send requests for the GUI system to perform some operation using GUIActionAdapter instances. The information in a GUIEventAdapter instance includes the type of event (PUSH, RELEASE, DOUBLECLICK, DRAG, MOVE, KEYDOWN, KEYUP, FRAME, RESIZE, SCROLLUP, SCROLLDOWN, SCROLLLEFT). Depending on the type of GUIEventAdapter, the instance may carry additional information.

The GUIEventHandler uses GUIActionAdapters to request actions of the GUI system. It interacts with the GUI primarily through the 'handle' method, which has two arguments: a GUIEventAdapter instance for receiving updates from the GUI, and a GUIActionAdapter for requesting actions of the GUI. The handle method can examine the type and values carried by the GUIEventAdapter, perform the required operations, and make a request of the GUI system through the GUIActionAdapter. The handle method returns a boolean set to true if the event has been 'handled', and false otherwise.
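A minimal handler along these lines might look as follows; the key binding and the redraw request are illustrative and not the exact handler used in our system:

#include <osgGA/GUIEventHandler>

class NovelViewHandler : public osgGA::GUIEventHandler
{
public:
    virtual bool handle(const osgGA::GUIEventAdapter& ea,
                        osgGA::GUIActionAdapter& aa)
    {
        if (ea.getEventType() == osgGA::GUIEventAdapter::KEYDOWN &&
            ea.getKey() == 'n')
        {
            // change the novel view parameters here (e.g. call NextView())
            aa.requestRedraw();   // ask the GUI system to redraw the scene
            return true;          // event has been handled
        }
        return false;             // let other handlers process the event
    }
};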

3.2 D+TR in OSG

In this section, we describe the specifications of our D+TR system as ported to OSG.

3.2.1 Representation

The basic representation in OSG consists of a special kind of node called DepthTR, which is derived from the Geode class. This DepthTR node can store the geometry of the scene and has a special class called dtrDrawable (inherited from osg::Drawable) attached to it (more on dtrDrawable later in this section).

There is another class called InputView, which contains the Depth Images (DIs) and the calibration parameters that form the input to our D+TR system. A depth map is a two-dimensional array of real values, with location (i, j) storing the depth of the point that projects to pixel (i, j) in the image. Closer points are shown brighter. Depth and texture are stored as images on disk. The depth map contains real values whose range depends upon the resolution of the structure-recovery method; with 16 bits per pixel at millimetre resolution, depths of up to about 65 metres can be stored. The DepthTR class contains a pointer to the array of InputViews.

The DepthTR class has several important functions, such as load() (loads all input textures and depth maps), projectView() (projects an input view to the novel view orientation), setNovelView() (sets the validity flag for each input view) and getNovelView() (returns the novel view generated from the input textures and depth maps). The dtrDrawable class has a function called drawImplementation(), which renders the DepthTR node according to the Depth+TR algorithm presented in the previous chapters. The InputView class has functions for loading a depth map, computing 3D points from the image using get3D(), and projecting the view in the novel view direction.
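As an illustration of how a depth value is turned back into a 3D point (roughly what InputView::get3D() does), the sketch below decodes a 16-bit depth stored at millimetre resolution and back-projects pixel (i, j) through a pinhole calibration; the parameter names fx, fy, cx, cy are assumptions for this sketch, not the exact interface of our CMatrix-based code:

// A minimal sketch (not the actual InputView::get3D() code): decode a 16-bit
// depth value stored in millimetres and back-project pixel (i, j) through an
// assumed 3x3 calibration with focal lengths fx, fy and principal point (cx, cy).
struct Point3 { float x, y, z; };

Point3 backProject(unsigned short depth16, int i, int j,
                   float fx, float fy, float cx, float cy)
{
    float z = depth16 * 0.001f;     // 16-bit value in mm -> metres (up to ~65 m)
    Point3 p;
    p.x = (i - cx) * z / fx;        // pinhole back-projection in x
    p.y = (j - cy) * z / fy;        // pinhole back-projection in y
    p.z = z;
    return p;
}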

FIGURE 23: Class Diagrams of D+TR in OSG (the DepthTR, dtrDrawable and InputView classes with their principal members, as listed in Appendix III)


3.2.2 Rendering

We have implemented the implied-triangulation approach for rendering in OSG, which was described in the previous chapter. In this section, we describe how rendering takes place inside OSG.

3.2.2.1 Rendering in OSG

OSG provides an excellent framework for maximizing graphics performance. Any scene graph employs three key phases while generating a 3D scene: the App, Cull and Draw phases. In the App phase, the graphics application sets up the scene graph and the parameters necessary for rendering. In the Cull phase, objects that will not appear on the screen are culled; the hierarchical structure of the scene graph enables efficient culling. Finally, in the Draw phase, the scene is actually drawn on the screen. For further optimization, these three phases can be carried out by different threads simultaneously, as shown in Figure 24 below.

FIGURE 24: Parallelism in OSG

During the Cull phase, the whole scene graph is traversed by a NodeVisitor and the visibility of each node is determined. The scene graph is constructed in such a way that the geometry nodes (Geodes) lie at the bottom of the graph as leaf nodes. Each Geode in turn contains a list of Drawables (Geometry, ShapeDrawable, Text etc.) which can be drawn. During the Draw phase, the scene graph is traversed again by a NodeVisitor, which calls a virtual function, drawImplementation(osg::State&), while rendering the Drawables. drawImplementation(State&) is a pure virtual method for the actual implementation of the OpenGL drawing calls, such as vertex arrays and primitives, and must be implemented in concrete subclasses of the Drawable base class; examples include osg::Geometry and osg::ShapeDrawable. drawImplementation(State&) is called from the draw(State&) method, with draw() handling the management of OpenGL display lists and drawImplementation(State&) handling the actual drawing itself.
drawing itself.


The description above is taken from the OSG source documentation. As mentioned earlier, our DepthTR class is derived from the Geode class, and we add a dtrDrawable to it for drawing our IBR scene. The drawImplementation(osg::State&) function of the dtrDrawable class is overridden to render the depth maps using the D+TR algorithm.
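The pattern for such an override is sketched below; the body is only a placeholder showing where the D+TR rendering code goes, and the clone boilerplate uses the standard META_Object macro (the class name is chosen here to avoid confusion with the real dtrDrawable listed in Appendix III):

#include <osg/Drawable>

class dtrDrawableSketch : public osg::Drawable
{
public:
    dtrDrawableSketch() {}
    dtrDrawableSketch(const dtrDrawableSketch& d,
                      const osg::CopyOp& op = osg::CopyOp::SHALLOW_COPY)
        : osg::Drawable(d, op) {}

    META_Object(osg, dtrDrawableSketch)   // clone/className boilerplate

    // called by OSG during the draw traversal of the scene graph
    virtual void drawImplementation(osg::State& state) const
    {
        // project the input views, blend them and draw the novel image here
    }

    // a bound is needed so that the drawable can be culled correctly
    virtual osg::BoundingBox computeBound() const
    {
        return osg::BoundingBox(-1, -1, -1, 1, 1, 1);   // placeholder extent
    }
};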

3.2.3 Discussion

This section describes the rendering algorithm implemented in the drawImplementation() method of the dtrDrawable class, from both the OSG and the D+TR points of view.

During each drawImplementation() call on the dtrDrawable, all input views are projected to the novel view using the projectView() function. This is followed by reading the projected images with glReadPixels() and blending the views that have been flagged as valid (by the setNovelView() function). The angular blending is then carried out and the final novel image is drawn to the framebuffer using glDrawPixels().

Any keyboard event invokes the GUIEventHandler, which can be used to set the novel view parameters by calling the NextView() function (an auxiliary function of our system that sets the novel view direction). This is followed by a call to drawImplementation(), which renders the novel image in the window.
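In outline, the per-frame work just described has the following shape. This is a simplified sketch built around the class members listed in Appendix III; reqNumber (the number of views to blend) is passed in by the application, the variables type, resolution, pointSize and blendType are the extern settings declared there, and error handling and the GPU path are omitted:

// simplified outline of the work done per frame inside drawImplementation()
void renderNovelView(DepthTR& dtr, int reqNumber)
{
    for (int v = 0; v < dtr.numberOfInputViews; ++v)
    {
        InputView& view = dtr.inputViews[v];
        if (!view.valid)              // view not selected for this novel view
            continue;

        // draw this input view from the novel view orientation
        view.projectView(dtr.depth_threshold, type, resolution, pointSize);

        // read back the projected colour and depth for blending
        glReadPixels(0, 0, dtr.width, dtr.height,
                     GL_RGBA, GL_UNSIGNED_BYTE, view.projectedImage);
        glReadPixels(0, 0, dtr.width, dtr.height,
                     GL_DEPTH_COMPONENT, GL_FLOAT, view.projectedDepth);
    }

    // compare z-values and blend the valid views into the final image
    GLubyte* novelImage = dtr.getNovelView(blendType, reqNumber);

    // write the blended novel image to the framebuffer
    glDrawPixels(dtr.width, dtr.height, GL_RGBA, GL_UNSIGNED_BYTE, novelImage);
}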


To summarize the whole process: given n depth images and a particular novel view point, we first find the depth images on the same side as the novel view point. A depth image is considered to be on the same side when the angle subtended at the center of the scene between that depth image's camera center and the novel view camera center is less than a threshold angle. Candidate novel views are generated using these depth images. Then, for each pixel p in the novel view, we compare the z-values across all these candidate views and keep the nearest z-values within a threshold. Blending weights are computed for each pixel p as described earlier. The complete rendering algorithm is given in the Algorithm above, and its flow chart is shown in Figure 25 below.

FIGURE 25: Flow chart for complete rendering
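The angular blending weights mentioned above can be computed per input view roughly as in the sketch below (cosine-power blending, consistent with the pow(dot(v1, v2), 8) term in the pixel shader of Appendix II; the exponent k is a tunable parameter):

#include <cmath>

// weight of one input view at a surface point:
// v1 = unit vector from the point towards the input camera center,
// v2 = unit vector from the point towards the novel view camera center.
float cosAngleWeight(const float v1[3], const float v2[3], int k = 8)
{
    float cosAngle = v1[0]*v2[0] + v1[1]*v2[1] + v1[2]*v2[2];
    if (cosAngle <= 0.0f)
        return 0.0f;                                 // view sees the point from behind
    return std::pow(cosAngle, static_cast<float>(k)); // sharper falloff for larger k
}

// the final pixel colour is the weight-normalized sum over the valid views:
// colour = sum_i( w_i * colour_i ) / sum_i( w_i )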


3.3 Conclusions and Results

In this chapter we have described the basic OSG concepts and how the D+TR system is implemented in OSG. We have also described the class structure and the rendering process of our system in detail.

Here are some snapshots of our system:

The results will include the FPS figures for the table data in our system and are summarized in the table below:

Application      Time in seconds per frame
D+TR on OSG

References:

[1] http://graphics.stanford.edu/projects/mich/.

[2] http://www.cyberware.com.

[3] H. Baker, D. Tanguay, I. Sobel, D. Gelb, M. E. Goss, W. B. Culbertson, and T. Malzbender. The Coliseum Immersive Teleconferencing System. In International Workshop on Immersive Telepresence (ITP2002), 2002.

[4] C. Buehler, M. Bosse, L. McMillan, S. J. Gortler, and M. F. Cohen. Unstructured Lumigraph Rendering. In SIGGRAPH, 2001.

[5] S. Chen and L. Williams. View Interpolation for Image Synthesis. In SIGGRAPH, 1993.

[6] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry and Image-Based Approach. In SIGGRAPH, 1996.

[7] B. Girod, C.-L. Chang, P. Ramanathan, and X. Zhu. Light Field Compression Using Disparity-Compensated Lifting. In ICASSP, 2003.

[8] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The Lumigraph. In SIGGRAPH, 1996.

[9] I. Ihm, S. Park, and R. K. Lee. Rendering of Spherical Light Fields. In Pacific Graphics, 1997.

[10] R. Krishnamurthy, B.-B. Chai, H. Tao, and S. Sethuraman. Compression and Transmission of Depth Maps for Image-Based Rendering. In International Conference on Image Processing, 2001.

[11] M. Levoy and P. Hanrahan. Light Field Rendering. In SIGGRAPH, 1996.

[12] M. Magnor, P. Eisert, and B. Girod. Multi-view Image Coding with Depth Maps and 3-D Geometry for Prediction. In Proc. SPIE Visual Communication and Image Processing (VCIP 2001), San Jose, USA, pages 263-271, Jan. 2001.

[13] W. R. Mark. Post-Rendering 3D Image Warping: Visibility, Reconstruction, and Performance for Depth-Image Warping. PhD thesis, University of North Carolina, 1999.

[14] L. McMillan. An Image-Based Approach to Three Dimensional Computer Graphics. PhD thesis, University of North Carolina, 1997.

[15] L. McMillan and G. Bishop. Plenoptic Modelling: An Image-Based Rendering Algorithm. In SIGGRAPH, 1995.

[16] P. J. Narayanan, P. W. Rander, and T. Kanade. Constructing Virtual Worlds Using Dense Stereo. In Proc. of the International Conference on Computer Vision, Jan. 1998.

[17] P. J. Narayanan, Sashi Kumar P, and Sireesh Reddy K. Depth+Texture Representation for Image Based Rendering. In ICVGIP, 2004.

[18] Sashi Kumar Penta and P. J. Narayanan. Compression of Multiple Depth-Maps for IBR. In Pacific Graphics, 2005.

[19] S. M. Seitz and C. R. Dyer. View Morphing. In SIGGRAPH, 1996.

[20] H.-Y. Shum and L.-W. He. Rendering with Concentric Mosaics. In SIGGRAPH, 1999.

[21] X. Tong and R. M. Gray. Coding of Multi-view Images for Immersive Viewing. In ICASSP, 2000.

[22] H. Towles, W.-C. Chen, R. Yang, S.-U. Kam, and H. Fuchs. 3D Tele-Collaboration Over Internet2. In International Workshop on Immersive Telepresence (ITP2002), 2002.

[23] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality Video View Interpolation Using a Layered Representation. In SIGGRAPH, 2004.

[24] Michael Waschbusch, S. Wurmlin, D. Cotting, F. Sadlo, and M. Gross. Scalable 3D Video of Dynamic Scenes. In The Visual Computer, 2005.

[25] Pooja Verlani, Aditi Goswami, P. J. Narayanan, Shekhar Dwivedi, and Sashi Kumar Penta. Depth Images: Representations and Real-time Rendering. In Third International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), 2006.

Appendix

I. Features and Advantages of OpenSceneGraph

The stated goal of Open Scene Graph is to make the benefits of scene graph technology freely available to all commercial and non-commercial users. OSG is written entirely in Standard C++ and OpenGL; it makes full use of the STL and design patterns, and leverages the open source development model to provide a development library that is legacy-free and focused on the needs of end users.

The stated key strengths of Open Scene Graph are its performance, scalability, portability and the productivity gains associated with using a fully featured scene graph. In more detail:

Performance

The core scene graph supports view frustum culling, occlusion culling, small feature culling, Level Of Detail (LOD) nodes, state sorting, vertex arrays and display lists. Together these make Open Scene Graph one of the highest-performance scene graphs available.

Open Scene Graph also supports easy customization of the drawing process, such as the implementation of Continuous Level of Detail (CLOD) meshes on top of the scene graph (see the Virtual Terrain Project and Demeter).

Productivity

The core scene graph encapsulates the majority of OpenGL functionality, including the latest extensions, provides rendering optimizations such as culling and sorting, and offers a whole set of add-on libraries which make it possible to develop high-performance graphics applications very rapidly. The application developer is freed to concentrate on content and how that content is controlled, rather than on low-level coding.

Format support: OSG states that it includes 45 separate plugins for loading various 3D database and image formats. 3D database loaders include OpenFlight (.flt), TerraPage (.txp) including multi-threaded paging support, LightWave (.lwo), Alias Wavefront (.obj), Carbon Graphics GEO (.geo), 3D Studio MAX (.3ds), Performer (.pfb), Quake Character Models (.md2), DirectX (.x), Inventor Ascii 2.0 (.iv), VRML 1.0 (.wrl), Designer Workshop (.dw), AC3D (.ac), and the native OSG ASCII (.osg) and binary formats.

Image loaders include .rgb, .gif, .jpg, .png, .tiff, .pic, .bmp, .dds, .tga and QuickTime (under OSX), plus fonts via the freetype plugin.

Node Kits

OSG also has a set of Node Kits, which are separate libraries that can be compiled into your applications or loaded at runtime. They add support for particle systems (osgParticle), high-quality anti-aliased text (osgText), a special effects framework (osgFX), OpenGL shading language support (osgGL2), large-scale geospatial terrain database generation (osgTerrain), navigational light points (osgSim), osgNV (support for NVIDIA's vertex, fragment, combiner and Cg shaders), Demeter (CLOD terrain integrated with OSG), osgCal (which integrates Cal3D with OSG) and osgVortex (which integrates the CM-Labs Vortex physics engine with OSG).

Portability

The core scene graph has been designed to have minimal dependency on any specific platform, requiring little more than Standard C++ and OpenGL. This has allowed the scene graph to be rapidly ported to a wide range of platforms: originally developed on IRIX, it has since been ported to Linux, Windows and FreeBSD.

Window Systems

The core OSG library is completely windowing-system independent, which makes it easy for users to add their own window-specific libraries and applications on top. The distribution already contains the osgProducer library, which integrates with Open Producer, and in the Community/Applications section of the OSG website one can find examples of applications and libraries written on top of GLUT, Qt, MFC, wxWindows and SDL. Users have also integrated it with Motif and X.

Scalability

OSG runs on everything from portables all the way up to Onyx InfiniteReality systems, and it also supports the multiple graphics subsystems found on machines such as a multi-pipe Onyx.

II. Vertex Shader and Pixel Shader Code

Vertex Shader:

struct appdata
{
    float4 position : POSITION;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

struct vs2ps
{
    float4 currpos2 : POSITION;
    float4 currpos  : TEXCOORD1;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

vs2ps main(appdata IN, uniform float4x4 modelMatProj)
{
    vs2ps OUT;
    OUT.currpos2 = OUT.currpos = mul(modelMatProj, IN.position);
    OUT.texpos   = IN.texpos;
    OUT.pointpos = IN.pointpos;
    return OUT;
}

Pixel Shader:

struct vpixel_out
{
    float4 color : COLOR;
};

struct vs2ps
{
    float4 currpos  : POSITION;
    float3 texpos   : TEXCOORD0;
    float3 pointpos : COLOR;
};

vpixel_out main(vs2ps IN,
                uniform float3 c,
                uniform float3 n,
                uniform sampler2D texture,
                uniform samplerRECT buffer)
{
    vpixel_out OUT;
    float4 color;
    float4 color_old;
    float3 v1;
    float3 v2;

    IN.currpos /= 512.0;

    v1 = normalize(IN.pointpos - c);
    v2 = normalize(IN.pointpos - n);

    color.rgba     = tex2D(texture, IN.texpos.xyz).rgba;
    color_old.rgba = texRECT(buffer, IN.currpos.xy).rgba;

    color.a = dot(v1, v2);
    color.a = pow(color.a, 8);

    OUT.color.rgb = (color.rgb * color.a + color_old.rgb * color_old.a)
                    / (color.a + color_old.a);
    OUT.color.a   = color.a + color_old.a;   // alpha can get out of range

    return OUT;
}
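For completeness, a minimal host-side sketch of how such Cg programs are typically compiled and bound is given below; the file names vertex.cg and pixel.cg are assumptions, while the parameter name modelMatProj matches the vertex shader above:

#include <Cg/cg.h>
#include <Cg/cgGL.h>

// Loads the two programs listed in this appendix and binds them for drawing.
void loadAndBindShaders()
{
    CGcontext context   = cgCreateContext();
    CGprofile vsProfile = cgGLGetLatestProfile(CG_GL_VERTEX);
    CGprofile psProfile = cgGLGetLatestProfile(CG_GL_FRAGMENT);

    CGprogram vsProgram = cgCreateProgramFromFile(
        context, CG_SOURCE, "vertex.cg", vsProfile, "main", 0);
    CGprogram psProgram = cgCreateProgramFromFile(
        context, CG_SOURCE, "pixel.cg", psProfile, "main", 0);

    cgGLLoadProgram(vsProgram);
    cgGLLoadProgram(psProgram);

    cgGLEnableProfile(vsProfile);
    cgGLEnableProfile(psProfile);
    cgGLBindProgram(vsProgram);
    cgGLBindProgram(psProgram);

    // the combined model-view-projection matrix expected by the vertex shader
    cgGLSetStateMatrixParameter(
        cgGetNamedParameter(vsProgram, "modelMatProj"),
        CG_GL_MODELVIEW_PROJECTION_MATRIX,
        CG_GL_MATRIX_IDENTITY);
}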

III. Class Definitions

#include <osg/Geode>      /* the system header names did not survive in this   */
#include <osg/Drawable>   /* listing; the set shown here is a reconstruction   */
#include <osg/Notify>     /* based on the types used in the classes below      */
#include <osg/LineWidth>
#include <osg/GL>
#include <math.h>
#include "matrix/matrix.h"

#define FILE_NAME_SIZE 200
#define MAX_DEPTH      -100000000.00
#define THRESHOLD      3.50
#define ANGLE_THRESH   (M_PI/3)
#define TRIANGULATION  1
#define SPLATTING      2

extern float radius, theta, phi, ex, ey, ez;
extern int   mode, k, blendType, novelWidth, novelHeight;
extern int   type, resolution;
extern float pointSize;
extern int   vn;
extern bool  finished;
extern float m[3][4], calib[3][3];

class InputView
{
public:
    char imageFile[FILE_NAME_SIZE];
    char depthFile[FILE_NAME_SIZE];
    int width, height;

    CMatrix modelMatrix;
    CMatrix calibration;
    CMatrix cameraCenter;

    float * depth;
    float * X;
    float * Y;
    float * Z;
    float * weights;

    // tells whether this camera is valid for this novel view
    bool valid;

    // projected values
    GLubyte * projectedImage;
    float * projectedDepth;

    // tells whether a particular pixel is visible in projectedImage
    bool * holes;

    InputView();

    // loads depth and texture
    void load(char * imFile, char * depFile);

    void get3D();

    // projects this view into the novel view orientation
    void projectView(float depth_threshold, int type, int resolution,
                     float pointSize);
};

class DepthTR : public osg::Geode
{
public:
    int numberOfInputViews;
    InputView * inputViews;

    // tells whether a particular pixel in the novel view is valid or not
    bool * valid;

    // temporary vars used to read GL buffers
    GLfloat * projectedZbuffer;

    // novel view params
    int width;
    int height;

    // this threshold is used to find whether three nearby vertices can
    // form a triangle or not
    float depth_threshold;

    // this threshold is used to eliminate input views straight away if
    // they are not near to the novel view camera
    float angle_threshold;

    GLubyte * novelImage;
    float * novelViewDepth;
    float * novelViewX;
    float * novelViewY;
    float * novelViewZ;

    CMatrix R, t;
    CMatrix cameraCenter;
    CMatrix calibration;
    CMatrix modelMatrix;

    bool save;

    DepthTR() : osg::Geode()
    {
        init();
        depth_threshold = THRESHOLD;
        angle_threshold = ANGLE_THRESH;
    }

    void dtrDrawable_drawImplementation();

    // loads all the input textures and depth maps
    void load(char *);

    void computeWeights(int reqNumber, int blendType, float * angles,
                        float * weights, bool * flags);

    // blending functions
    void angleBlending(float * angles, float * weights, bool * flags,
                       int * positions);
    void exponentialAngleBlending(float * angles, float * weights,
                                  bool * flags, int * positions);
    void cosAngleBlending(float * angles, float * weights, bool * flags,
                          int * positions);
    void newAngleBlending(int k, float * angles, float * weights,
                          bool * flags, int * positions);
    void inverseAngleBlending(int k, float * angles, float * weights,
                              bool * flags, int * positions);

    // resets the validity flags
    void setNovelView(float model[][4], float calib[][3]);

    // returns the novel view generated from the input textures and depth maps
    GLubyte* getNovelView(int blendType, int reqNumber);

    // projects input view number vn onto the novel view orientation
    void projectView(int vn, int type, int resolution, float pointSize);

    void getProjectedDT(int vn);
    void saveProjectedImage(int vn);

private:
    void init();   // shared constructor code, generates the drawables

    class dtrDrawable;
    friend class dtrDrawable;

    bool dtrDrawable_computeBound(osg::BoundingBox&) const;
};

class DepthTR::dtrDrawable : public osg::Drawable
{
public:
    DepthTR* _ss;

    dtrDrawable(DepthTR* ss) :
        osg::Drawable(), _ss(ss) { init(); }

    dtrDrawable() : _ss(0)
    {
        init();
        osg::notify(osg::WARN)
            << "Warning: unexpected call to "
               "osgSim::SphereSegment::Spoke() copy constructor"
            << std::endl;
        // (the receiver of the following call is assumed to be the drawable's
        //  own state set)
        getOrCreateStateSet()->setAttributeAndModes(
            new osg::LineWidth(2.0), osg::StateAttribute::OFF);
    }

    virtual osg::BoundingBox computeBound() const;
    void calculateParams();
};

#endif
