25.08.2013 Views

SUMMARY PhD. THESIS


THE FACULTY OF ELECTRONICS, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY

Șerban-Nicolae MEZA, Eng.

SUMMARY

PhD THESIS

CONTRIBUTIONS TO THE DEVELOPMENT OF 2D & 3D VISION SYSTEMS

Coordinator,
Prof.dr.ing. Aurel VLAICU

Thesis evaluation commission:

PRESIDENT: Professor Virgil DOBROTĂ, Ph.D., Eng, Head of Communications Department, Technical University of Cluj-Napoca

MEMBERS:
- Professor Aurel VLAICU, Ph.D., Eng - scientific supervisor, Technical University of Cluj-Napoca;
- Professor Mihai ROMANCA, Ph.D., Eng - reviewer, Transilvania University of Brașov;
- Professor Radu VASIU, Ph.D., Eng - reviewer, Polytechnic University of Timişoara;
- Associate Professor Bogdan ORZA, Ph.D., Eng - reviewer, Technical University of Cluj-Napoca.


Key Words

3D vision, depth from polarisation, 3D LookUp Tables, distributed video coding

Thesis Table of Contents

1. Introduction
2. Study on the Theoretical Fundamentals, Existing Solutions and Current Trends in the Evolution of Vision System Architectures, Technologies and Equipment
2.1. Visual Perception. Spatial Perception. Stereovision.
2.2. 2D and 3D Video Representations and Extensions (multiview, free viewpoint)
2.4. Elements of Projective Geometry for 2D & 3D Vision Systems
2.5. The Theory of Video Sampling
2.6. Emerging Theories: Compressive Sampling
2.7. Standards for 2D & 3D Video Data Representation
2.8. Technologies and Architectures for 2D & 3D Video Capturing: Existing Solutions and Trends
2.9. Technologies and Architectures for 2D & 3D Video Rendering: Existing Solutions and Trends
Contributions
3. Study on 2D & 3D Video Signal Coding for Transmission. Emerging Paradigms: Distributed Video Coding
3.1. The Distributed Video Coding Paradigm. Approaches and Perspectives
3.2. General Presentation of the Theory of Distributed Video Coding
3.3. Implemented Architectures for Distributed Video Coding
3.4. Analysis of Distributed Video Coding Architectures
Contributions
4. Acquiring Depth Information in Vision Systems
4.1. Photo and Video Acquisition for 3D and Stereovision
4.2. The Acquisition of Depth from Defocus
4.3. Principles and Approaches for the Acquisition of Depth from Light Polarisation
4.4. Algorithm and Experimental Model for the Acquisition of Depth from Light Polarisation
Contributions
5. Real-time 2D High-Definition Video Signal Processing Using FPGA Technology and 3D LookUp Tables
5.1. Real-time in-Camera Video Processing
5.2. 3D LookUp Table Processing for Video Signals
5.3. Software Simulator Implementation for 3D LookUp Table Video Signal Processing
5.4. FPGA Technology Hardware Integration of 3D LookUp Table Video Signal Processing
5.5. Obtained Results in Real-time Video Signal Processing Using 3D LookUp Tables
Contributions
6. Conclusion
7. Bibliography
Appendix
Appendix
Appendix
Appendix
Appendix

General Subject of the Thesis

In the last decade, visual representations (images or video sequences) have been present in almost all aspects of everyday life, making multimedia information ubiquitous. Services and products such as digital photography, digital television, DVDs and Blu-rays, the World Wide Web, videoconferencing, virtual collaboration and others are all based to a great extent on the use of such visual content.

Under the momentum of the theoretical and technological developments characteristic of the "digital era", video systems and services formerly based on analogue technology are currently being widely "re-invented". Several factors have contributed in the last decade to an unprecedented growth of systems that use static (images and photos) and dynamic (videos and animations) visual information: the capabilities of video rendering devices (screens with high contrast ratios, low power consumption and various shapes, sizes and spatial resolutions); high-throughput processing and transmission networks (in the order of Mbytes per second); visual information acquisition systems that deliver high-quality images and videos (high resolution, colour depth, and time sampling of more than 160 frames per second); and the "boom" in applications and use-case scenarios. To cope with the resulting demand for new products and for technological breakthroughs that surpass the performance of current devices, services and technologies, the field has undergone a certain segmentation and specialisation adapted to different types of applications and user groups, and has imported and mixed techniques, methods and technologies from various other fields (statistics, chemistry, physics, biology). One of the most important, even revolutionary, developments of the past few years, however, is the interest in adding the extra depth dimension to the "classic" brightness information captured from the scene.

This thesis aims to bring personal and original contributions to challenges currently pursued worldwide within the general field of 2D & 3D vision systems. Following the structure of the presented work, the research focused on the following:

a. Acquiring in-depth knowledge in the field of 2D & 3D vision systems engineering: from photography to the latest innovations in image acquisition, processing, transport and rendering, including 3D.

b. Studying, analysing and comparing emerging methods and paradigms for the coding and compression of the video signal, namely distributed video coding, in the context of the increasing number of mobile video-aware devices and wireless transmissions.

c. Implementing and proposing new algorithms for acquiring / generating / extracting depth information from defocus and light polarisation, together with brightness information, in order to create the premises for photo-video devices able to work without any constraint on illumination or on distance to the objects in the scene, in any generic 3D scenario.

d. Developing and implementing new methods, techniques and real-time processing algorithms for high-definition video signals, using the advantages offered by FPGA (Field Programmable Gate Array) technology and the LUT (Look Up Table) approach.

Each of these objectives was studied thoroughly, and the research activity undertaken by the author resulted in scientific contributions under each of them. Given the complexity of the field in general and of the specific tasks under each objective, the thesis is structured in six chapters.

Thesis Structure and Content

Chapter 1 describes vision information systems in general, both 2D and 3D, and sets the general framework within which the rest of the thesis work was carried out.

Chapter 2 describes the characteristic elements of the theory and practice of systems aimed at exchanging visual information between people, and then focuses on the technology and engineering of the main devices used: the video camera and the rendering screen / display. The study is based on what the author considers some of the most important highlights in the evolution of video systems, as well as on examples of the latest innovations in the field, both theoretical (compressive sampling) and technological (OLED displays). The aim is to set the exact context in which the contributions of the thesis apply, together with their relevance, the particular problem each one solves, and their overall impact on the development of 2D & 3D vision systems.

Chapter 3 presents the emerging distributed video coding paradigm, which combines knowledge from the theory and practice of channel coding with the fundamental theoretical results of Slepian and Wolf (lossless case) and Wyner and Ziv (lossy case) on distributed source coding.

The research included the following:

a. Understanding the fundamental theory of distributed source coding and the way it applies to video coding in particular, together with presenting the main results from the field (the theorems of Slepian & Wolf, and Wyner & Ziv).

b. Studying the "Stanford" distributed video coding architecture, its various improvements and the channel codes reported in the existing literature.

c. Studying the "PRISM" distributed video coding architecture, its various improvements and the channel codes reported in the current literature.

d. Studying the extensions of the "Stanford", "PRISM" and other distributed video coding architectures.

e. Comparing the main approaches and implementations published in the field of distributed video coding.
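For reference, the two theorems named in item a. can be stated compactly. The notation below ($R_X$, $R_Y$ for the rates of two separately encoded, jointly decoded correlated sources $X$ and $Y$; $H$ for entropy; $D$ for the allowed distortion) is standard textbook notation assumed here, not notation taken from the thesis.

```latex
% Slepian-Wolf: achievable rates for lossless separate encoding, joint decoding
\begin{align}
R_X &\ge H(X \mid Y), \\
R_Y &\ge H(Y \mid X), \\
R_X + R_Y &\ge H(X, Y).
\end{align}

% Wyner-Ziv: lossy coding of X with side information Y known only at the decoder
\begin{equation}
R^{\mathrm{WZ}}_{X \mid Y}(D) \;\ge\; R_{X \mid Y}(D),
\end{equation}
% where R_{X|Y}(D) is the rate needed when Y is also available at the encoder.
```

In distributed video coding, $Y$ plays the role of the side information the decoder builds from already decoded frames, which is why the motion-related complexity can be moved from the encoder to the decoder.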

Chapter 4 presents two methods that, according to the author, provide the premises for implementing an integrated 3D capturing device without relying on structured illumination or on non-visual sensors. The proposed implementation and algorithms for acquiring / generating / extracting depth from defocus and, respectively, from light polarisation offer the following advantages:

a. Independence from illumination.

b. The possibility of acquiring depth information from a distance by using optical "close-up" means (lenses); this is a major advantage over methods that use structured light or the propagation time of a laser beam.

c. The proposed algorithm for "depth from light polarisation" is sufficiently lightweight to be applied to video sequences in real time, because the implementation is based on a finite number of simple mathematical operations (multiplications and additions).

d. Both solutions are based on regular 2D photo / video sensors and require minimal additions / transformations to enable 3D acquisition.
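To give a sense of why a polarisation-based method can be cheap per pixel, the sketch below computes the standard linear Stokes parameters, the degree of linear polarisation and the angle of polarisation from three brightness samples taken through a linear polariser at 0, 45 and 90 degrees. This is a generic textbook computation, not the algorithm proposed in the thesis; the function name, the three-angle capture scheme and the `eps` guard are assumptions made for illustration, and the mapping from these quantities to depth is not reproduced here.

```python
import math

def polarisation_parameters(i0, i45, i90, eps=1e-9):
    """Per-pixel polarisation cues from three polariser orientations.

    i0, i45, i90: brightness measured through a linear polariser at
    0, 45 and 90 degrees (hypothetical capture scheme, for illustration).
    Returns (total intensity, degree of linear polarisation, angle in radians).
    """
    s0 = i0 + i90            # total intensity
    s1 = i0 - i90            # horizontal versus vertical component
    s2 = 2.0 * i45 - s0      # diagonal component
    # Only additions and multiplications up to here; one square root and
    # one arctangent finish the computation.
    dolp = math.sqrt(s1 * s1 + s2 * s2) / max(s0, eps)
    aolp = 0.5 * math.atan2(s2, s1)
    return s0, dolp, aolp
```

Applied independently to every pixel, the same handful of operations yields full-frame maps, which is consistent with the real-time claim in item c.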

The research activity was based on the study of existing technologies, techniques and algorithms for capturing depth information for 3D, on the actual implementation of an algorithm for "depth from defocus" and on the development of a new method that uses information from light polarisation. It showed that the proposed approach could be easily extended and integrated into future 3D-aware photo / video cameras.
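As background for the "depth from defocus" part, the standard thin-lens model relates the amount of blur to object distance. The symbols used here ($f$ focal length, $u$ object distance, $v$ in-focus image distance, $s$ lens-to-sensor distance, $A$ aperture diameter, $c$ blur-circle diameter) are textbook notation assumed for illustration; the algorithm implemented in the thesis may use a different formulation.

```latex
\begin{equation}
\frac{1}{f} = \frac{1}{u} + \frac{1}{v},
\qquad
c = A \, \frac{\lvert s - v \rvert}{v}
\end{equation}
```

Measuring $c$ at a pixel therefore constrains $v$ and, through the lens equation, the object distance $u = f v / (v - f)$.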

Chapter 5 presents an architecture for real-time, pixel-based and frame-based processing of high-definition video (1920 x 1080 pixels per frame, 50 frames per second). The implementation, using FPGA technology and Look Up Tables, has the following advantages over a version based on ASICs ("application specific integrated circuits"):

a. Independence of the video processing speed from the complexity of the algorithm that analytically describes the processing. No matter how complex the expressions used to compute the final pixel value are, the actual processing is performed in the same amount of time.

b. The possibility of reusing the hardware for different kinds of processing. FPGA technology allows the processing to be reconfigured easily, including for future 3D video systems.

c. Easy extension and integration of other user-defined processing functions. The proposed architecture, based on Look Up Tables, allows further processing functions to be added according to user needs.

d. The possibility of integrating third-party processing functions that are defined using proprietary equations, while still ensuring copyright protection.
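The constant-time property in item a. can be illustrated with a small software model. The sketch below precomputes an arbitrary colour transform on a coarse grid and then processes each pixel with a single table read. The function names, the 17-node grid and the nearest-node lookup (no interpolation) are assumptions chosen to keep the example short; they do not describe the thesis's simulator or its FPGA design.

```python
def build_lut(transform, nodes=17):
    """Precompute transform(r, g, b) on a regular nodes^3 grid of 8-bit RGB values."""
    step = 255.0 / (nodes - 1)
    levels = [round(i * step) for i in range(nodes)]
    # All the cost of evaluating the transform is paid here, once.
    return [[[transform(r, g, b) for b in levels] for g in levels] for r in levels]

def apply_lut(lut, pixel):
    """Map one (r, g, b) pixel through the table using the nearest grid node."""
    scale = (len(lut) - 1) / 255.0
    ri, gi, bi = (int(c * scale + 0.5) for c in pixel)
    # One table read per pixel, regardless of how complex the transform was.
    return lut[ri][gi][bi]

def invert(r, g, b):
    """Example transform (hypothetical): colour negative."""
    return (255 - r, 255 - g, 255 - b)

lut = build_lut(invert)
result = apply_lut(lut, (250, 3, 0))  # nearest grid node is (255, 0, 0)
```

At the format quoted above, 1920 x 1080 x 50 = 103,680,000 such lookups are needed every second, which is why the per-pixel cost has to be a fixed, small number of memory accesses. Replacing `invert` with a far more expensive function changes only the time spent in `build_lut`.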

The research effort resulted in a functional study of a look-up table based video processing architecture, performed using a software simulator implemented by the author, and in a working proof-of-concept hardware model. Together they showed that the approach, as well as its actual integration in a video camera, is feasible.

Chapter 6 closes the thesis with a summary and the conclusions of the entire endeavour.

Main Contributions and Conclusions

Based on the author's study of the existing literature and on the results of the implemented simulations and functional models, this thesis brings the following main contributions to the field of 2D & 3D vision systems and to the state of the art:

a. An overview of the main technological aspects that support the current development of 3D systems, namely the advances in CMOS and CCD sensor technology and in plasma, LCD and OLED display technology.

b. Examples of the main emerging concepts and theories in the field of 2D & 3D vision systems: stereo vision and epipolar geometry.

c. A presentation of the emerging paradigms in the field: compressive sampling for video data acquisition and distributed video coding.

d. A study of the most important distributed video coding architectures (the "Stanford" and the "PRISM" architectures), together with their improvements and extensions.

e. A comparison of the existing implementations of codec designs based on the distributed video coding paradigm.

f. An implementation of an algorithm based on the concept of "depth from defocus".

g. A new algorithm for extracting depth information from light polarisation, proposed and implemented by the author.

h. An evaluation of the proposed algorithm for depth extraction from light polarisation, based on various sets of measurements, including the construction of the depth map for the test scenes.

i. An algorithm for video processing using look-up tables, together with the algorithm for computing and writing the data in these tables.

j. A software implementation and simulation, at frame level, of the algorithm for real-time video processing using look-up tables.

k. An experimental, proof-of-concept model using FPGA technology and look-up tables; the prototype, developed with contributions from the author, is protected by copyright law and is the intellectual property of the Netherlands Grass Valley company.

In terms of the economic value of the research outcome, the proposed algorithm for depth from light polarisation offers the premises for creating new photo / video cameras able to capture both brightness and depth information from the scene. In addition, the research results presented in Chapter 5, on real-time video processing using FPGAs and LUTs, were integrated in the LDK 8xxx series of professional video cameras launched in 2012 by the Netherlands Grass Valley company.

The main published papers with results from the thesis that have been presented to the scientific community so far are the following:

S.N. Meza, K.J. Damstra, J.V. Rooy, S. Persa, "Embedded real-time look-up table processing for high definition video signals", Proceedings of 2010 IEEE International Conference on Automation Quality and Testing Robotics (AQTR 2010), ISBN 978-1-4244-6724-2, pp. 315-319

S.N. Meza, A. Vlaicu, B. Orza, "Bridging the gap between video data acquisition, compression and transmission under emerging technologies and scenarios", Proceedings of 2010 IEEE International Conference on Automation Quality and Testing Robotics (AQTR 2010), ISBN 978-1-4244-6724-2, pp. 309-314

The author also gave the following public presentation on the subject and themes of the thesis:

S.N. Meza, "Distributed Video Coding", presentation at the 16th Summer School on Image Processing (SSIP), 2008, Technical University of Vienna, Austria.

Future development of this research aims at optimising the implementation of the proposed algorithm for extracting depth information from light polarisation and at extending it to video. Likewise, an experimental model of a device capable of 3D (brightness and depth) data acquisition will be built based on the results of Chapter 4. Integrating the polariser filter at sensor level will make it possible to capture depth information directly from a single perspective, without any intervention on scene illumination and without projecting any scanning pattern.
