PhD THESIS SUMMARY

create the premises for the development of photo-video devices that are able to work without any constraint on illumination or distance to the objects in the scene, in any generic 3D scenario.

d. Developing and implementing new methods, techniques and real-time processing algorithms for the high definition video signal, using the advantages offered by FPGA (Field Programmable Gate Array) technology and the LUT (Look-Up Table) approach.

Each of these objectives was thoroughly studied, and scientific contributions were brought as a result of the research activity undertaken by the author. Due to the complexity of the field in general and of the specific tasks under each formulated objective, the thesis is structured in six chapters.

Thesis Structure and Content

Chapter 1 describes vision information systems in general, both 2D and 3D, and sets the general framework within which the rest of the thesis work was carried out.

Chapter 2 describes the characteristic elements of the theory and practice of systems aimed at exchanging visual information between people, and then focuses on the technology and engineering of the main devices used: the video camera and the rendering screen / display. The study is based on what the author considers to be some of the most important highlights in the evolution of video systems, as well as on examples of the latest innovations in the field, both theoretical (compressive sampling) and technological (OLED displays). The aim of this endeavour is to set the exact context in which the rest of the thesis' contributions apply, together with their relevance, the particular problems they solve and their overall impact on the development of 2D & 3D vision systems.

Chapter 3 presents the emerging distributed video coding paradigm, which combines knowledge from the theory and practice of channel coding with the fundamental theoretical results of Slepian, Wolf, Wyner and Ziv on distributed lossless and lossy source coding. The research included the following:

a. Understanding the fundamental theory of distributed source coding and the way it applies to video coding in particular, together with presenting the main results from the field (the theorems of Slepian & Wolf and of Wyner & Ziv; their standard statements are recalled at the end of this summary).

b. Studying the "Stanford" distributed video coding architecture, together with the various improvements and channel codes reported in the existing literature.

c. Studying the "PRISM" distributed video coding architecture, together with the various improvements and channel codes reported in the current literature.

d. Studying the extensions of the "Stanford", "PRISM" and other distributed video coding architectures.

e. Comparing the main approaches and implementations published in the field of distributed video coding.

Chapter 4 presents two methods that, in the author's view, provide the premises necessary for implementing an integrated 3D capture device without relying on structured illumination or on sensors other than visual ones. The proposed implementations and algorithms for extracting depth from defocus and, respectively, from light polarisation offer the following advantages (textbook sketches of both principles are given at the end of this summary):

a. Independence from illumination.

b. The possibility of acquiring the depth information from a distance by using optical "close-up" means (lenses); this is a major advantage compared to the methods that use structured light or the propagation time of a laser beam.

c. The proposed algorithm for "depth from light polarisation" is flexible enough to allow a real-time implementation on video sequences, because it is based on a finite number of simple mathematical operations (multiplications and additions).

d. Both solutions are based on regular 2D photo / video sensors and require minimal additions or transformations to enable 3D acquisition.

The research activity, based on the study of the technologies, techniques and algorithms already available for capturing depth information, on the actual implementation of a "depth from defocus" algorithm and on the development of a new method that uses light polarisation information, showed that the proposed approach can easily be extended and integrated into future 3D-aware photo / video cameras.

Chapter 5 presents an architecture for real-time, pixel-based and frame-based processing of high definition video (1920 x 1080 pixels per frame, 50 frames per second). The implementation, using FPGA technology and Look-Up Tables, has the following advantages over an ASIC (application-specific integrated circuit) based version (a minimal software illustration of the look-up table principle is given at the end of this summary):

a. The video processing speed is independent of the complexity of the algorithm that describes the processing analytically: no matter how complex the expressions used to compute the final pixel value are, the actual processing is performed in the same amount of time.

b. The hardware can be reused for different processing tasks. The FPGA technology allows the processing to be easily reconfigured, including for future 3D video systems.

c. Other user-defined processing functions can easily be added and integrated. The proposed architecture, based on Look-Up Tables, enables further processing functions to be added according to the user's needs.

d. Third-party processing functions defined by proprietary equations can be integrated while their copyright protection is preserved.

The research effort, which resulted in a functional study of the look-up table based video processing architecture (carried out with a software simulator implemented by the author) and in a working proof-of-concept hardware model, proved that the approach, as well as its actual integration into a video camera, is feasible.

Chapter 6 closes the thesis with a summary and the conclusions of the entire endeavour.

Main Contributions and Conclusions

Based on the author's study of the existing literature and on the results of the implemented simulations and functional models, this thesis brought the following main contributions to the field of 2D & 3D vision systems and to the state of the art:

a. An overview of the main technological aspects that support the current development of 3D systems has been realised, namely the advances in CMOS and CCD sensor technology and in plasma, LCD and OLED display technology.
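For reference, the two results that Chapter 3 relies on can be stated compactly. The expressions below are the standard textbook statements (in LaTeX notation), recalled here as background rather than reproduced from the thesis: the Slepian-Wolf theorem gives the achievable rate region for separate lossless encoding and joint decoding of two correlated sources X and Y, and the Wyner-Ziv theorem gives the rate-distortion function for lossy coding of X when the side information Y is available only at the decoder.

    R_X \ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y)

    R_{X|Y}^{WZ}(D) \;=\; \min_{\substack{p(u \mid x)\,:\; U - X - Y \\ \hat{x}(u,y)\,:\; \mathbb{E}[d(X,\hat{X})] \le D}} I(X; U \mid Y)

In the distributed video coding architectures studied in Chapter 3, Y plays the role of the side information generated at the decoder (for example by motion-compensated interpolation of key frames), and the Wyner-Ziv bound is what allows the encoder to stay simple while the decoder carries the complexity.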
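The "depth from defocus" principle of Chapter 4 is usually anchored in the thin-lens blur relation. The formula below is the classical textbook form and is given only as background; it is not necessarily the exact model or calibration used in the thesis. For a lens of focal length f and aperture diameter D, with the sensor at distance s behind the lens, a point at object distance u produces a blur circle of diameter

    b(u) \;=\; D \, s \,\left|\, \frac{1}{f} - \frac{1}{u} - \frac{1}{s} \,\right|

so, once f, D and s are calibrated, an estimate of the local blur diameter b (for instance from two images taken with different focus settings) can be inverted to recover the object distance u.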
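The "depth from light polarisation" method itself is the author's contribution and is not reproduced here; the NumPy sketch below only illustrates the kind of simple per-pixel arithmetic such a method can build on. It assumes three frames captured behind a linear polariser at 0, 45 and 90 degrees and computes the standard linear Stokes parameters and polarisation cues; the function names and the three-angle setup are illustrative assumptions, not taken from the thesis.

    import numpy as np

    def stokes_from_polariser_triplet(i0, i45, i90):
        """Linear Stokes parameters from frames captured behind a linear
        polariser at 0, 45 and 90 degrees; only additions, subtractions
        and one scaling are needed, which suits a real-time pipeline."""
        s0 = i0 + i90                  # total intensity
        s1 = i0 - i90                  # 0 / 90 degree preference
        s2 = 2.0 * i45 - i0 - i90      # 45 / 135 degree preference
        return s0, s1, s2

    def polarisation_cues(s0, s1, s2, eps=1e-6):
        """Degree and angle of linear polarisation per pixel; in a
        fixed-point FPGA pipeline the sqrt and arctan could themselves
        be implemented as look-up tables."""
        dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
        aop = 0.5 * np.arctan2(s2, s1)
        return dolp, aop

In shape-from-polarisation approaches, the angle of polarisation constrains the azimuth of the surface normal and the degree of polarisation constrains its zenith angle, which is what makes such cues usable for depth and surface reconstruction.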
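The key property claimed for the Chapter 5 architecture, namely that the per-pixel processing time does not depend on the complexity of the processing expression, is easy to illustrate in software. The NumPy sketch below is only a minimal model of the principle, not the author's FPGA implementation: the arbitrarily complex transfer function is evaluated once per possible input code when the table is built, and each pixel then costs a single table read. The function names and the 8-bit example are assumptions made for the illustration.

    import numpy as np

    def build_pixel_lut(transfer, bits=8):
        """Evaluate an arbitrarily complex pixel transfer function once
        for every possible input code; this is the only place where the
        complexity of the expression matters."""
        levels = 2 ** bits
        codes = np.arange(levels, dtype=np.float64) / (levels - 1)
        table = np.rint(transfer(codes) * (levels - 1))
        return np.clip(table, 0, levels - 1).astype(np.uint8)

    def apply_lut(frame, lut):
        """Per-pixel processing is now one memory read per pixel,
        independent of how the table was computed."""
        return lut[frame]

    # Example: gamma correction followed by a soft contrast curve,
    # applied to a full-HD frame (1920 x 1080) in constant time per pixel.
    lut = build_pixel_lut(lambda x: 0.5 * (1.0 + np.tanh(4.0 * (x ** 0.45 - 0.5))))
    frame = np.random.randint(0, 256, size=(1080, 1920), dtype=np.uint8)
    processed = apply_lut(frame, lut)

In an FPGA realisation the same idea applies, with the tables held in memory blocks, so the achievable pixel rate is set by the memory and pipeline clock rather than by the processing equations.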
