06.03.2013 Views

universiti putra malaysia gesture recognition using web camera lew ...

universiti putra malaysia gesture recognition using web camera lew ...

universiti putra malaysia gesture recognition using web camera lew ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

UNIVERSITI PUTRA MALAYSIA<br />

GESTURE RECOGNITION USING WEB CAMERA<br />

LEW YUAN POK.<br />

FK 2004 25


GESTURE RECOGNITION USING WEB CAMERA<br />

By<br />

LEW YUAN POK<br />

n 31. ,<br />

Thesis Submitted to the School of Graduate Studies.<br />

Universiti Putra Malaysia in Fulfillment of the<br />

Requirements for the Degree of Master of Science<br />

February 2004<br />

*, 4': t<br />

. -


My parents :<br />

DEDICATION<br />

Lew Kwan Shin & Lee Se Moy<br />

My brothers and sister :<br />

Yuan Huai. Yuan Sing & Yuan Yee<br />

and<br />

Yien Yien


Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment<br />

of the requirements for the degree of Master of Science<br />

GESTURE RECOGNITION USING WEB CAMERA<br />

LEW YUAN POK<br />

February 2004<br />

Chairman : Associate Professor Hj. Abdul Rahman Ramli, Ph.D.<br />

Faculty : Engineering<br />

Gesture <strong>recognition</strong> represents an important ability by which a computer is able to<br />

directly accept human <strong>gesture</strong> as input to trigger different actions just like<br />

conventional input devices such as keyboard. mouse, joystick and etc. As the Human<br />

Computer Interaction (HCI) progresses over the years, emphasis is placed more on<br />

developing input devices which are most convenient and easy to use. Human <strong>gesture</strong><br />

is not only natural and intuitive to a user, but can also represent motions of high<br />

degree of freedom which is of utmost importance in many applications especially in<br />

virtual reality.<br />

This thesis presents the design of an offline system which is capable of recognizing<br />

hand postures fiom the visual input of a <strong>web</strong> <strong>camera</strong>. The hand segmentation is based<br />

on image subtraction technique and a skin color modeling process. Fourier<br />

iii


descriptors are used as the features to describe the geometry of different hand<br />

postures while the <strong>recognition</strong> process is based on minimum distance classifier. The<br />

results obtained indicate that the system is able to recognize hand postures with<br />

reasonable accuracy.


Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia untuk<br />

memenuhi keperluan untuk Ijazah Master Sains<br />

PENGECAMAN PERGERAKAN BERPANDUKAN KAMERA WEB<br />

Pengerusi<br />

Fakulti<br />

Oleb<br />

LEW YUAN POK<br />

Februari 2004<br />

: Profesor Madya Hj. Abdul Rahman Ramli, Ph.D.<br />

: Kejuruteraan<br />

Pengecaman pergerakan merupakan sesuatu keupayaan yang penting di mana sebuah<br />

komputer dapat menerirna pergerakan manusia sebagai input untuk melanwkan<br />

pelbagai tindakan seperti yang dilakukan oleh peranti input tradisional misalnya<br />

papan kekunci. tetikus, kayu bedik dan sebagainya. Seiring dengan perkembangan<br />

cam interaksi antara manusia dengan komputer selama ini, lebih banyak penekanan<br />

diberikan kepada pembangunan peranti input yang berguna dan senang dipakai.<br />

Pergerakan manusia bukan sahaja semulajadi clan intuitif kepada seorang pengguna<br />

komputer, tetapi juga &pat mewakili pergerakan yang berdarjah kebebasan tinggi,<br />

menjadikannya arnat penting dalam banyak aplikasi terutamanya kenyataan maya.<br />

Tesis ini membentangkan proses rekabentuk sebuah sistem luar talian yang mampu<br />

mengecam keadaan tangan daripada input visual sebuah karnera <strong>web</strong>. Segmentasi<br />

tangan adalah berdasarkan kaedah penolakan irnej dan sebuah proses permodelan<br />

v


warna kulit. Deskriptor Fourier pula digunakan sebagai penghurai untuk<br />

menggambarkan geometri keadaan tangan yang beriainan manakala proses<br />

pengecaman adalah berpandukan pengelas jarak minimum. Keputusan yang<br />

diperoleh menunjukkan bahawa sistem ini dapat mengecam keadaan tangan dengan<br />

ketepatan yang rnunasabah.


ACKNOWLEDGEMENTS<br />

My deepest thanks to my supervisor, Dr. Hj. Abdul Rahman Ramli, without<br />

whom this project would never have materialized in the first place. He has constantly<br />

encouraged me and given me clear guidance to make sure that I am able to complete<br />

the thesis within a reasonable time frame. Also to both my co-supervisors, Dr.<br />

Veeraraghavan Prakash and Mrs Roslizah Ali who have given me valuable advices<br />

throughout the whole period of the writing of this thesis.<br />

A very big thanks to Qussay Abbas Salih al-Badri. my senior in the research<br />

group, who has been most generous in sharing his ideas and experiences. Also to my<br />

colleagues. Koay Su Yeong and Beh Kok Siang who have provided me valuable<br />

feedbacks on my thesis.<br />

Not to mention also all the lecturers and staffs of Department of Computer<br />

and Communication Systems Engineering as well as members of Multimedia and<br />

Intelligent Systems Research Group who have all played their supporting roles.<br />

Last but not least. 1 would like to thank my parents and family members who<br />

have been most supportive in each and every way I could think of. Also, I owe many<br />

thanks to Yien Yien, who has stood behind me no matter how harsh the situation is.<br />

vii


I certify that an Examination Committee met on loth February 2004 to conduct the final<br />

examination of Lew Yuan Pok on his Master of Science thesis entitled "Gesture<br />

Recognition Using Web Camera" in accordance with Universiti Pertanian Xlalaysia<br />

(Higher Degree) Act 1980 and Universiti Pertanian Malaysia (Higher Degree)<br />

Regulations 1981. The Committee recommends that the candidate be awarded the<br />

relevant degree. Members of the Examination Committee are as follows:<br />

Mohamad Khazani Abdullah, Ph.D.<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Chairman)<br />

Hj. Abdul Rahman Ramli, Ph.D.<br />

Associate Professor<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Member)<br />

Veeraraghavan Prakash, Ph.D.<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Member)<br />

Roslizah .41i<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Member)<br />

~rofessor.~e~ut)\~e~<br />

School of Graduate Studies<br />

Universiti Putra 3lala~sia<br />

Date: 1 7 JUN 2004


This thesis submitted to the Senate of Universiti Putra Malaysia and has been<br />

accepted as fi~lfillment of the requirements for the degree of Master of Science. The<br />

members of the Supervisory Committee are as follows:<br />

Associate Professor Hj. Abdul Rahrnan Ramli, Ph.D.<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Chairman)<br />

Veeraraghavan Prakash, Ph.D.<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Member)<br />

Roslizah Ali<br />

Faculty of Engineering<br />

Universiti Putra Malaysia<br />

(Member)<br />

Y<br />

AINI IDERIS, Ph.D.<br />

Professor/Dean<br />

School of Graduate Studies<br />

Universiti Putra Malaysia<br />

Date : 0 8 SEP 2004


DECLARATION<br />

1 hereby declare that the thesis is based on my original work except for quotations<br />

and citations which have been duly acknowledged. I also declare that it has not been<br />

previously or con-currently submitted for any other degree at UPM or other tertiary<br />

institution.<br />

LEW YUAN POK


DEDICATION<br />

ABSTRACT<br />

ABSTRAK<br />

ACKNOWLEDGEMENTS<br />

APPROVAL<br />

DECLARATION<br />

LIST OF FIGURES<br />

LIST OF TABLES<br />

LIST OF ABBREVIATIONS<br />

CHAPTER<br />

TABLE OF CONTENTS<br />

INTRODLTCTION<br />

1.1 Background<br />

1.1.1 Gesture Recognition<br />

1.1.2 Web Camera<br />

Objectives<br />

Thesis Organization<br />

LITERATURE REVIEW<br />

2.1 Machine Vision<br />

2.1.1 Working Principles of Machine Vision<br />

2.1.2 Examples of Machine Vision Applications<br />

Computer Vision<br />

Gesture Recognition<br />

2.3.1 Vision Based Gesture Recognition<br />

2.3.2 Gesture Recognition Using Other Sensors<br />

Static Hand Gesture Recognition<br />

Dynamic Hand Gesture Recognition<br />

Hand Segmentation Methods<br />

Feature Extraction Methods<br />

Gesture Recognition Methods<br />

2.8.1 Template Matching<br />

2.8.2 Neural Network Methods<br />

2.8.3 Statistical Methods<br />

2.8.3 Rule-based Methods<br />

3 METHODOLOGY<br />

3.1 Overview<br />

3.2 Hand Segmentation<br />

3.2.1 lmage Subtraction<br />

3.2.2 Segmentation for Gray Level Image<br />

3.2.3 Segmentation for RGB Color Image<br />

3.2.4 Segmentation for Normalized RGB Color Image<br />

3.2.5 Skin Color Modeling<br />

ii<br />

. . .<br />

Ill<br />

v<br />

vii<br />

. . .<br />

Vlll<br />

X<br />

...<br />

Xlll<br />

xv<br />

svi


3.2.6 Segmentation with Skin Color Model<br />

~e~resentation and Description<br />

3.3.1 Preprocessing<br />

3.3.1.1 Holes Filling<br />

3.3. 1 2 Morphological Opening<br />

3.3.1.3 Median Filtering<br />

Hand Representation<br />

3.3.2.1 Boundary Extraction<br />

3.3.2.2 Boundary Correction<br />

3.3.2.3 Boundary Points Selection<br />

3.3.3 Fourier Descriptors<br />

Recognition<br />

3.4.1 Cluster Analysis<br />

3 -4.2 K-means Clustering<br />

3.4.3 Training Phase<br />

3.4.3. I Preparation<br />

3.4.3.2 Clustering<br />

Recognition Phase<br />

3.4.4.1 Minimum Distance Classifier<br />

RESULTS AND DISCUSSIONS<br />

4. I Comparison of Segmentation Techniques<br />

4.1.1 Segmentation for Gray Level Image<br />

4.1.2 Segmentation for RGB Color Image<br />

4.1.3 Segmentation for Normalized RGB Color Image<br />

4.1.4 Summary<br />

Segmentation with Skin Color Model<br />

Preprocessing<br />

4.3.1 Holes Filling<br />

4.3.2 Morphological Opening<br />

4.3.3 Median Filtering<br />

Hand Representation<br />

4.4.1 Boundary Extraction<br />

4.4.2 Boundary Correction<br />

4.4.3 Boundary Points Selection<br />

Fourier Descriptors<br />

K-means Clustering of Training Samples<br />

Classification of Training Samples<br />

Classification of Testing Samples<br />

COSCLUSION AND FUTURE RESEARCH<br />

5.1 Summaq<br />

C 7 d .- Future Research Directions<br />

REFEREWES<br />

APPEhDICES<br />

BIODATA OF THE AUTHOR


Figure<br />

LIST OF FIGURES<br />

2.1 Fiber Optic Inspection System (developed by Dynamic Motion<br />

Control)<br />

Multiple Wavelength Images of A Single Apple<br />

Elements Of Computer Vision Process<br />

Static Hand Gesture Recognition System Block Diagram<br />

Hand Segmentation Flow Chart<br />

Original RGB Color Image<br />

Illustration of the Concept in Image Subtraction<br />

Gra). Level Image<br />

Skin Color Modeled As Normal Distributions<br />

Hand Representation and Description Flow Chart<br />

Examples of Six Hand Postures<br />

Illustration of the Block Operations Used in Holes Filling<br />

Elements in Morphological Opening<br />

Median Filtering Operation<br />

Examples of Image with High Potential of Being Misclassified<br />

Parameters Used in Correcting the Boundary<br />

A Digital Boundary and Its Representation As A Complex<br />

Sequence<br />

Hand Posture Recognition Process Flow Chart<br />

Samples of Five Hand Images Used for Segmentation<br />

Samples of Hand Images Representing Six Different Postures<br />

Page<br />

Segmentation Result Using Skin Color Modeling for Sample 6 - 106<br />

xiii<br />

14


Sample I I<br />

Holes Filling Result for Sample 6 - Sample I 1<br />

Morphological Opening Result for Sample 6 - Sample 1 1<br />

Median Filtering Result for Sample 6 - Sample 1 1<br />

Boundary Extraction Result for Sample 6 - Sample 1<br />

Boundary Correction Result for Sample 6 - Sample I I<br />

Hand Boundq Represented by Selected Points for Sample 6 -<br />

Sample 11<br />

Training Samples for K-means Clustering<br />

Testing Samples Used to Analyze the Recognition Performance<br />

Euclidean Distance between Cluster Centers and Testing Images<br />

(a) - (h) in Table 4.9<br />

Distortion in Hand Boundary Extracted fiom Testing Image in<br />

Table 4.9(f)<br />

Euclidean Distance between Cluster Centers and Testing Images<br />

(a)-(f) in Table 4.8<br />

xiv


Table<br />

4.1<br />

LIST OF TABLES<br />

Page<br />

Segmentation for Samples of Five Gray Level Images Based on 97<br />

the Threshold Value of 10%. 15% and 20% Respectively<br />

Segmentation for Samples of Five Images in the RGB Color Space 99<br />

Based on the Threshold Value of 10% 15% and 20%<br />

Respectively<br />

Segmentation for Samples of Five Images in the Normalized RGB 10 1<br />

Color Space Based on the Threshold Value of 10%. 15% and 20%<br />

Respectively<br />

Segmentation for Samples of Five Images Based on the Skin 103<br />

Color Model Applying Threshold Value of 10%, 15% and 20%<br />

Respectivel).<br />

Coordinates of Boundq Points for Sample 6<br />

Steps for Deriving Fourier Descriptors<br />

Total Squared Error (TSE) for Individual Cluster<br />

Examples of Classification Result<br />

Examples of Falsely Classified Testing Images


2 D<br />

3 D<br />

m<br />

A1<br />

CCD<br />

CMOS<br />

DFT<br />

DTW<br />

FSM<br />

FD<br />

GUI<br />

HMM<br />

HCI<br />

IR<br />

IC<br />

MIPS<br />

NIR<br />

PC<br />

QE<br />

RBFNN<br />

Rx's<br />

RNN<br />

RGB<br />

ROI<br />

USB<br />

VLSI<br />

VE<br />

LIST OF ABBREVIATIONS<br />

Two Dimension<br />

Three Dimension<br />

Analogue to Digital<br />

Artificial Intelligence<br />

Charge Couple Device<br />

Complex Metal Oxide Semiconductor<br />

Discrete Fourier Transform<br />

Dynamic Time Warping<br />

Finite State Machine<br />

Fourier Descriptors<br />

Graphical User Interface<br />

Hidden Markov Model<br />

Human Computer Interaction<br />

Infrared<br />

Integrated Circuit<br />

Millions of Instructions per Second<br />

Near Infrared<br />

Personal Computer<br />

Quantum Eficiency<br />

Radial Basis Function Neural Network<br />

Receivers<br />

Recurrent Neural Network<br />

Red, Green, Blue<br />

Region of Interest<br />

Universal Serial Bus<br />

Very Large Scale Integration<br />

Virtual Environment<br />

xvi


1.1 Background<br />

CHAPTER 1<br />

The evolution of the interaction bemeen computer and human or better<br />

known as Human Computer Interaction (HCI) has always been interesting. The first<br />

generation computer user interface is a text based interface where user interact by<br />

typing relevant commands. By typing the commands through keyboard, the computer<br />

will perform relevant actions and inform the user of the outcome.<br />

Then came the Graphical User Interface (GUI) where the main medium of<br />

interaction are symbolic graphics called icons. Each icon is assigned a specific<br />

meaning intended for specific application or certain action (Webopedia, 2003). In<br />

order to carry out a specific action or to run a specific application, the user can just<br />

click on the relevant icon by <strong>using</strong> a mouse. Texc is still dominant in this second<br />

generation computer user interface as it is impossible to convert every line of<br />

command into icon, however, the use of text is minimized or avoided when<br />

necessap.<br />

Nevertheless, it was soon realized that no matter how powerful the two kinds<br />

of user interface are, they will never be able to replace the way human communicate<br />

most naturally, i.e. through the use of <strong>gesture</strong>. As applications shift fiom 2D to 3D


and eventually Virtual Environment (VE), the use of conventional input devices such<br />

as keyboard, mouse, trackball or joystick becomes more awkward and inconvenient<br />

than ever. It was then understood that nothing is able to navigate better in such<br />

applications than the human <strong>gesture</strong> itself. The human <strong>gesture</strong> is not only natural and<br />

intuitive to a user, it can also represent motions of high degree of freedom impossible<br />

to achieve <strong>using</strong> other input devices.<br />

Owing to the many advantages displayed by the use of human <strong>gesture</strong> as a<br />

powerful input device and the potential applications, <strong>gesture</strong> <strong>recognition</strong> has become<br />

an important research field in recent years. Covering the diverse fields of motion<br />

modeling, motion analysis, pattern <strong>recognition</strong>, machine learning and even<br />

psycholinguistic studies, <strong>gesture</strong> <strong>recognition</strong> focuses on the study of human posture<br />

and movement of different parts of body such as hand and head (Wu and Huang,<br />

1999). These also become the major motivation of this thesis to study various aspects<br />

of the vision based static hand <strong>gesture</strong> <strong>recognition</strong> system and to implement a system<br />

which can eventually be used as input in VE.<br />

1.1.1 Gesture Recognitioo<br />

Gestures are loosely defined by different researchers in different disciplines.<br />

From the point of view of biologists, the notion of <strong>gesture</strong> is to embrace all kinds of<br />

instances where an individual engages in movements whose communicative intent is<br />

paramount, manifest, and openly acknowledged (Nespoulous, 1986). This includes<br />

pointing at an object to indicate a reference, putting a thumb up to indicate consent or<br />

2


approval or conveying abstract meaning to one another <strong>using</strong> sign language. Gesture<br />

<strong>recognition</strong> on the other hand can be defined as the ability to identify specific human<br />

<strong>gesture</strong>s by a machine or computer and <strong>using</strong> them to convey information or for<br />

controlling a device.<br />

Gestures are important part of human communication where large amount of<br />

information is encoded in vet compact form such as specific hand shape or<br />

orientation accompanied by certain movement. The ability to recognize these<br />

<strong>gesture</strong>s represents an oppommity to decode the information conveyed most<br />

naturally by human. It is an alternative way of interaction between human and<br />

computer where little learning is required on the user as <strong>gesture</strong> is part of human's<br />

daily life communication.<br />

Teaching a computer to recopize human <strong>gesture</strong>s has been a goal of pursuit<br />

by various researchers since the 1980's. Earlier efforts in this research are focused on<br />

<strong>using</strong> a mechanical glove to capture the human <strong>gesture</strong>s. On the mechanical glove,<br />

sensors are used to measure the bending angle of various joints of the hand and the<br />

pressure exerted b) the palm. The different bending angle measurement will<br />

correspond to different geometric representation of the hand. In order to determine<br />

the position and orientation of the forearm in space, tracking sensors are used.<br />

However, the use of the mechanical glove is considered intrusive and unnatural to a<br />

user.<br />

The focus in <strong>gesture</strong> <strong>recognition</strong> research later shifis to vision based <strong>gesture</strong><br />

in conjunction with the advances made in the field of machine vision and<br />

3


computer vision. Vision based <strong>gesture</strong> <strong>recognition</strong> is a <strong>gesture</strong> <strong>recognition</strong> approach<br />

based on visual input. Imaging device such as <strong>camera</strong> is used to capture image<br />

sequences of human motion. Geometric properties of human hand or head together<br />

with their trajectories are then extracted from the image sequences to reconstruct<br />

human <strong>gesture</strong>. The vision based approach proves to be an attractive alternative since<br />

it is natural and not intrusive.<br />

Earlier vision based <strong>gesture</strong> <strong>recognition</strong> system uses gray level image<br />

sequences as the input. In 1992, first attempt was made to recognize hand <strong>gesture</strong><br />

from color video signals in real time. This was no coincidence as it is also the year<br />

where the first frame gabber for color video signal is available (Kohler, 2002). This<br />

marks the gradual shift of <strong>gesture</strong> <strong>recognition</strong> research from gray level video input to<br />

the color video input. The presence of color provides an additional and powerful cue<br />

for researchers especially in the area of segmentation.<br />

Presently, <strong>gesture</strong> <strong>recognition</strong> still remains an active area of research which is<br />

still in infancy. As different applications are put to the drawing board, more <strong>gesture</strong><br />

<strong>recognition</strong> techniques are still being improved, developed and created. This includes<br />

areas in segmentation, tracking, motion modeling, feature extraction and <strong>recognition</strong><br />

techniques. Since there is no single known technique which can cater for all the<br />

situations to be encountered, <strong>gesture</strong> <strong>recognition</strong> is highly problem orientated.


1.1.2 Web Camera<br />

RRPUSTAKAAN SULTAN ABDUL StW4fI<br />

UNiVERBCTl PUTRA MALAYW<br />

As explained in the Memam-Webster Online dictionary (2003). a <strong>web</strong><br />

<strong>camera</strong> or <strong>web</strong>cam is a <strong>camera</strong> used in transmitting live images over the World Wide<br />

Web. Generally speaking, any video <strong>camera</strong> or digital <strong>camera</strong> that can perform this<br />

function qualifies as a <strong>web</strong> <strong>camera</strong>. The term refers more specifically to the bction<br />

perform rather than the hardware itself as suggested by the definition.<br />

Over the years, as hosting live images in the World Wide Web becomes more<br />

and more popular, a class of <strong>camera</strong> which is dedicated to this fiurction mainly plus<br />

other additional features begins to emerge in the market and redefine the term, <strong>web</strong><br />

<strong>camera</strong>. It is a class of low cost <strong>camera</strong> usuilly in the range from above RMlOO to<br />

RM 400 which is connected to a PC through parallel port or USB port also known as<br />

pc <strong>camera</strong>.<br />

Web <strong>camera</strong> is a class of <strong>camera</strong> which either use Charge Couple Device<br />

(CCD) or Complex Metal Oxide Semiconductor (CMOS) as the photo sensor. It<br />

usually has optical resolution of not more than 640x480 pixels and can produce<br />

digital still image or video which can be directly stored inside a pc. Compared to the<br />

conventional CCD <strong>camera</strong> which still use analogue signal, <strong>web</strong> <strong>camera</strong> proves to be<br />

more convenient as no extra image capture card is needed to do the Analogue to<br />

Digital (AID) conversion. However, lower sensitivity and higher noise level are the<br />

two major drawbacks of <strong>web</strong> <strong>camera</strong> compared to higher quality CCD <strong>camera</strong> as<br />

justified by its lower cost.


Despite the shortcomings in the ability of <strong>web</strong> <strong>camera</strong>, it is still an attractive<br />

tool for performing <strong>gesture</strong> <strong>recognition</strong> especially when it is readily available in the<br />

market at a substantially lower price compared to other CCD <strong>camera</strong>. Web <strong>camera</strong> is<br />

still adequate to perform real-time <strong>gesture</strong> <strong>recognition</strong> as the key to its successful<br />

implementation lies not in the <strong>camera</strong> but in the processing speed of the computer<br />

carrying out the necessary manipulation of the image and the algorithm which<br />

determines how to manipulate the image effectively and efficiently.<br />

1.2 Objectives<br />

Vision based static hand <strong>gesture</strong> <strong>recognition</strong> is a complex task which is<br />

consisted of a series of structured steps. The aim of this thesis is to study the<br />

necessary steps of vision based static hand <strong>gesture</strong> <strong>recognition</strong> which will eventually<br />

lead to successful <strong>recognition</strong> of the human hand <strong>gesture</strong>s off line <strong>using</strong> <strong>web</strong> <strong>camera</strong>.<br />

The main objective of this thesis is to design a vision based static hand<br />

<strong>gesture</strong> <strong>recognition</strong> system <strong>using</strong> MATLAB which is able to perform <strong>recognition</strong> of<br />

six different sets of predefined hand postures.<br />

The secondary objective of this thesis is to study the techniques involved in<br />

various stages of hand <strong>gesture</strong> <strong>recognition</strong> task and to propose modification<br />

whenever necessary.


1.3 Organization<br />

The main issue to be addressed in this thesis is vision based static hand<br />

<strong>gesture</strong> <strong>recognition</strong>. Omine <strong>recognition</strong> of six different sets of hand posture is<br />

implemented <strong>using</strong> MATLAB 6.1 on a Pentium 111 667 with 256MB of memory. A<br />

<strong>web</strong> <strong>camera</strong> is employed as the imaging device for capturing different hand postures.<br />

The thesis is organized as follows:<br />

Chapter 2 lays a background on the whole subject of computer vision and<br />

machine vision. Detailed discussions are then given one <strong>gesture</strong> <strong>recognition</strong><br />

namely on vision based <strong>gesture</strong> <strong>recognition</strong> including facial <strong>gesture</strong>s and hand<br />

<strong>gesture</strong>s. Hand <strong>gesture</strong> <strong>recognition</strong> is then divided into two major areas which<br />

are static hand <strong>gesture</strong> <strong>recognition</strong> and dynamic hand <strong>gesture</strong> <strong>recognition</strong><br />

respectively. Major differences between the two areas are highlighted<br />

especially in terms of the different approaches adopted. Different techniques<br />

used for the various stages of the hand <strong>gesture</strong> <strong>recognition</strong> ranging from<br />

segmentation, features extraction and <strong>gesture</strong> <strong>recognition</strong> technique are then<br />

treated in details.<br />

Chapter 3 presents the methodology involved in constructing the vision based<br />

static hand <strong>gesture</strong> <strong>recognition</strong> system. Discussions however focus on the<br />

three most important rntijor steps of the system namely: (1)hand<br />

segmentation; (2)representation and description and (3)<strong>recognition</strong>. Hand<br />

segmentation technique is adopted to extract hand region from the visual<br />

input. Segmentation is done in gray level image, Red, Green, Blue (RGB)


color space and normalized RGB color space. A skin color modeling<br />

techniques <strong>using</strong> the property of human skin color in color space is then<br />

proposed. Representation and description on the other hand explains the<br />

methods used starting from preprocessing of the extracted hand region until a<br />

set of usehl features are extracted to represent the different hand postures.<br />

Recognition 5 presents the techniques used in classifying the feature vectors<br />

for the six different hand postures. Details on the steps used for implementing<br />

K-means clustering and the <strong>recognition</strong> method used are presented as well.<br />

Chapter 4 shows the results obtained from segmentation in different color<br />

space. The results for segmentation <strong>using</strong> the proposed skin color model are<br />

illustrated followed by the preprocessing, representation and eventually<br />

<strong>recognition</strong> based on the set of features proposed.<br />

Chapter 5 summarizes the thesis by restating the common problems arise in<br />

different stages of the <strong>gesture</strong> <strong>recognition</strong> task, thus laying down the<br />

requirements for a successful hand <strong>gesture</strong> <strong>recognition</strong> task as a whole and to<br />

check how much this thesis has helped to answer these requirements. For the<br />

questions and problems not addressed in other part of the thesis, they will be<br />

addressed in the future research directions.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!