
3.2 Linear Classifiers

In this section, we consider solely ways to provide a linear classification of data. Non-linear methods for classification, such as ANN with backpropagation and Support Vector Machines, will be covered later in these lecture notes.

Linear Discriminant Analysis and the related Fisher's linear discriminant are methods to find the linear combination of features (projections of the data) that best separates two or more classes of objects. The resulting combination may be used as a linear classifier. We describe these next.

3.2.1 Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) combines concepts of PCA and clustering to determine projections along which a dataset can be best separated into two distinct classes.

Consider a data matrix composed of $M$ $N$-dimensional data points, i.e. $X \in \mathbb{R}^{N \times M}$. LDA aims to find a linear transformation $A \in \mathbb{R}^{q \times N}$ that maps each column $x^i$ of $X$, for $i = 1, \dots, M$ (these are $N$-dimensional vectors), into a corresponding $q$-dimensional vector $y^i$. That is,

\[
A: \; x^i \in \mathbb{R}^N \;\rightarrow\; y^i = A x^i \in \mathbb{R}^q, \qquad q \leq N
\tag{3.22}
\]
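Eq. (3.22) is simply a matrix-vector product. As a minimal sketch (the dimensions and the random transformation below are hypothetical, chosen only to illustrate the shapes involved):

```python
import numpy as np

N, q = 5, 2                      # original and projected dimensionality, q <= N
rng = np.random.default_rng(0)

A = rng.standard_normal((q, N))  # some (not yet optimal) linear transformation
x = rng.standard_normal(N)       # one N-dimensional data point
y = A @ x                        # its q-dimensional projection, as in (3.22)
```

LDA's job, described next, is to choose $A$ so that this projection preserves class separation.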

Let us further assume that the data in $X$ is partitioned into $K$ classes $\{C_k\}_{k=1}^{K}$, where each class $C_k$ contains $n_k$ data points and $\sum_{k=1}^{K} n_k = M$. LDA aims to find the optimal transformation $A$ such that the class structure of the original high-dimensional space is preserved in the low-dimensional space.

In general, if each class is tightly grouped, but well separated from the other classes, the quality of the clustering is considered to be high. In discriminant analysis, two scatter matrices, called the within-class ($S_w$) and between-class ($S_b$) matrices, are defined to quantify the quality of the clusters as follows:

\[
S_w = \sum_{k=1}^{K} \sum_{x^i \in C_k} \left(x^i - \mu^k\right)\left(x^i - \mu^k\right)^T, \qquad
S_b = \sum_{k=1}^{K} n_k \left(\mu^k - \mu\right)\left(\mu^k - \mu\right)^T
\tag{3.23}
\]

where $\mu^k = \frac{1}{n_k} \sum_{x \in C_k} x$ and $\mu = \frac{1}{M} \sum_{i=1}^{M} x^i$ are, respectively, the mean of the $k$-th class and the global mean. An implicit assumption of LDA is that all classes have equal class covariance (otherwise, the elements of the within-class matrix should be normalized by the covariance on the set of data points of that class).
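The scatter matrices of (3.23) translate directly into code. The following NumPy sketch (the function name and toy dataset are our own, not part of the lecture notes) computes $S_w$ and $S_b$ for data stored one point per column, matching the convention $X \in \mathbb{R}^{N \times M}$ used above:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter matrices of Eq. (3.23).

    X      : (N, M) array, one N-dimensional data point per column.
    labels : (M,) array of class indices identifying C_1, ..., C_K.
    """
    N, _ = X.shape
    mu = X.mean(axis=1)                # global mean over all M points
    Sw = np.zeros((N, N))
    Sb = np.zeros((N, N))
    for k in np.unique(labels):
        Xk = X[:, labels == k]         # columns belonging to class C_k
        n_k = Xk.shape[1]
        mu_k = Xk.mean(axis=1)         # class mean mu^k
        D = Xk - mu_k[:, None]
        Sw += D @ D.T                  # sum of (x - mu^k)(x - mu^k)^T
        d = (mu_k - mu)[:, None]
        Sb += n_k * (d @ d.T)          # n_k (mu^k - mu)(mu^k - mu)^T
    return Sw, Sb

# Hypothetical 2-D toy data: two tight, well-separated classes (M = 6)
X = np.array([[0., 1., 1., 4., 5., 5.],
              [0., 0., 1., 4., 4., 5.]])
labels = np.array([0, 0, 0, 1, 1, 1])
Sw, Sb = scatter_matrices(X, labels)
```

A useful sanity check is that $S_w + S_b$ equals the total scatter $\sum_i (x^i - \mu)(x^i - \mu)^T$, since the cross terms vanish when summing over each class.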

© A.G.Billard 2004 – Last Update March 2011
