MACHINE LEARNING TECHNIQUES - LASA
3.2 Linear Classifiers

In this section, we consider only linear methods for classifying data. Non-linear classification methods, such as artificial neural networks with backpropagation and Support Vector Machines, will be covered later in these lecture notes.

Linear Discriminant Analysis and the related Fisher's linear discriminant are methods to find the linear combination of features (projections of the data) that best separates two or more classes of objects. The resulting combination may be used as a linear classifier. We describe these next.
3.2.1 Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) combines concepts from PCA and clustering to determine projections along which a dataset is best separated into distinct classes.
Consider a data matrix composed of $M$ $N$-dimensional data points stored as columns, i.e. $X \in \mathbb{R}^{N \times M}$. LDA aims to find a linear transformation $A \in \mathbb{R}^{q \times N}$ that maps each column $x^i$ of $X$, for $i = 1, \dots, M$ (these are $N$-dimensional vectors), into a corresponding $q$-dimensional vector $y^i$. That is,

$$A : \; x^i \in \mathbb{R}^{N} \;\mapsto\; y^i = A x^i \in \mathbb{R}^{q}, \qquad q \leq N \qquad (3.22)$$
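As a minimal numerical illustration of the mapping in Eq. (3.22), the sketch below applies a linear transformation to a set of data points stored as columns. The dimensions and the matrix $A$ are chosen arbitrarily here; in LDA, $A$ would instead be the optimal transformation discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, M = 4, 2, 5            # original dimension, projected dimension, number of points

A = rng.standard_normal((q, N))   # some linear transformation (arbitrary, for illustration)
X = rng.standard_normal((N, M))   # M data points, one per column

Y = A @ X                    # each column y_i = A x_i is the q-dimensional projection
print(Y.shape)               # (q, M): M projected points of dimension q
```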
Let us further assume that the data in $X$ is partitioned into $K$ classes $\{C_k\}_{k=1}^{K}$, where each class $C_k$ contains $n_k$ data points and $\sum_{k=1}^{K} n_k = M$. LDA aims to find the optimal transformation $A$ such that the class structure of the original high-dimensional space is preserved in the low-dimensional space.
In general, if each class is tightly grouped but well separated from the other classes, the quality of the clustering is considered to be high. In discriminant analysis, two scatter matrices, called the within-class ($S_w$) and between-class ($S_b$) matrices, are defined to quantify the quality of the clusters as follows:
$$S_w = \sum_{k=1}^{K} \sum_{x^i \in C_k} \left(x^i - \mu^k\right)\left(x^i - \mu^k\right)^{T}, \qquad S_b = \sum_{k=1}^{K} n_k \left(\mu^k - \mu\right)\left(\mu^k - \mu\right)^{T} \qquad (3.23)$$

where $\mu^k = \frac{1}{n_k} \sum_{x \in C_k} x$ and $\mu = \frac{1}{M} \sum_{x \in X} x$ are, respectively, the mean of the $k$-th class and the
global mean. An implicit assumption of LDA is that all classes have equal class covariance (otherwise, the elements of the within-class matrix should be normalized by the covariance of the data points of that class).
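The scatter matrices of Eq. (3.23) can be computed directly from their definitions. The sketch below does so for a toy two-class, two-dimensional dataset (values chosen arbitrarily) and checks the standard identity that the total scatter decomposes as $S_t = S_w + S_b$.

```python
import numpy as np

# Toy dataset: two classes of 2-D points, one point per row (values are arbitrary)
X1 = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]])   # class C_1
X2 = np.array([[4.0, 4.5], [4.2, 5.0], [3.8, 4.8]])   # class C_2
classes = [X1, X2]

mu = np.vstack(classes).mean(axis=0)                  # global mean over all M points

S_w = np.zeros((2, 2))
S_b = np.zeros((2, 2))
for Xk in classes:
    n_k = len(Xk)
    mu_k = Xk.mean(axis=0)                            # class mean mu^k
    D = Xk - mu_k
    S_w += D.T @ D                                    # within-class scatter term of Eq. (3.23)
    d = (mu_k - mu).reshape(-1, 1)
    S_b += n_k * (d @ d.T)                            # between-class scatter term of Eq. (3.23)

# Sanity check: the total scatter matrix equals S_w + S_b
D_tot = np.vstack(classes) - mu
S_t = D_tot.T @ D_tot
print(np.allclose(S_t, S_w + S_b))                    # True
```

A good linear discriminant makes $S_b$ large relative to $S_w$ after projection; the decomposition check above is a useful way to validate an implementation of the two matrices.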
© A.G.Billard 2004 – Last Update March 2011