
When classification results from a linear projection through a map $A$, as in LDA, then again the problem corresponds to maximizing

$$J(A) = \frac{A^{T} S_b A}{A^{T} S_w A},$$

with $S_b$ and $S_w$ defined as above.
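As an illustration, consider the two-class case, where the maximizer of this ratio has the well-known closed form $w \propto S_w^{-1}(m_1 - m_0)$. Below is a minimal numpy sketch under that assumption; the function name and the use of unnormalized scatter matrices are illustrative choices, not notation from this text.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher discriminant.

    Returns the direction w maximizing
    J(w) = (w^T S_b w) / (w^T S_w w).
    X0, X1: arrays of shape (n_samples, N), one per class.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter S_w: sum of the per-class scatter matrices.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Closed-form maximizer: w is proportional to S_w^{-1} (m1 - m0).
    return np.linalg.solve(S_w, m1 - m0)
```

Projecting the data onto this direction (`X0 @ w`, `X1 @ w`) yields the kind of separable one-dimensional projection found by the Fisher Linear Discriminant in Figure 3-17.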

Figure 3-17: Elongated data from two classes: the black samples are spread more widely than the white ones (left); the direction of maximum variance found by PCA is not optimal for separating the two classes. LDA is able to find a better projection, but it still does not allow linear separation. The Fisher Linear Discriminant takes into account the difference in variance of the two distributions and finds a separable projection. [DEMOS\PROJECTION\PCA-LDA-FISHER.ML]

3.2.3 Mixture of linear classifiers (boosting and bagging)

Adapted from Bagging Predictors by Leo Breiman, Machine Learning, 24, 123–140 (1996)

Linear classifiers as described previously rely on regression and can perform only binary classification. To extend their domain of application and allow classification of more than two classes, linear classifiers can be combined. A number of combining techniques can be used, among which bagging and boosting have emerged in recent years as the most popular methods due to their simplicity. All of these methods modify the training data set, build classifiers on these modified training sets, and then combine them into a final decision rule by simple or weighted majority voting. They differ, however, in how the training set is modified and in how the individual classifiers are weighted in the vote.

We briefly review bagging and boosting next. Other multi-class classification methods which we will see in this course are the multi-layer perceptron and multi-class support vector machines.
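Since boosting is only mentioned in passing here, the following is a minimal numpy sketch of AdaBoost, the best-known boosting scheme, to illustrate the weighted-majority-voting case. The base classifiers are axis-aligned decision stumps (single-threshold linear classifiers), labels are assumed to take values in {-1, +1}, and all names and the exhaustive stump search are illustrative assumptions, not a formulation from this text.

```python
import numpy as np

def adaboost_train(X, y, T=20):
    """AdaBoost with decision stumps; y must take values in {-1, +1}."""
    M, N = X.shape
    w = np.full(M, 1.0 / M)                      # sample weights
    stumps, alphas = [], []
    for _ in range(T):
        best, best_err = None, np.inf
        # Exhaustive search for the stump with the lowest weighted error.
        for j in range(N):
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (j, thr, sign)
        err = max(best_err, 1e-12)               # avoid division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)  # weight of this classifier
        j, thr, sign = best
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # emphasize misclassified points
        w /= w.sum()
        stumps.append(best)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Weighted majority vote over the T stumps."""
    score = sum(a * s * np.where(X[:, j] > t, 1, -1)
                for (j, t, s), a in zip(stumps, alphas))
    return np.sign(score)
```

Note how boosting modifies the training set by re-weighting rather than resampling, and how each classifier's vote is weighted by its accuracy, in contrast to the uniform vote used by bagging below.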

3.2.3.1 Bagging

Let us assume, once more, that the complete training set is composed of M N-dimensional data points $X = \{x^i\}_{i=1}^{M}$, $x^i \in \mathbb{R}^N$, with associated labels $Y = \{y^i\}_{i=1}^{M}$, $y^i \in \mathbb{N}$. The labels are not necessarily binary, and hence this allows multi-class classification.

Bagging consists in selecting at random (drawing with replacement) K subsets of the training data, $\{X^k\}_{k=1}^{K}$. Each of these subsets is used to create a classifier $C^k(X^k)$. Each classifier performs a mapping $C^k : X \rightarrow Y$. The final classifier is built from all classifiers and is such that it outputs the class predicted most often by the K classifiers:

$$C(x) = \arg\max_{y \in Y} \#\left\{\, k \in \{1,\dots,K\} \;\middle|\; C^k(x) = y \,\right\}.$$
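The procedure reduces to a few lines of code. In the sketch below, `base_learner` is a hypothetical callable that fits a classifier on a subset and returns a prediction function; all names are illustrative.

```python
import numpy as np

def bagging_train(X, Y, base_learner, K=25):
    """Fit one classifier per bootstrap subset (drawn with replacement)."""
    M = len(X)
    classifiers = []
    for _ in range(K):
        idx = np.random.randint(0, M, size=M)   # bootstrap sample of size M
        classifiers.append(base_learner(X[idx], Y[idx]))
    return classifiers

def bagging_predict(x, classifiers):
    """Output the class predicted most often by the K classifiers."""
    votes = np.array([c(x) for c in classifiers])
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

Each bootstrap subset leaves out roughly a third of the samples on average, so the K classifiers are trained on genuinely different data; it is this diversity that lets the majority vote reduce the variance of the individual classifiers.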
