MACHINE LEARNING TECHNIQUES - LASA
When classification results from a linear projection through a map $A$, as in LDA, then again the problem corresponds to maximizing

$$J(A) = \frac{A S_b A^T}{A S_w A^T},$$

with $S_b$, $S_w$ defined as above.
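As an illustration, here is a minimal Python/NumPy sketch of the two-class case, where the criterion reduces to $J(w) = \frac{w^T S_b w}{w^T S_w w}$ and the maximizer is known in closed form as $w \propto S_w^{-1}(\mu_1 - \mu_2)$. The function name and the synthetic data are our own choices, not taken from the course demos.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Direction w maximizing the two-class Fisher criterion
    J(w) = (w^T S_b w) / (w^T S_w w); the maximizer is
    proportional to S_w^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # Solving S_w w = mu1 - mu2 avoids an explicit matrix inverse.
    w = np.linalg.solve(S_w, mu1 - mu2)
    return w / np.linalg.norm(w)

# Two elongated classes, as in Figure 3-17 (synthetic data).
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], [3.0, 0.3], size=(100, 2))
X2 = rng.normal([0.0, 1.5], [3.0, 0.3], size=(100, 2))
print("Fisher direction:", fisher_direction(X1, X2))
```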
Figure 3-17: Elongated data from two classes. The black samples are spread more widely than the white ones (left); the direction of maximum variance found by PCA is not optimal for separating the two classes. LDA finds a better projection, but it still does not allow linear separation. The Fisher linear discriminant takes into account the difference in variance of the two distributions and finds a separable projection. [DEMOS\PROJECTION\PCA-LDA-FISHER.ML]
3.2.3 Mixture of linear classifiers (boosting and bagging)
Adapted from Bagging Predictors by Leo Breiman, Machine Learning, 24, 123–140 (1996)
Linear classifiers as described previously rely on regression and can perform only binary classification. To extend their domain of application to classification of more than two classes, linear classifiers can be combined. A number of combining techniques can be used, among which bagging and boosting have emerged in recent years as the most popular methods due to their simplicity. All of these methods modify the training data set, build classifiers on these modified training sets, and then combine them into a final decision rule by simple or weighted majority voting (a sketch of such a vote is given below). They differ, however, in how the training sets are modified and in how the individual votes are weighted.

We briefly review bagging and boosting next. Other multi-class classification methods which we will see in this course are the multi-layer perceptron and multi-class support vector machines.
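To make the combination rule concrete, here is a minimal Python sketch of the simple and weighted majority votes just mentioned; the function name, its signature, and the example weights are our own illustrative choices.

```python
import numpy as np

def majority_vote(predictions, weights=None):
    """Combine the outputs of K classifiers for one sample.

    predictions: length-K array of predicted class labels.
    weights: optional length-K array; omitted gives the simple
    majority vote, otherwise a weighted majority vote.
    """
    predictions = np.asarray(predictions)
    weights = (np.ones(len(predictions)) if weights is None
               else np.asarray(weights, dtype=float))
    classes = np.unique(predictions)
    # Total weight of the classifiers voting for each candidate class.
    scores = [weights[predictions == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]

# Simple vote: class 2 wins with 3 of 5 votes.
print(majority_vote([2, 1, 2, 2, 1]))                   # -> 2
# Weighted vote: the two heavier votes for class 1 prevail (6 vs 3).
print(majority_vote([2, 1, 2, 2, 1], [1, 3, 1, 1, 3]))  # -> 1
```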
3.2.3.1 Bagging
Let us assume, once more, that the complete training set is composed of $M$ $N$-dimensional data points $X = \{x^i\}_{i=1}^{M}$, $x^i \in \mathbb{R}^N$, with associated labels $Y = \{y^i\}_{i=1}^{M}$, $y^i \in \mathbb{N}$. The labels are not necessarily binary and hence this allows multi-class classification.
Bagging consists in selecting at random (drawing with replacement) $K$ subsets of training data $\{X^k\}_{k=1}^{K}$, $k = 1, \ldots, K$. Each of these subsets will be used to create a classifier $C^k(X^k)$. Each classifier performs a mapping $C^k: X \rightarrow Y$. The final classifier is built from all classifiers and is such that it outputs the class predicted most often by all $K$ classifiers:
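In our notation, this majority vote can be written as

$$C(x) = \arg\max_{y} \sum_{k=1}^{K} \mathbb{1}\left[ C^k(x) = y \right].$$

The following is a minimal Python sketch of the whole procedure, under the stated bootstrap-and-vote scheme. The nearest-centroid base classifier is our own choice purely for illustration; the text does not prescribe a particular base learner.

```python
import numpy as np

def train_nearest_centroid(X, Y):
    """Illustrative base classifier: predict the class whose training
    centroid is closest to the query (an assumption, not from the text)."""
    classes = np.unique(Y)
    centroids = np.array([X[Y == c].mean(axis=0) for c in classes])
    def predict(x):
        return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]
    return predict

def bagging(X, Y, K, subset_size, rng):
    """Draw K subsets with replacement, train one classifier C^k per
    subset, and output the class predicted most often at test time."""
    classifiers = []
    for _ in range(K):
        idx = rng.integers(0, len(X), size=subset_size)  # with replacement
        classifiers.append(train_nearest_centroid(X[idx], Y[idx]))
    def predict(x):
        votes = np.array([C(x) for C in classifiers])
        labels, counts = np.unique(votes, return_counts=True)
        return labels[np.argmax(counts)]  # majority vote over the K outputs
    return predict

# Toy multi-class data: three Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(30, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
Y = np.repeat([0, 1, 2], 30)
C = bagging(X, Y, K=10, subset_size=len(X), rng=rng)
print(C(np.array([2.8, 0.2])))  # expected: class 1
```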