01.11.2014 Views

MACHINE LEARNING TECHNIQUES - LASA


2 Methods for Correlation Analysis: PCA, CCA, ICA

Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA) and Independent Component Analysis (ICA) are techniques for

• Discovering or reducing the dimensionality of a multidimensional dataset

• Identifying a suitable representation of the multivariate data, by de-correlating the dataset.

The principle of these techniques is to determine the directions in the multidimensional space of the dataset along which the variability of the data is maximal. These directions correspond to the principal axes, or Eigenvectors, of the correlation matrix computed over the whole dataset. By projecting the data onto the reference frame defined by the Eigenvectors, one obtains a representation of the data that minimizes the statistical dependence across the dimensions of the data. Dimensionality reduction is obtained by discarding the dimensions along which the variance falls below a chosen threshold; the data are quasi-constant along these dimensions.
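The projection and the discarding of low-variance dimensions described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the method of these notes: it assumes the matrix in question is the covariance matrix of the mean-centred data (one would standardize the data first to work with the correlation matrix in the strict sense), and the function name `pca`, the parameter `var_threshold` and the synthetic data are hypothetical choices made for the example.

```python
import numpy as np

def pca(X, var_threshold=1e-2):
    """Project X (n_samples x n_dims) onto the Eigenvectors of its covariance matrix.

    var_threshold is an assumed, illustrative criterion: directions whose
    variance (Eigenvalue) is not above it are discarded.
    """
    Xc = X - X.mean(axis=0)                    # centre the data
    C = np.cov(Xc, rowvar=False)               # covariance matrix (n_dims x n_dims)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh: C is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > var_threshold             # drop quasi-constant directions
    Y = Xc @ eigvecs[:, keep]                  # coordinates in the Eigenvector frame
    return Y, eigvals, eigvecs

# Synthetic 3-D data: two correlated dimensions and one that is almost constant.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
X = np.column_stack([
    latent[:, 0],
    0.5 * latent[:, 0] + latent[:, 1],
    0.01 * rng.normal(size=500),
])

Y, eigvals, eigvecs = pca(X)
cov_Y = np.cov(Y, rowvar=False)                # covariance of the projected data
```

In this sketch the projected data `Y` keep only the directions with non-negligible variance, and the off-diagonal entries of `cov_Y` vanish up to rounding error, which is the de-correlation property referred to above.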

Identifying the most suitable representation of a dataset is fundamental, as it can greatly simplify the search for a solution. Consider the example of Figure 2-1, left. It is clear to a human viewer that the data align themselves along an ellipse. However, an algorithm would have trouble finding the regularities underlying the dataset if the data coordinates are given with respect to an external coordinate frame. The task becomes much simpler if the data are transformed into a coordinate frame aligned with the axes of the ellipse, as illustrated in Figure 2-1, right.

Figure 2-1: The two Eigenvectors determine the axes of an ellipse. The Eigenvalues determine the lengths of the axes of the ellipse that best fits the data.
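The change of coordinate frame illustrated in Figure 2-1 can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic points filling an ellipse, and the semi-axes `a` and `b`, the orientation `theta` and the offset are arbitrary values chosen for the example, not taken from the figure. It again uses the covariance matrix of the centred data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed, illustrative parameters of the ellipse (not taken from the figure).
a, b = 4.0, 1.0                      # semi-axes
theta = np.deg2rad(30.0)             # orientation in the external frame
n = 1000

# Points uniformly filling an axis-aligned ellipse ...
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
r = np.sqrt(rng.uniform(0.0, 1.0, size=n))
ellipse = np.column_stack([a * r * np.cos(t), b * r * np.sin(t)])

# ... then rotated and shifted, i.e. expressed in an external coordinate frame.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = ellipse @ R.T + np.array([2.0, -1.0])

# Eigen-decomposition of the covariance matrix of the centred data.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Direction of the major axis, reported modulo 180 degrees because the sign
# of an Eigenvector is arbitrary.
major = eigvecs[:, np.argmax(eigvals)]
angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0

# Ratio of the axis lengths: square root of the ratio of the Eigenvalues.
axis_ratio = np.sqrt(eigvals.max() / eigvals.min())

# Data expressed in the frame aligned with the axes of the ellipse.
Y = Xc @ eigvecs
cov_Y = np.cov(Y, rowvar=False)
```

With these settings `angle` should come out close to the 30 degrees used to generate the data and `axis_ratio` close to `a / b`. Note that in this sketch the Eigenvalues are variances, so the axis lengths scale with their square roots. In the new frame the two coordinates of `Y` are uncorrelated.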

© A.G.Billard 2004 – Last Update March 2011
