MACHINE LEARNING TECHNIQUES - LASA
2 Methods for Correlation Analysis: PCA, CCA, ICA

Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA) and Independent
Component Analysis (ICA) are techniques for
• Discovering or reducing the dimensionality of a multidimensional dataset
• Identifying a suitable representation of the multivariate data, by de-correlating the dataset.

The principle of these techniques is to determine the directions of the multidimensional
space of the dataset along which the variability of the data is maximal. These directions
correspond to the principal axes, or Eigenvectors, of the correlation matrix computed over the
whole dataset. By projecting the data onto the reference frame defined by the Eigenvectors, one
obtains a representation in which the coordinates are uncorrelated, which minimizes the linear
statistical dependence across the data.
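
The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration for the PCA case only, not the course's reference implementation. The function name `pca_project` and the toy data are hypothetical, and the sketch assumes the covariance matrix of the mean-centered data is used (standardizing each variable first would give the correlation-matrix variant).

```python
import numpy as np

def pca_project(X):
    """Project the rows of X onto the eigenvectors of its covariance matrix."""
    # Center the data so that variability is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the dataset (one variable per column).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so that the direction of largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Coordinates of each data point in the frame defined by the eigenvectors.
    projected = X_centered @ eigenvectors
    return projected, eigenvalues, eigenvectors

# Hypothetical toy data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])

projected, eigenvalues, eigenvectors = pca_project(X)
# Covariance of the projected data: diagonal up to rounding error.
projected_cov = np.cov(projected, rowvar=False)
```

In the new frame the off-diagonal entries of `projected_cov` vanish up to rounding error, and its diagonal holds the eigenvalues, that is, the variance of the data along each Eigenvector.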

Dimensionality reduction is obtained by discarding the dimensions along which the variance
is smaller than a chosen threshold; the data are quasi-constant along these dimensions.
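
As an illustration of this criterion, the sketch below keeps only the Eigenvector directions whose variance reaches a threshold. The function name `reduce_dimensionality`, the parameter `min_variance`, the threshold value and the synthetic data are all assumptions made for the example; in practice the threshold depends on the dataset.

```python
import numpy as np

def reduce_dimensionality(X, min_variance):
    """Keep only the eigenvector directions whose variance is >= min_variance."""
    X_centered = X - X.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
    # Each eigenvalue is the variance of the data along its eigenvector.
    keep = eigenvalues >= min_variance
    return X_centered @ eigenvectors[:, keep], eigenvalues[keep]

# Hypothetical 3-D data that is almost flat along one (rotated) direction.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.01])
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
X = latent @ rotation.T

# The threshold 0.01 is an arbitrary choice for this toy dataset.
reduced, kept_variances = reduce_dimensionality(X, min_variance=0.01)
```

Here the direction with standard deviation 0.01 falls below the threshold and is discarded, so `reduced` has two columns instead of three.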

Identifying the most suitable representation of a dataset is fundamental, as it can greatly
simplify the search for a solution. Consider the example of Figure 2-1 (left). It is clear to a
human viewer that the data align themselves along an ellipse. However, an algorithm would have
trouble finding the regularities underlying the dataset if the data coordinates are given with
respect to an external coordinate frame. On the other hand, the task becomes much simpler if
the data are transferred to a coordinate frame aligned with the axes of the ellipse, as
illustrated in Figure 2-1 (right).

Figure 2-1: The two Eigenvectors determine the axes of an ellipse. The Eigenvalues determine the
lengths of the axes of the ellipse that best fits the data.
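
The situation of Figure 2-1 can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the data behind the figure: it generates a hypothetical elliptical point cloud (standard deviations 3 and 1, rotated by 30 degrees, with an arbitrary offset) and recovers the axes from the covariance matrix. The square root of each Eigenvalue is the standard deviation of the data along the corresponding Eigenvector, so the axis lengths of the fitted ellipse are proportional to these square roots.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical elliptical cloud: standard deviation 3 along the major axis
# and 1 along the minor axis, rotated by 30 degrees and shifted.
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
aligned = rng.normal(size=(2000, 2)) * np.array([3.0, 1.0])
X = aligned @ rotation.T + np.array([5.0, -2.0])

# Eigen-decomposition of the covariance matrix of the centered data.
center = X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X - center, rowvar=False))

# eigh sorts eigenvalues in ascending order, so the last column is the major axis.
major_axis = eigenvectors[:, -1]
# Standard deviation along the major and minor axes, in that order.
axis_lengths = np.sqrt(eigenvalues[::-1])
# Orientation of the major axis in degrees; the modulo removes the sign
# ambiguity of the eigenvector.
estimated_angle = np.degrees(np.arctan2(major_axis[1], major_axis[0])) % 180.0

# Data expressed in the coordinate frame aligned with the ellipse axes.
X_in_ellipse_frame = (X - center) @ eigenvectors[:, ::-1]
```

With this sample, `estimated_angle` comes out close to 30 degrees and `axis_lengths` close to 3 and 1, and the coordinates in `X_in_ellipse_frame` are uncorrelated.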
© A.G.Billard 2004 – Last Update March 2011