1.2.3 Key features for a good learning system

There are a number of desiderata one could formulate for any given learning system, such as:
Denoising: Working in a real-world application implies noise. Noise can mean several things: it can be imprecision in the measurement (e.g. light affecting infra-red sensors), or a side effect of the experimental setup (e.g. a systematic bias in the initial conditions). Eliminating noise is probably the most crucial step to perform on the data prior to any other processing. However, it has its costs. Denoising requires a noise model, and it is seldom the case that the model is known. There are methods to learn these models, but the process is slow. An incorrect noise model is bound to eliminate good data along with the noise, or to remove too little noise.
Decorrelating: Decorrelating the data is often a first step before denoising or proceeding to any feature extraction. The process of decorrelation aims at making sure that, prior to analyzing the data, you have found a representation that explicitly encapsulates the correlations. Data with low correlations can often be considered as noise, or of little relevance to modeling the process.
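One common way to decorrelate, sketched below under the assumption of centered data, is to rotate the data into the eigenbasis of its sample covariance matrix (the toy dataset and its dimensions are hypothetical):

```python
import numpy as np

# Hypothetical correlated 2-D dataset.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 2))
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])  # introduces correlation
X = latent @ mixing.T

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenbasis of the covariance
Y = Xc @ eigvecs                        # rotated components are decorrelated

# The off-diagonal covariance of Y is (numerically) zero.
print(np.round(np.cov(Y, rowvar=False), 3))
```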
Generalization versus memorization: A general and important feature of a learning system, one that differentiates it from a pure “memory”, is its ability to generalize. Generalizing consists of extracting key features from the data, matching those across data points (to find resemblances), and storing a generalized representation of the features that accounts best (according to a given metric) for all the small differences across the data. Classification and clustering techniques are examples of methods that generalize through a categorization of the data. Generalizing is the opposite of memorizing, and one often has to find a tradeoff between over-generalizing, hence losing information about the data, and overfitting, i.e. keeping more information than required. Generalization is particularly important in order to reduce the influence of noise introduced by the variability of the data.
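This tradeoff can be made concrete with polynomial regression, as in the sketch below: a low-degree fit generalizes from noisy samples, while a degree equal to the number of points memorizes them (the degrees, sample size, and noise level are illustrative assumptions):

```python
import numpy as np

# Hypothetical noisy samples of an underlying linear trend.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.shape)

# Degree-1 fit: generalizes (a smooth summary of the trend).
# Degree-9 fit: memorizes (passes through every noisy point exactly).
general = np.polyfit(x, y, deg=1)
memorized = np.polyfit(x, y, deg=9)

x_new = np.array([0.05, 0.55, 0.95])  # unseen inputs
print("degree-1 predictions:", np.polyval(general, x_new))
print("degree-9 predictions:", np.polyval(memorized, x_new))
```

The degree-1 fit predicts sensibly at the unseen inputs; the degree-9 fit reproduces the training points but oscillates between them, having absorbed the noise into its parameters.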
Feature extraction: Part of the process of learning consists in extracting what is relevant in a set of data. This is often done by finding what is common across all the examples used for learning. The features common to all data may consist of complex patterns (e.g. when training an algorithm to recognize faces, you may expect it to learn to extract features such as the nose, eyes, mouth, etc.). The process of finding what is common usually relies on a measure of correlation across data points. Extracting correlations can often reduce the dimensionality of the dataset at hand, by retaining only part of the information.
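A standard correlation-based instance of this idea is to keep only the directions of largest variance, as sketched below (the dataset, its dimensions, and the number of retained components are hypothetical choices for illustration):

```python
import numpy as np

# Hypothetical 5-D data whose variance is dominated by 2 latent directions.
rng = np.random.default_rng(3)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.05 * rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
top2 = eigvecs[:, -2:]                  # two directions of largest variance
features = Xc @ top2                    # reduced 2-D representation

print("original dimension:", X.shape[1], "-> extracted:", features.shape[1])
```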
Curse of dimensionality: A problem one often encounters in machine learning is the so-called curse of dimensionality, whereby the number of samples used for training must grow exponentially with the dimensionality of the dataset to remain representative. This exponential increase is reflected in an exponential increase in the number of computation steps required for training. Both are usually impractical and, as we will see in this class, many algorithms have been developed to reduce the dimensionality of the data prior to further processing.
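To make the exponential growth tangible, the short sketch below counts how many samples would be needed just to place one sample in every cell of a grid over the input space (the choice of 10 bins per dimension is an illustrative assumption):

```python
# Samples needed to cover a grid with one sample per cell:
# the count grows as bins_per_dim ** dimension.
bins_per_dim = 10  # illustrative choice
for d in (1, 2, 3, 5, 10):
    print(f"dimension {d:2d}: {bins_per_dim ** d:,} samples")
```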
© A.G.Billard 2004 – Last Update March 2011