
1.2.3 Key features for a good learning system

There are a number of desiderata one could formulate for any given learning system, such as:

Denoising: Working in a real-world application implies noise. Noise can mean several things: it can be imprecision in the measurement (e.g. light affecting infra-red sensors), or it can be a side effect of the experimental setup (e.g. a systematic bias in the initial conditions). Eliminating noise is probably the most crucial processing one might want to perform on the data, prior to any other processing. However, it has its costs. Denoising requires a noise model, and it is seldom the case that this model is known. There are methods to learn such models, but the process is slow. An incorrect noise model is bound either to eliminate good data along with the noise, or to remove too little noise.
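
As a minimal sketch (not part of the original notes), the snippet below assumes a simple noise model, namely additive zero-mean noise that is independent across samples, and attenuates it with a moving-average filter; the window length and the toy signal are arbitrary illustrative choices.

import numpy as np

def moving_average_denoise(signal, window=5):
    # Assumed noise model: additive, zero-mean, independent across samples.
    # A moving average attenuates such noise but also blurs sharp structure,
    # which illustrates the cost of an imperfect noise model mentioned above.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Toy data: a sine wave corrupted by Gaussian "sensor" noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(t)
noisy = clean + rng.normal(scale=0.3, size=t.shape)
denoised = moving_average_denoise(noisy, window=7)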

Decorrelating: Decorrelating the data is often a first step before denoising or proceeding to any feature extraction. The process of decorrelation aims at making sure that, prior to analyzing the data, you have found a way to represent the data that explicitly encapsulates the correlations. Data with low correlations can often be considered as noise, or of little relevance to modeling the process.
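
As one possible illustration (the notes do not prescribe a particular method), a common way to decorrelate data is to rotate it into the eigenbasis of its empirical covariance matrix, after which the transformed dimensions are approximately uncorrelated; the toy data below is assumed for demonstration only.

import numpy as np

def decorrelate(X):
    # Rotate samples (rows of X) into the eigenbasis of their covariance
    # matrix; in that basis the empirical covariance is (near) diagonal,
    # i.e. the dimensions are uncorrelated.
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> eigh
    return Xc @ eigvecs

# Correlated 2-D toy data: the second dimension is a noisy copy of the first.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, 0.8 * x + 0.2 * rng.normal(size=500)])
Z = decorrelate(X)
print(np.round(np.cov(Z, rowvar=False), 3))  # off-diagonal entries are ~0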

Generalization versus memorization: A general and important feature of a learning system that differentiates it from a pure “memory” is its ability to generalize. Generalizing consists of extracting key features from the data, matching those across data (to find resemblances), and storing a generalized representation of the data features that account best (according to a given metric) for all the small differences across data. Classification and clustering techniques are examples of methods that generalize through a categorization of the data. Generalizing is the opposite of memorizing, and often one might want to find a tradeoff between over-generalizing, hence losing information on the data, and overfitting, i.e. keeping more information than required. Generalization is particularly important in order to reduce the influence of noise introduced in the variability of the data.
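
A hedged illustration of this tradeoff (not taken from the notes) is polynomial regression: a very low degree over-generalizes and misses structure, while a very high degree memorizes the training noise and generalizes poorly. The degrees and toy data below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=20)
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(3 * x_test)

for degree in (1, 4, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit polynomial of given degree
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1 over-generalizes (misses structure); degree 12 memorizes the
    # training noise; an intermediate degree usually does best on test data.
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")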

Feature extraction: Part of the process of learning consists in extracting what is relevant in a set of data. This is often done by finding what is common across all the examples used for learning. The feature common to all data may consist of complex patterns (e.g. when training an algorithm to recognize faces, you may expect the algorithm to learn to extract features such as the nose, eyes, mouth, etc.). The process of finding what is common usually relies on a measure of correlation across data points. Extracting correlations can often reduce the dimensionality of the dataset at hand, by saving only a subpart of all the information.
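
One possible sketch of such correlation-based dimensionality reduction (the choice of a PCA-style projection here is an assumption, not the notes' prescription) keeps only the directions of largest variance of the data.

import numpy as np

def extract_features(X, n_components=2):
    # Project samples (rows of X) onto the directions of largest variance.
    # Correlated dimensions collapse onto a few components, so a subpart of
    # the information is enough to represent the data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
    return Xc @ Vt[:n_components].T

# 10-D toy data that actually lives on a 2-D subspace plus a little noise.
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))
features = extract_features(X, n_components=2)
print(features.shape)  # (300, 2): reduced-dimensional representation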

Curse of dimensionality: A problem one often encounters in machine learning is the so-called curse of dimensionality, whereby the number of samples used for training should grow exponentially with the dimensionality of the data set in order to remain representative. This exponential increase is reflected in an exponential increase in the number of computation steps required for training. Both of these are usually not practical and, as we will see in this class, many algorithms have been developed to reduce the dimensionality of the data prior to further processing.
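
As a back-of-the-envelope illustration (the resolution of ten points per dimension is an arbitrary assumption, not a formal bound), covering each of d dimensions with k sample points requires on the order of k^d samples:

# If covering one dimension at a given resolution takes k sample points,
# covering d dimensions at the same resolution takes roughly k**d points.
k = 10  # assumed points per dimension
for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: roughly {k ** d:,} samples needed")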

