MACHINE LEARNING TECHNIQUES - LASA



variance of the output, e.g. by increasing the weights. Note that, by doing this, you increase the unpredictability of the neuron's output, which can have disastrous consequences in some applications.

Noise on the Inputs

In most applications, you will have to face noise in the input, with a different type of noise depending on the input. In such a scenario, the output of a neuron would be described by the following:

y = \sum_j w_j (x_j + \nu_j)    (6.7)

One can show that the mutual information between input and output in this case becomes:

I(x, y) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2 \sum_j w_j^2}    (6.8)

In this case, it is not sufficient to just increase the weights since, by doing so, one also increases the denominator. More sophisticated techniques must be used on a neuron-by-neuron basis.

More than one output neuron

Imagine a two-input, two-output scenario in which the two outputs attempt to jointly convey as much information as possible about the two inputs. In this case, each output neuron's activation is given by:

y_i = \sum_j w_{ij} x_j + \nu_i    (6.9)

Similarly to the 1-output case, we can assume that the noise terms are uncorrelated and Gaussian, and we can write:

h(\nu) = h(\nu_1, \nu_2) = h(\nu_1) + h(\nu_2) = 1 + \log(2\pi\sigma_\nu^2)

Since the output neurons both depend on the same two inputs, they are correlated. One can calculate the correlation matrix R as:

R = E(y y^T) = E\left[\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} (y_1 \; y_2)\right] = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}    (6.10)

One can show that the mutual information is equal to

© A.G.Billard 2004 – Last Update March 2011
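To see Eq. (6.8) at work, here is a small numerical sketch (an illustration of my own, not part of the original text; the variable names are illustrative) that estimates the mutual information from samples and checks that simply rescaling the weights does not change it, since the numerator and the denominator grow by the same factor:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_info_noisy_input(w, x, sigma_nu):
    """Sample-based estimate of Eq. (6.8):
    I(x, y) = 1/2 log( sigma_y^2 / (sigma_nu^2 * sum_j w_j^2) )."""
    nu = rng.normal(0.0, sigma_nu, size=x.shape)  # i.i.d. noise on each input
    y = (x + nu) @ w                              # Eq. (6.7): y = sum_j w_j (x_j + nu_j)
    return 0.5 * np.log(y.var() / (sigma_nu**2 * np.sum(w**2)))

x = rng.normal(size=(10_000, 3))                  # three unit-variance inputs
w = np.array([1.0, 0.5, -0.3])
i1 = mutual_info_noisy_input(w, x, sigma_nu=0.5)
i2 = mutual_info_noisy_input(10 * w, x, sigma_nu=0.5)  # ten-fold larger weights
```

With independent unit-variance inputs, both calls give roughly the same value (analytically ½ log 5 ≈ 0.80 nats here): increasing the weights does raise σ_y², but it raises the noise term σ_ν² Σ_j w_j² by exactly the same factor, which is why more sophisticated, neuron-by-neuron techniques are needed.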

I(x, y) = \log \frac{\det(R)}{\sigma_\nu^2}    (6.11)

Again, the variance of the noise \sigma_\nu^2 is fixed and, so, to maximize the mutual information, we must maximize

\det(R) = r_{11} r_{22} - r_{12} r_{21} = \sigma_\nu^4 + \sigma_\nu^2 (\sigma_1^2 + \sigma_2^2) + \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2)

where \sigma_i^2, i = 1, 2, is the variance of each output neuron in the absence of noise and \rho_{12} is the correlation coefficient of the output signals, also in the absence of noise. One can thus consider the two following situations:

Large noise variance: If \sigma_\nu is large, one can ignore the 3rd term of the equation. It remains, thus, to maximize the sum (\sigma_1^2 + \sigma_2^2), which can be done by maximizing the variance of either neuron independently.

Low noise variance: If, on the contrary, the variance is very small, the 3rd term becomes more important than the first two. In that case, one must find a tradeoff between maximizing the variance of each output neuron and keeping the correlation factor sufficiently small.

In other words, in a low-noise situation it is best to use a network in which each neuron's output is de-correlated from the others, i.e. where each output neuron conveys different information about the inputs, while in a high-noise situation it is best to have high redundancy in the output. This way, one gives the information conveyed in the input more chances to be appropriately transferred to the output.

6.4 The Backpropagation Learning Rule

In Section 6.3.1, we saw the perceptron learning rule. Here, we will see a general supervised learning rule for a multi-layer perceptron neural network, called Backpropagation. Backpropagation belongs to the class of methods that take the minimization of an error as the criterion for optimization. Such methods are said to perform a gradient-descent type of optimization; see Section 9.4.1.
Error descent methods are usually associated with supervised learning, in which we must provide the network with a set of example data, and the answer we expect the network to give is presented together with the data.
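As a concrete illustration of this scheme (a minimal sketch of my own, not taken from the text; the network size, learning rate, and variable names are arbitrary choices), the following trains a small 2-4-1 sigmoid network on the XOR task by batch gradient descent, backpropagating the error deltas layer by layer:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR: a classic task that a single-layer perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# weights and biases of a 2-4-1 network
W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)
eta = 0.5  # learning rate

losses = []
for _ in range(5000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    losses.append(0.5 * np.sum((Y - T) ** 2))
    # backward pass: deltas for the squared error E = 1/2 sum (Y - T)^2
    dY = (Y - T) * Y * (1 - Y)        # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)    # delta backpropagated to the hidden layer
    W2 -= eta * H.T @ dY
    b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH
    b1 -= eta * dH.sum(axis=0)
```

Each iteration performs one forward pass over all four examples, then moves the weights down the gradient of the squared error; the recorded loss decreases over training as the hidden layer learns a representation the output layer can separate.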

