

6.3.2 Information Theory and the Neuron

A good starting point for understanding the properties of ANNs is to look at them from the point of view of information theory. Linsker applied information theory to the analysis of the processing performed by a single neuron. He started by defining the infomax principle, which states that the learning rule should be such that the mutual information between the inputs and the output of the neuron is maximal.

As mentioned in the introduction, a fundamental principle of learning systems is their robustness to noise. In a single-neuron model, one way to measure the neuron's robustness to noise is to determine the mutual information between its inputs and its output.

Let us consider a neuron with noise on the output:

$$y = \left( \sum_j w_j x_j \right) + \nu \qquad (6.3)$$

Let us assume that the output, $y$, and the noise, $\nu$, follow Gaussian distributions with zero mean and variances $\sigma_y^2$ and $\sigma_\nu^2$, respectively. Let us further assume that the noise is uncorrelated with any of the inputs, i.e.

$$E(\nu x_i) = 0 \quad \forall i \qquad (6.4)$$
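As a quick empirical check, the noisy unit of Eq. (6.3) and the decorrelation assumption of Eq. (6.4) can be sketched in a few lines of Python. The weights, input distribution, and noise level below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear neuron, Eq. (6.3): y = sum_j w_j x_j + nu.
w = np.array([0.5, -0.3, 0.8])        # illustrative weights
sigma_nu = 0.2                        # illustrative noise standard deviation

n = 100_000
X = rng.normal(0.0, 1.0, size=(n, 3))     # inputs x, zero-mean Gaussian
nu = rng.normal(0.0, sigma_nu, size=n)    # noise drawn independently of X
y = X @ w + nu

# Eq. (6.4): E(nu * x_i) = 0 for all i -- the sample averages should vanish.
sample_corr = X.T @ nu / n
print(np.abs(sample_corr).max())          # close to 0
```

Because the noise is drawn independently of the inputs, each sample average of $\nu x_i$ shrinks toward zero as the number of samples grows, matching Eq. (6.4).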

We can now compute the mutual information between the neuron's input and output:

$$I(x, y) = h(y) - h(y \,|\, x) \qquad (6.5)$$

where $h(z)$ is the entropy of $z$; see Section 9.3.1.

In other words, $I(x, y)$ measures the information contained in the output about the input and is equal to the information in the output minus the uncertainty in the output when the value of the input is known.

Since the neuron's activation function is deterministic, the uncertainty in the output is solely due to the noise. Hence, we have $h(y \,|\, x) = h(\nu)$. Taking into account the Gaussian properties of the output and the noise (the entropy of a zero-mean Gaussian with variance $\sigma^2$ is $\frac{1}{2}\log(2\pi e \sigma^2)$, so the $2\pi e$ terms cancel in the difference), we obtain:

$$I(x, y) = h(y) - h(\nu) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2} \qquad (6.6)$$

The ratio $\sigma_y^2 / \sigma_\nu^2$ stands for the signal-to-noise ratio of the neuron. Since the experimental set-up fixes the amount and variance of the noise, one can modulate only the variance of the output. In order to improve the signal-to-noise ratio, and thus the amount of information, one can increase the variance of the output.
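Under the Gaussian assumptions, Eq. (6.6) can be evaluated in closed form. The sketch below (with illustrative weights, input covariance, and noise variance) computes the mutual information and confirms that raising the output variance, here by scaling up the weights, increases it:

```python
import numpy as np

def mutual_information(w, input_cov, sigma_nu2):
    """Eq. (6.6): I(x, y) = 1/2 log(sigma_y^2 / sigma_nu^2) for the
    Gaussian linear neuron y = w . x + nu with input-independent noise."""
    sigma_y2 = w @ input_cov @ w + sigma_nu2   # var(y) = w^T C w + var(nu)
    return 0.5 * np.log(sigma_y2 / sigma_nu2)

w = np.array([0.5, -0.3, 0.8])   # illustrative weights
C = np.eye(3)                    # unit-variance, uncorrelated inputs (assumption)
sigma_nu2 = 0.04                 # illustrative noise variance

I_small = mutual_information(w, C, sigma_nu2)
I_large = mutual_information(2.0 * w, C, sigma_nu2)  # larger weights -> larger sigma_y^2
print(I_small, I_large)          # the second value is larger
```

Since the noise variance is fixed, any manipulation that enlarges $\sigma_y^2$ (such as scaling the weights) increases the signal-to-noise ratio and hence $I(x, y)$, which is the lever the infomax principle exploits.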

© A.G.Billard 2004 – Last Update March 2011
