MACHINE LEARNING TECHNIQUES - LASA
6.3.2 Information Theory and the Neuron
A good starting point for understanding the properties of ANNs is to look at them from the point of view of information theory. Linsker applied information theory to the analysis of the processing performed by a single neuron. He started by defining the infomax principle, which states that the learning rule should be such that the mutual information between the inputs and the output of the neuron is maximal.
As mentioned in the introduction, a fundamental property required of learning systems is robustness to noise. In a single-neuron model, one way to measure the neuron's robustness to noise is to compute the mutual information between its inputs and its output.
Let us consider a neuron with additive noise on the output:
    y = \sum_{j} w_j x_j + \nu        (6.3)
Let us assume that the output, y, and the noise, ν, follow Gaussian distributions with zero mean and variances σ_y² and σ_ν², respectively. Let us further assume that the noise is uncorrelated with any of the inputs, i.e.

    E(\nu \, x_i) = 0 \quad \forall i        (6.4)
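As an illustration, the noise model (6.3) and the decorrelation assumption (6.4) can be checked numerically. The sketch below uses hypothetical weights and noise level (not values from the text), draws zero-mean Gaussian inputs and noise, and verifies empirically that E(ν x_i) ≈ 0 for every input:

```python
import random

random.seed(0)

# Hypothetical 3-input linear neuron with additive Gaussian output noise,
# as in Eq. (6.3): y = sum_j w_j * x_j + nu.
w = [0.5, -1.0, 2.0]      # example weights (illustrative, not from the text)
sigma_nu = 0.1            # noise standard deviation (assumed)

n = 100_000
xs, nus = [], []
for _ in range(n):
    x = [random.gauss(0.0, 1.0) for _ in w]   # zero-mean Gaussian inputs
    nu = random.gauss(0.0, sigma_nu)          # output noise, drawn independently of x
    xs.append(x)
    nus.append(nu)

# Empirical check of Eq. (6.4): the sample mean of nu * x_i is close to 0 for every i
for i in range(len(w)):
    corr = sum(nu * x[i] for nu, x in zip(nus, xs)) / n
    print(f"E(nu * x_{i}) ~ {corr:.4f}")
```

Because noise and inputs are drawn independently, each printed estimate is close to zero, with deviations shrinking as n grows.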
We can now compute the mutual information between the neuron's input and output:

    I(x, y) = h(y) - h(y \mid x)        (6.5)

where h(z) is the entropy of z; see Section 9.3.1.
In other words, I(x, y) measures the information that the output carries about the input: it equals the entropy of the output minus the uncertainty that remains in the output once the value of the input is known.
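The step from (6.5) to the closed-form expression used below relies on the differential entropy of a Gaussian variable, which for a zero-mean Gaussian z with variance σ_z² is:

```latex
h(z) = \frac{1}{2} \log\left( 2\pi e \, \sigma_z^2 \right)
```

Substituting this for both h(y) and h(ν), the 2πe terms cancel, leaving only the log of the variance ratio:

```latex
h(y) - h(\nu)
  = \frac{1}{2} \log\left( 2\pi e \, \sigma_y^2 \right)
  - \frac{1}{2} \log\left( 2\pi e \, \sigma_\nu^2 \right)
  = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2}
```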
Since the neuron's activation function is deterministic, the uncertainty on the output is solely due to the noise. Hence, we have h(y | x) = h(ν). Taking into account the Gaussian properties of the output and the noise, we have:

    I(x, y) = h(y) - h(\nu) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2}        (6.6)
The ratio σ_y²/σ_ν² stands for the signal-to-noise ratio of the neuron. Since the experimental set-up fixes the amount and variance of the noise, one can modulate only the variance of the output. In order to improve the signal-to-noise ratio, and thus the amount of information, one can therefore increase the
© A.G.Billard 2004 – Last Update March 2011