MACHINE LEARNING TECHNIQUES - LASA


\[
I(x, y) = H(x) - H(x \mid y) = H(y) - H(y \mid x) = H(x) + H(y) - H(x, y)
\qquad (8.34)
\]

Mutual information is a natural measure of the dependence between random variables: if the variables are independent, they provide no information about each other and their mutual information is zero. An important property of mutual information is its behaviour under an invertible linear transformation y = Wx, for which

\[
I(y_1, \ldots, y_n) = \sum_i H(y_i) - H(x) - \log \left| \det W \right|
\]
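As a quick numerical check of the identities in (8.34), the sketch below uses a hypothetical 3x2 joint distribution (invented purely for illustration) to compute the entropies of a pair of discrete variables and verify that the three expressions for I(x, y) coincide.

```python
# A minimal sketch, assuming discrete variables and natural-log entropies;
# the joint distribution p_xy below is hypothetical, chosen only for illustration.
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability cells contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical 3x2 joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.20, 0.10],
                 [0.25, 0.05],
                 [0.10, 0.30]])

p_x = p_xy.sum(axis=1)           # marginal p(x)
p_y = p_xy.sum(axis=0)           # marginal p(y)

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())
H_x_given_y = H_xy - H_y         # chain rule: H(x|y) = H(x, y) - H(y)
H_y_given_x = H_xy - H_x

# The three expressions of Eq. (8.34) give the same value:
print(H_x - H_x_given_y)
print(H_y - H_y_given_x)
print(H_x + H_y - H_xy)
```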

9.3.4 Relation to mutual information<br />

Mutual information can be related to the notion of correlation through canonical correlation analysis. Since information is additive for statistically independent variables and since canonical variates are uncorrelated, the mutual information between two multidimensional variables x and y is the sum of the mutual information between the pairs of canonical variates x_i and y_i (i.e. the canonical projections of x and y). For Gaussian variables this means:

\[
I(x; y) = \frac{1}{2} \log\!\left( \frac{1}{\prod_i \left( 1 - \rho_i^2 \right)} \right)
        = \frac{1}{2} \sum_i \log\!\left( \frac{1}{1 - \rho_i^2} \right)
\qquad (8.35)
\]
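Equation (8.35) can be checked numerically. The sketch below is a minimal illustration, assuming jointly Gaussian variables with a randomly generated covariance matrix (invented for the example): it computes the canonical correlations from the covariance blocks and compares the sum in (8.35) with the Gaussian mutual information obtained directly from log-determinants.

```python
# A minimal sketch, assuming jointly Gaussian x and y; the joint covariance S
# is generated at random purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 3, 2
A = rng.standard_normal((dx + dy, dx + dy))
S = A @ A.T + np.eye(dx + dy)          # positive-definite joint covariance of (x, y)

S_xx, S_yy = S[:dx, :dx], S[dx:, dx:]
S_xy = S[:dx, dx:]

# Canonical correlations rho_i: singular values of L_x^{-1} S_xy L_y^{-T},
# with L_x, L_y the Cholesky factors of S_xx and S_yy.
L_x = np.linalg.cholesky(S_xx)
L_y = np.linalg.cholesky(S_yy)
M = np.linalg.solve(L_x, S_xy) @ np.linalg.inv(L_y).T
rho = np.linalg.svd(M, compute_uv=False)

I_cca = 0.5 * np.sum(np.log(1.0 / (1.0 - rho**2)))      # Eq. (8.35)

# Gaussian mutual information computed directly from log-determinants.
I_gauss = 0.5 * (np.linalg.slogdet(S_xx)[1]
                 + np.linalg.slogdet(S_yy)[1]
                 - np.linalg.slogdet(S)[1])
print(I_cca, I_gauss)                                    # the two values agree
```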

9.3.5 Kullback-Leibler Distance<br />

The most common method to measure the difference between two probability distributions p and q is to use the Kullback-Leibler distance (D_KL), sometimes known as the relative entropy:

\[
D(p \,\|\, q) = \sum_i p(i) \log \frac{p(i)}{q(i)} = E\!\left[ \log \frac{p(i)}{q(i)} \right]
\qquad (8.36)
\]

D_KL is always positive, D(p||q) ≥ 0, and it is zero if and only if p = q; in particular, when p is a joint distribution and q the product of its marginals, D_KL vanishes exactly when the variables are statistically independent. Note that in general the relative entropy D_KL is not symmetric under interchange of the distributions p and q: in general D(p||q) ≠ D(q||p). Hence, D_KL is not strictly a distance. The measure of relative entropy is very important in pattern recognition and neural networks, as well as in information theory, as will be shown in other parts of this class.
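As a small illustration of these properties, the sketch below (with two hypothetical discrete distributions chosen only for the example) evaluates Eq. (8.36) and shows that D_KL is non-negative, vanishes when p = q, and is not symmetric.

```python
# A minimal sketch for discrete distributions; p and q are hypothetical and
# assume q(i) > 0 wherever p(i) > 0, so every term in Eq. (8.36) is defined.
import numpy as np

def kl(p, q):
    """D_KL(p || q) = sum_i p(i) log(p(i)/q(i)), in nats (Eq. 8.36)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                       # terms with p(i) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(kl(p, q))                        # non-negative
print(kl(q, p))                        # generally different: D_KL is not symmetric
print(kl(p, p))                        # exactly 0 when the two distributions coincide
```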

© A.G.Billard 2004 – Last Update March 2011
