\[
I(x, y) = H(x) - H(x \mid y) = H(y) - H(y \mid x) = H(x) + H(y) - H(x, y)
\qquad (8.34)
\]
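As a quick numerical check of Eq. (8.34) (not part of the original notes; the small joint table and the use of Python/NumPy are illustrative assumptions), the sketch below computes the entropies of a discrete joint distribution and verifies that the different expressions for I(x, y) coincide:

```python
# Minimal sketch: the joint table below is an arbitrary illustrative choice.
import numpy as np

def entropy(p):
    """Shannon entropy in bits; ignores zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Small discrete joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.20, 0.10, 0.05],
                 [0.05, 0.25, 0.35]])

p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

H_x, H_y, H_xy = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())

# Eq. (8.34): the expressions for I(x, y) agree.
I_sum  = H_x + H_y - H_xy        # H(x) + H(y) - H(x, y)
I_cond = H_x - (H_xy - H_y)      # H(x) - H(x|y), using H(x|y) = H(x, y) - H(y)
assert np.isclose(I_sum, I_cond)
print(f"I(x, y) = {I_sum:.4f} bits")
```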
Mutual information is a natural measure of the dependence between random variables: if the variables are independent, they provide no information about each other. An important property of mutual information is its behaviour under an invertible linear transformation y = Wx, for which the mutual information between the components of y can be written as

\[
I(y_1, \ldots, y_n) = \sum_i H(y_i) - H(x) - \log \left| \det W \right|
\]
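This transformation property can be checked for a Gaussian x, where differential entropies have a closed form. The following is only a sketch under that Gaussian assumption; the covariance and the transform W are arbitrary illustrative choices, not taken from the notes:

```python
# Minimal sketch: verify the linear-transform property for Gaussian x (entropies in nats).
import numpy as np

rng = np.random.default_rng(1)
n = 3

A = rng.standard_normal((n, n))
Sigma_x = A @ A.T + np.eye(n)      # covariance of x ~ N(0, Sigma_x)
W = rng.standard_normal((n, n))    # invertible (almost surely) linear transform
Sigma_y = W @ Sigma_x @ W.T        # covariance of y = W x

# Differential entropies of Gaussian variables.
H_x  = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Sigma_x))
H_y  = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Sigma_y))
H_yi = 0.5 * np.log(2 * np.pi * np.e * np.diag(Sigma_y))   # marginal entropies H(y_i)

# Mutual information among the components of y, two equivalent ways.
I_direct  = np.sum(H_yi) - H_y
I_formula = np.sum(H_yi) - H_x - np.log(abs(np.linalg.det(W)))

assert np.isclose(I_direct, I_formula)
print(f"I(y_1, ..., y_n) = {I_direct:.4f} nats")
```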
9.3.4 Relation to mutual information
Mutual information can be related to the notion of correlation through canonical correlation analysis. Since information is additive for statistically independent variables and since the canonical variates are uncorrelated, the mutual information between x and y is the sum of the mutual information between the pairs of canonical variates x_i and y_i (i.e. the canonical projections of x and y). For Gaussian variables this means:
\[
I(x; y) = \frac{1}{2} \log \left( \frac{1}{\prod_i \left( 1 - \rho_i^2 \right)} \right)
        = \frac{1}{2} \sum_i \log \left( \frac{1}{1 - \rho_i^2} \right)
\qquad (8.35)
\]

where the ρ_i are the canonical correlations.
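To illustrate Eq. (8.35), the following sketch (not from the original notes; the synthetic joint covariance is an arbitrary choice and only NumPy is assumed) extracts the canonical correlations of a jointly Gaussian pair (x, y) and checks that the sum in Eq. (8.35) agrees with the Gaussian mutual information computed from the covariance determinants:

```python
# Minimal sketch: Eq. (8.35) versus the determinant form of Gaussian mutual information.
import numpy as np

rng = np.random.default_rng(0)

# A valid joint covariance for x in R^2 and y in R^2 (arbitrary illustrative choice).
A = rng.standard_normal((4, 4))
S = A @ A.T + 0.5 * np.eye(4)                  # joint covariance of (x, y)
Sxx, Syy, Sxy = S[:2, :2], S[2:, 2:], S[:2, 2:]

def inv_sqrt(M):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Canonical correlations: singular values of the whitened cross-covariance.
rho = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)

# Eq. (8.35), in nats.
I_cca = 0.5 * np.sum(np.log(1.0 / (1.0 - rho**2)))

# Gaussian mutual information directly from covariance determinants.
I_gauss = 0.5 * np.log(np.linalg.det(Sxx) * np.linalg.det(Syy) / np.linalg.det(S))

assert np.isclose(I_cca, I_gauss)
print(f"I(x; y) = {I_cca:.4f} nats, canonical correlations = {np.round(rho, 3)}")
```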
9.3.5 Kullback-Leibler Distance
The most common method to measure the difference between two probability distributions p and q is to use the Kullback-Leibler distance (D_KL), sometimes known as relative entropy:
\[
D(p \,\|\, q) = \sum_i p(i) \log \left( \frac{p(i)}{q(i)} \right)
             = E_p \left[ \log \frac{p(i)}{q(i)} \right]
\qquad (8.36)
\]

D_KL is always positive, D(p || q) ≥ 0, and it is zero if and only if p = q (in particular, when p is a joint distribution and q the product of its marginals, D_KL vanishes if and only if the variables are statistically independent). Note that in general the relative entropy D_KL is not symmetric under interchange of the distributions p and q: in general D(p || q) ≠ D(q || p). Hence, D_KL is not strictly a distance. The measure of relative entropy is very important in pattern recognition and neural networks, as well as in information theory, as will be shown in other parts of this class.
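As a small illustration of Eq. (8.36) (not part of the original notes; the two distributions are arbitrary choices), the sketch below computes D(p || q) for discrete distributions and exhibits both its non-negativity and its asymmetry:

```python
# Minimal sketch: relative entropy for discrete distributions.
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits; assumes q(i) > 0 wherever p(i) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])

print(f"D(p||q) = {kl_divergence(p, q):.4f} bits")  # positive
print(f"D(q||p) = {kl_divergence(q, p):.4f} bits")  # differs from D(p||q): not symmetric
print(f"D(p||p) = {kl_divergence(p, p):.4f} bits")  # zero if and only if p = q
```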