For a variable x distributed uniformly over the rectangle [0, a] × [0, b], i.e., with constant density p(x) = 1/(a·b), the differential entropy is:

$$h(x) = -\int_0^a \int_0^b \frac{1}{a}\,\frac{1}{b}\,\log\!\left(\frac{1}{a}\,\frac{1}{b}\right) dx_2\, dx_1 = -\log\frac{1}{a} - \log\frac{1}{b} = \log a + \log b = \log(a \cdot b)$$
This definition is appealing, as the entropy grows when a or b has a large value, i.e., when the spread of the distribution is large.
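As a quick numerical sanity check, the following sketch approximates the double integral above by a Riemann sum over a grid (a minimal illustration; the side lengths a = 2, b = 3 are arbitrary, and natural logarithms are used, so entropies are in nats):

```python
import numpy as np

# Riemann-sum approximation of h(x) for a uniform density on [0,a] x [0,b].
a, b = 2.0, 3.0                      # arbitrary illustrative side lengths
n = 500                              # grid resolution per axis
p = np.full((n, n), 1.0 / (a * b))   # constant density over the rectangle
cell = (a / n) * (b / n)             # area of one grid cell

h = -np.sum(p * np.log(p)) * cell    # -sum p log(p) dA
print(h, np.log(a * b))              # both ~1.7918 (= log 6)
```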
One can show that the Gaussian distribution is the distribution with maximal entropy among all distributions with a given variance. In other words, the Gaussian distribution is the ``most random'', or least structured, of all such distributions. Entropy is small for distributions that are concentrated on a few values, i.e., when the variable is clearly clustered or has a very ``spiky'' probability distribution function.
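Both claims are easy to check numerically. The sketch below (illustrative values only) computes the entropy of a ``spiky'' versus a uniform discrete distribution, and compares the closed-form differential entropies of a Gaussian and of a uniform density with the same variance:

```python
import numpy as np

def shannon_entropy(p):
    """Discrete entropy -sum p log p, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# A ``spiky'' distribution versus a uniform one over 10 values.
spiky = np.array([0.91] + [0.01] * 9)
uniform = np.full(10, 0.1)
print(shannon_entropy(spiky))    # ~0.50 nats: mass concentrated, low entropy
print(shannon_entropy(uniform))  # ~2.30 nats = log(10), the maximum

# Differential entropy at equal variance sigma^2: the Gaussian gives
# 0.5*log(2*pi*e*sigma^2); a uniform density on an interval of length
# sqrt(12)*sigma (same variance) gives log(sqrt(12)*sigma).
sigma = 1.0
print(0.5 * np.log(2 * np.pi * np.e * sigma ** 2))  # ~1.42 nats
print(np.log(np.sqrt(12) * sigma))                  # ~1.24 nats < Gaussian
```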
9.3.2 Joint and conditional entropy
If x and y are two discrete random variables taking values i ∈ {0, …, I} and j ∈ {0, …, J} respectively, and P(i, j) is the joint probability that x takes value i and y takes value j, then the entropy of the joint distribution is equal to:
$$H(x, y) = -\sum_{i,j} P(i,j)\,\log P(i,j) \qquad (8.32)$$
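In practice, Eq. (8.32) is a single sum over a joint probability table. A minimal sketch, using an invented 2 × 3 table (rows index i, columns index j):

```python
import numpy as np

# Joint entropy H(x,y) per Eq. (8.32), natural log (entropy in nats).
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])   # assumed joint probabilities P(i,j)
assert np.isclose(P.sum(), 1.0)      # a valid joint distribution

H_xy = -np.sum(P * np.log(P))        # all entries are > 0 here
print(H_xy)                          # ~1.64 nats
```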
We can then derive the equation for the conditional entropy:

$$\begin{aligned} H(x \mid y) &= \sum_j P(j)\, H(x \mid y = j) \\ &= -\sum_j P(j) \sum_i P(i \mid j) \log P(i \mid j) \\ &= -\sum_{i,j} P(i,j) \log P(i \mid j) \end{aligned}$$
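The first and last lines of this derivation can be verified numerically; the sketch below computes H(x|y) both ways on the same invented table as above:

```python
import numpy as np

P = np.array([[0.10, 0.20, 0.10],    # assumed joint table P(i,j)
              [0.25, 0.05, 0.30]])
P_j = P.sum(axis=0)                  # marginal P(j)
P_i_given_j = P / P_j                # conditional P(i|j), column by column

# First line: sum_j P(j) * H(x | y = j)
H1 = sum(P_j[j] * -np.sum(P_i_given_j[:, j] * np.log(P_i_given_j[:, j]))
         for j in range(P.shape[1]))

# Last line: -sum_{i,j} P(i,j) log P(i|j)
H2 = -np.sum(P * np.log(P_i_given_j))
print(H1, H2)                        # identical, ~0.56 nats
```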
Finally, conditional and joint entropy can be related as follows:

$$H(x, y) = H(x) + H(y \mid x) \qquad (8.33)$$
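A quick check of Eq. (8.33) on the same invented table:

```python
import numpy as np

P = np.array([[0.10, 0.20, 0.10],        # assumed joint table P(i,j)
              [0.25, 0.05, 0.30]])

def H(p):
    p = p[p > 0]                          # convention 0 log 0 = 0
    return -np.sum(p * np.log(p))

H_xy = H(P)                               # joint entropy H(x,y)
H_x = H(P.sum(axis=1))                    # marginal entropy H(x)
P_j_given_i = P / P.sum(axis=1, keepdims=True)
H_y_given_x = -np.sum(P * np.log(P_j_given_i))
print(H_xy, H_x + H_y_given_x)            # both ~1.64 nats
```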
Moreover, if the two variables are independent, then the entropy is additive:

$$H(x, y) = H(x) + H(y) \quad \text{iff} \quad P(x, y) = P(x)\,P(y)$$
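Additivity can be checked by building an independent joint distribution as the outer product of two assumed marginals:

```python
import numpy as np

px = np.array([0.4, 0.6])            # assumed marginal P(x)
py = np.array([0.35, 0.25, 0.40])    # assumed marginal P(y)
P = np.outer(px, py)                 # P(x,y) = P(x)P(y): independent

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(H(P), H(px) + H(py))           # equal, ~1.75 nats
```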
9.3.3 Mutual Information
The mutual information between two random variables x and y is denoted by