MACHINE LEARNING TECHNIQUES - LASA
9.4.2.1 Maximum Likelihood
Machine learning techniques often assume that the form of the distribution function is known and that only its parameters must be optimized to best fit a set of observed datapoints. One then proceeds to determine these parameters through maximum-likelihood optimization.
The principle of maximum likelihood consists of finding the optimal parameters of a given distribution by maximizing the likelihood function of these parameters, or, equivalently, by maximizing the probability of the data given the model and its parameters.
The method of maximum likelihood is a method of point estimation that uses, as an estimate of an unobservable population parameter, the member of the parameter space that maximizes the likelihood function. If $\mu$ is the unobserved parameter and $X = x$ an observed outcome, the likelihood of $\mu$ given this observation is:

$$L(\mu) = P(X = x \mid \mu) \qquad (8.38)$$

The value of $\mu$ that maximizes $L(\mu)$ is the maximum-likelihood estimate of $\mu$. In order to find the maximum, one computes the derivative of $L$ and solves $\frac{\partial L}{\partial \mu} = 0$. However, it is often much simpler to compute the derivative of the logarithm of the likelihood function, the log-likelihood.
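To see why the log-likelihood is more convenient, consider a short worked example (added here for illustration; the Gaussian form is an assumption, not from the original text). For $N$ observations $x_1, \ldots, x_N$ drawn from a Gaussian with known variance $\sigma^2$, the log-likelihood turns the product of densities into a sum:

```latex
\ln L(\mu) = -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right)
             - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(x_i - \mu\right)^2

\frac{\partial \ln L}{\partial \mu}
  = \frac{1}{\sigma^2}\sum_{i=1}^{N}\left(x_i - \mu\right) = 0
  \;\Longrightarrow\;
  \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i
```

Differentiating the sum term by term is straightforward, whereas differentiating the product $\prod_i p(x_i \mid \mu)$ directly would require the product rule across all $N$ factors.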
Let $X = \{x_1, x_2, \ldots, x_N\}$ be the dataset of observed instances of the variable $X$, generated by a distribution parameterized by the unknown $\mu$, i.e., $p(x \mid \mu)$. $\mu$ can be estimated by maximum likelihood:

$$\hat{\mu} = \arg\max_{\mu}\, p(X \mid \mu) = \arg\max_{\mu} \prod_{i=1}^{N} p(x_i \mid \mu) \qquad (8.39)$$

In most problems, it is not possible to find an analytical expression for $\hat{\mu}$. One then has to estimate the parameter through an iterative procedure called EM.
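To make the maximization in Eq. (8.39) concrete, here is a minimal numerical sketch (the Gaussian data, known variance, and grid search are illustrative assumptions, not part of the original text). For a Gaussian with known variance, the grid search over the log-likelihood should land very close to the analytical ML estimate, which is the sample mean:

```python
import numpy as np

# Hypothetical example: ML estimate of the mean of a 1-D Gaussian
# with known variance, by maximizing the log of Eq. (8.39).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)

def log_likelihood(mu, x, sigma=1.0):
    # log p(X|mu) = sum_i log N(x_i; mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Evaluate the log-likelihood over a grid of candidate mu values
# and take the argmax.
grid = np.linspace(0.0, 4.0, 2001)
mu_hat = grid[np.argmax([log_likelihood(m, data) for m in grid])]

# For a Gaussian, the analytical ML estimate is the sample mean,
# so the grid search should agree with it up to the grid spacing.
print(mu_hat, data.mean())
```

Here a closed-form solution exists, so the grid search is redundant; it stands in for the iterative optimization needed when, as the text notes, no analytical expression for $\hat{\mu}$ is available.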
9.4.2.2 EM-Algorithm<br />
EM is an algorithm for finding maximum-likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. EM alternates between an expectation (E) step, which computes the expected value of the latent variables, and a maximization (M) step, which computes the maximum-likelihood estimates of the parameters given the data, with the latent variables set to their expected values.
A basic intuition of the EM-algorithm goes as follows:<br />
• Guess an initial ˆµ . (Initialization)<br />
• Using current ˆµ , obtain an expectation of the complete data likelihood L ( µ ). (E-step)<br />
• Find (and update) ˆµ to maximize the expectation (M-step).<br />
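The three steps above can be sketched for a two-component Gaussian mixture. This is a minimal illustration, assuming synthetic 1-D data and ad-hoc initial parameter values (none of which appear in the original text):

```python
import numpy as np

# Synthetic 1-D data from two Gaussians (illustrative assumption).
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(3.0, 0.8, 300)])

def gauss(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Guess an initial mu-hat (Initialization)
mu = np.array([0.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])       # mixing proportions

for _ in range(50):
    # E-step: expected responsibilities of the latent component labels
    r = pi * gauss(data[:, None], mu, var)      # shape (N, 2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update the parameters to maximize the expected
    # complete-data likelihood
    nk = r.sum(axis=0)
    mu = (r * data[:, None]).sum(axis=0) / nk
    var = (r * (data[:, None] - mu)**2).sum(axis=0) / nk
    pi = nk / len(data)

print(np.sort(mu))  # the means should approach the generating values
```

Each iteration repeats the E- and M-steps with the current estimate of $\hat{\mu}$, matching the alternation described above.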
© A.G.Billard 2004 – Last Update March 2011