Finally, the emission probabilities in the discrete case are given by computing the expected number of times one is in each particular state while the observation takes a particular value v_k, i.e. o_t = v_k:

\hat{b}_i(v_k) = \frac{\sum_{t \,:\, o_t = v_k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}        (7.7)
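As a minimal illustration, Eq. (7.7) amounts to a weighted counting of symbols. The NumPy sketch below assumes the state posteriors \gamma_t(i) have already been computed by the forward-backward procedure; the array names and shapes are assumptions of this sketch, not part of the original text:

```python
import numpy as np

def update_emissions_discrete(gamma, obs, n_symbols):
    """Re-estimate discrete emission probabilities as in Eq. (7.7).

    gamma     : (T, N) array, gamma[t, i] = P(state i at time t | data)
    obs       : (T,) integer array of observed symbols o_t in {0, ..., n_symbols - 1}
    Returns b_hat of shape (N, n_symbols), with b_hat[i, k] the estimate of b_i(v_k).
    """
    N = gamma.shape[1]
    b_hat = np.zeros((N, n_symbols))
    for k in range(n_symbols):
        # Numerator: expected number of visits to state i while o_t = v_k.
        b_hat[:, k] = gamma[obs == k].sum(axis=0)
    # Denominator: total expected number of visits to state i.
    b_hat /= gamma.sum(axis=0)[:, None]
    return b_hat
```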
In the continuous case, one builds a continuous estimate of the \hat{b}_i, e.g. by fitting a GMM for each state.
The ^ symbol in each of the above designates the new estimate of each HMM parameter.
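A minimal sketch of the continuous case, assuming scikit-learn's GaussianMixture as the per-state density estimator. For simplicity it hard-assigns each observation to its most likely state, whereas an exact Baum-Welch update would weight every observation by \gamma_t(i):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def update_emissions_continuous(gamma, obs, n_components=3):
    """Build one GMM per hidden state as a continuous estimate of b_i.

    gamma : (T, N) state posteriors from the forward-backward pass
    obs   : (T, d) continuous observations
    Returns a list of N fitted GMMs, one per hidden state.
    """
    states = gamma.argmax(axis=1)        # hard assignment per time step (simplification)
    gmms = []
    for i in range(gamma.shape[1]):
        gmm = GaussianMixture(n_components=n_components)
        gmm.fit(obs[states == i])        # density of observations emitted in state i
        gmms.append(gmm)
    return gmms
```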
7.2.3 Determining the number of states
A crucial parameter of an HMM (as of a GMM) is the number of states K. Increasing K increases the number of parameters and hence the computation. Of course, increasing the number of states allows a finer description of the density and hence increases the likelihood of the model given the data. This increase may, however, be negligible compared to the growth in computation time caused by the additional parameters. One must then find a tradeoff between increasing the number of parameters and improving the likelihood of the model. This tradeoff is encapsulated in the Akaike Information Criterion (AIC):

\mathrm{AIC} = -2\ln L + 2K        (7.8)

where L is the likelihood of the model.
This criterion has, however, lately been replaced by the Bayesian Information Criterion (BIC), which also takes into account the fact that the estimate of the number of parameters depends on the number of data points. Sparse data may require many parameters to capture the non-linearities they contain. As the number of data points M increases, so does the influence of the number of parameters:

\mathrm{BIC} = -2\ln L + K\ln(M)        (7.9)
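Both criteria reduce to one line of code once the model's log-likelihood and parameter count are available. The following is a minimal sketch; the helper names in the usage comment (fit_hmm, num_params) are hypothetical placeholders, not part of any particular library:

```python
import numpy as np

def aic(log_l, k):
    """Akaike Information Criterion, Eq. (7.8); smaller is better."""
    return -2.0 * log_l + 2.0 * k

def bic(log_l, k, m):
    """Bayesian Information Criterion, Eq. (7.9); the complexity
    penalty grows with the number of data points m."""
    return -2.0 * log_l + k * np.log(m)

# Hypothetical usage: sweep the number of states and keep the best BIC.
# `fit_hmm` and `num_params` stand in for whatever HMM library is used.
# scores = {K: bic(fit_hmm(X, K).score(X), num_params(K), len(X))
#           for K in range(1, 11)}
# best_K = min(scores, key=scores.get)
```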
An alternative to the above two criteria is the Deviance Information Criterion (DIC):

\mathrm{DIC} = E\{D(K)\} - D(E\{K\}), \quad \text{with} \quad D(K) = -2\ln p(X \mid K)        (7.10)
Again, the smaller the DIC, the better the model fits the data. The DIC favors a good fit while penalizing model complexity through the effective number of parameters required to reach a similar level of fit.
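A minimal sketch of Eq. (7.10) as written above, assuming samples of the deviance D are available from a posterior over models; the argument names are placeholders of this sketch:

```python
import numpy as np

def dic(deviances, deviance_at_mean):
    """Deviance Information Criterion as written in Eq. (7.10).

    deviances        : D(K) = -2 ln p(X | K) evaluated at each posterior sample
    deviance_at_mean : D evaluated at the posterior expectation E{K}
    Smaller is better.
    """
    # Mean deviance minus deviance at the mean: this gap measures the
    # effective number of parameters the model actually uses.
    return np.mean(deviances) - deviance_at_mean
```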