MACHINE LEARNING TECHNIQUES - LASA

Finally, the emission probabilities in the discrete case are given by computing the expected number of times one is in each particular state while the observation takes a particular value v_k, i.e. o_t = v_k:

\hat{b}_i(v_k) = \frac{\sum_{\{t \,:\, o_t = v_k\}} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}    (7.7)

In the continuous case, one builds a continuous estimate of the b̂_i, e.g. by fitting a GMM for each state.

The ^ symbol in each of the above designates the new estimate of the value of each of the HMM parameters.
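As an illustration of the discrete update (7.7), here is a minimal NumPy sketch (not part of the original notes) that re-estimates the emission matrix from the smoothed state posteriors γ_t(i); the array names and shapes are assumptions:

```python
import numpy as np

def reestimate_emissions(gamma, obs, n_symbols):
    """Discrete-emission update of Eq. (7.7).

    gamma     : (T, N) array, gamma[t, i] = gamma_t(i), the posterior probability of
                being in state i at time t given the whole observation sequence.
    obs       : (T,) integer array of observed symbol indices in {0, ..., n_symbols-1}.
    n_symbols : number of distinct observation values v_k.
    Returns b_hat of shape (N, n_symbols), each row summing to 1.
    """
    b_hat = np.zeros((gamma.shape[1], n_symbols))
    for k in range(n_symbols):
        # Numerator of (7.7): expected time spent in state i while observing v_k.
        b_hat[:, k] = gamma[obs == k].sum(axis=0)
    # Denominator of (7.7): total expected time spent in state i.
    b_hat /= gamma.sum(axis=0)[:, None]
    return b_hat
```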

7.2.3 Determining the number of states

A crucial parameter of an HMM (and similarly of a GMM) is the number of states K. Increasing K increases the number of parameters and hence the computation. Of course, increasing the number of states allows a finer description of the density and hence increases the likelihood of the model given the data. This increase may however be negligible in comparison to the increase in computation time due to the larger number of parameters. One must then find a tradeoff between increasing the number of parameters and improving the likelihood of the model. This is encapsulated in the Akaike Information Criterion (AIC):

AIC = -2 ln L + 2K    (7.8)

where L is the likelihood of the model.
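For concreteness, Eq. (7.8) amounts to a one-line computation; in the sketch below (an illustration, not taken from the notes), `log_lik` stands for the maximized log-likelihood ln L of the fitted model and `n_params` for its number of free parameters:

```python
def aic(log_lik, n_params):
    """Akaike Information Criterion, Eq. (7.8): AIC = -2 ln L + 2K."""
    return -2.0 * log_lik + 2.0 * n_params
```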

This criterion has, however, largely been replaced by the Bayesian Information Criterion (BIC), which also accounts for the fact that the estimate of the number of parameters depends on the number of data points. Sparse data may require many parameters to capture the non-linearities they contain. As the number of datapoints M increases, so does the influence of the number of parameters:

BIC = -2 ln L + K ln(M)    (7.9)
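A possible way to use Eq. (7.9) for choosing the number of states is sketched below; `fit_hmm` is a hypothetical training routine assumed to return the trained model's log-likelihood and its number of free parameters for a given K:

```python
import numpy as np

def bic(log_lik, n_params, n_points):
    """Bayesian Information Criterion, Eq. (7.9): BIC = -2 ln L + K ln(M)."""
    return -2.0 * log_lik + n_params * np.log(n_points)

def select_n_states(X, candidate_Ks, fit_hmm):
    """Return the candidate K with the lowest BIC (smaller is better)."""
    scores = {}
    for K in candidate_Ks:
        log_lik, n_params = fit_hmm(X, K)  # hypothetical: train an HMM with K states
        scores[K] = bic(log_lik, n_params, n_points=len(X))
    return min(scores, key=scores.get), scores
```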

An alternative to the above two criteria is the Deviance Information Criterion (DIC):

DIC = E{D(K)} - D(E{K}),  with  D(K) = -2 ln p(X | K)    (7.10)

Again, the smaller the DIC, the better the model fits the data. DIC favors a good fit, while penalizing models by measuring the effective number of parameters required for a similar level of fit.
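If posterior samples of the model parameters are available (e.g. from an MCMC run), the form of Eq. (7.10) can be evaluated as sketched below, with the expectation taken over the posterior; the function and argument names are illustrative assumptions, and averaging the samples component-wise is only meaningful for parameterizations where the posterior mean is well defined:

```python
import numpy as np

def dic(log_lik_fn, param_samples):
    """Deviance Information Criterion following Eq. (7.10).

    log_lik_fn    : function returning ln p(X | theta) for a parameter vector theta
    param_samples : (S, d) array of posterior samples of the model parameters
    """
    samples = np.asarray(param_samples)
    deviances = np.array([-2.0 * log_lik_fn(theta) for theta in samples])
    mean_deviance = deviances.mean()                            # E{D(.)}
    deviance_at_mean = -2.0 * log_lik_fn(samples.mean(axis=0))  # D(E{.})
    return mean_deviance - deviance_at_mean
```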

