
7.2.4 Decoding an HMM

There are two ways to use an HMM once it has been built: for classification or for prediction.

Classification: Assume that you have trained two different HMMs on two classes of time series (say you trained one HMM to recognize a waving gesture and another HMM to recognize a pointing gesture, on the basis of ten examples each). One can use these two HMMs to classify a new time series (a new movement) and determine whether it belongs more to the first or to the second class (i.e. whether the person was waving or rather pointing). This is done by computing the likelihood of the new time series under each model and simply determining which model is most likely to explain it. If the new time series is X' and the parameters of the two models are λ₁ = (A₁, B₁, π₁) and λ₂ = (A₂, B₂, π₂) respectively, then the likelihoods of X' having been generated by each of these two models are p(X'|λ₁) and p(X'|λ₂). These likelihoods can be computed using either the forward or the backward procedure.
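As an illustration, below is a minimal sketch of the forward procedure for a discrete-emission HMM, computed in log space with per-step rescaling to avoid numerical underflow on long sequences. The function name and the NumPy representation of (A, B, π) are assumptions made for this example, not part of the original text.

import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    # Forward procedure: returns log p(obs | lambda) for a discrete-emission HMM.
    # obs : sequence of observation-symbol indices, length T
    # A   : (N, N) transition matrix, A[i, j] = p(q_{t+1} = j | q_t = i)
    # B   : (N, M) emission matrix,   B[i, k] = p(o_t = k | q_t = i)
    # pi  : (N,)  initial state distribution
    alpha = pi * B[:, obs[0]]              # alpha_1(i) = pi_i * b_i(o_1)
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()                   # rescale to prevent underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
        log_lik += np.log(alpha.sum())     # accumulate the log of the scaling factor
        alpha /= alpha.sum()
    return log_lik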

One can then decide which of the two models is most likely to have generated the time series by directly comparing the two likelihoods. That is, model 1 is more likely to have generated X' if:

p(X'|λ₁) > p(X'|λ₂)    (7.11)
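With the forward routine sketched above, decision rule (7.11) amounts to a direct comparison of the two log-likelihoods. The names X_new, A1, B1, pi1, A2, B2, pi2 below are hypothetical placeholders for the new sequence and the two trained gesture models:

# Hypothetical trained models: lambda_1 (waving) and lambda_2 (pointing).
logL1 = forward_log_likelihood(X_new, A1, B1, pi1)
logL2 = forward_log_likelihood(X_new, A2, B2, pi2)
label = "waving" if logL1 > logL2 else "pointing"   # rule (7.11) in log space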

Such a comparison is, however, difficult because the value of the likelihood is unbounded (recall that pdfs are positive functions with no upper bound). It can take very large values simply as an effect of one model having many more parameters than the other. Hence, in practice, one sometimes normalizes the likelihood by the number of parameters of each model. Moreover, since one only compares two values, one has no idea how well either of the two models actually explains the data. It could be that both likelihoods are very low and that the observed time series therefore belongs to neither of the two models. In speech processing, people avoid this problem by creating a "garbage" model that ensures that, if a time series is not well represented by any of the meaningful models, it is automatically classified as garbage. Building a garbage model requires training it on a large variety of signals that cannot be the pattern one is looking for (e.g. noise from cars, doors shutting, etc.). Generating such a large set of non-class samples is not very practical. An alternative is to retain the average likelihood obtained on the training examples for each model and to use it to check that the likelihood of the new time series is at least as large as that seen during training (or within some bounds).
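One possible way to implement this alternative is to store, for each model, the average per-observation log-likelihood over its own training sequences and to reject a new series whose score falls far below that reference. The list train_sequences_1 and the tolerance margin below are hypothetical and would have to be chosen, e.g. on validation data:

# Reference score: average per-observation log-likelihood of model 1
# over its (hypothetical) training set.
ref1 = np.mean([forward_log_likelihood(x, A1, B1, pi1) / len(x)
                for x in train_sequences_1])

margin = 5.0                                       # hypothetical tolerance
score = forward_log_likelihood(X_new, A1, B1, pi1) / len(X_new)
accepted = score > ref1 - margin                   # otherwise: treat X_new as belonging to neither class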

Prediction: Once built, an HMM can be used to make predictions about the evolution of a given time series when provided with part of that series. Clearly, the farther back in time the observed part extends, the better the prediction.

Say that you have built an HMM to represent the temporal evolution of the weather from day to day, taking into account the seasons and the particular geographical area. One can then use the model to predict the weather for the next couple of days. The farther into the future the prediction, the less reliable it will be.

To perform prediction, one uses the Viterbi algorithm. The Viterbi algorithm generates the most likely path. Given an HMM with parameters λ = (A, B, π), one looks for the most likely state sequence Q = {q₁, …, q_T} over T time steps in the future, i.e. one optimizes:

Q = argmax_{Q'} P(Q' | O, λ)    (7.12)
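Below is a minimal sketch of the Viterbi recursion for (7.12) on a discrete-emission HMM, again in log space; it returns the most likely state sequence for the observed symbols. Extending it to genuinely future time steps (e.g. by propagating the last decoded state through the transition matrix) is one possible continuation and is not shown here.

def viterbi(obs, A, B, pi):
    # Most likely state sequence Q = argmax_{Q'} P(Q' | obs, lambda), eq. (7.12).
    T, N = len(obs), len(pi)
    log_A, log_B, log_pi = np.log(A), np.log(B), np.log(pi)
    delta = log_pi + log_B[:, obs[0]]        # best log-score of paths ending in each state
    psi = np.zeros((T, N), dtype=int)        # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path into state j via state i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    q = [int(delta.argmax())]                # most likely final state
    for t in range(T - 1, 0, -1):            # backtrack through the pointers
        q.append(int(psi[t][q[-1]]))
    return q[::-1]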
