MACHINE LEARNING TECHNIQUES - LASA
7.2.4 Decoding an HMM
There are two ways to use an HMM once it is built: it can serve either for classification or for prediction.
Classification: assume that you have trained two different HMMs on two classes of time series (say, one HMM to recognize a waving gesture and another to recognize a pointing gesture, using ten examples each). One can use these two HMMs to classify a newly observed time series (a new movement), i.e. to determine whether it belongs to the first or to the second class of time series (whether the person was waving or rather pointing). This is done by computing the likelihood of the new time series given each model and simply determining which model is most likely to explain it. If the new time series is X' and the parameter sets of the two models are λ₁ = (A₁, B₁, π₁) and λ₂ = (A₂, B₂, π₂) respectively, then the likelihoods of X' having been generated by each of these two models are p(X'|λ₁) and p(X'|λ₂). These likelihoods can be computed using either the forward or the backward procedure.
One can then decide which of models 1 and 2 is more likely to have generated the time series by directly comparing the two likelihoods. That is, model 1 is more likely to have generated X' if:
p(X'|λ₁) > p(X'|λ₂)    (7.11)
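The comparison in (7.11) can be illustrated with a short sketch of the scaled forward procedure for discrete-observation HMMs. This code is not from the original text: the toy parameters (A1, B1, pi1), (A2, B2, pi2) and the sequence X_new are invented for the example, and the comparison is done on log-likelihoods, which is equivalent to (7.11) but numerically safer:

```python
import numpy as np

def forward_loglik(obs, A, B, pi):
    """Log-likelihood log p(obs | lambda) via the scaled forward procedure.

    obs : sequence of discrete observation symbols (column indices into B)
    A   : (N, N) transition matrix, A[i, j] = p(q_{t+1} = j | q_t = i)
    B   : (N, M) emission matrix,   B[i, k] = p(o_t = k | q_t = i)
    pi  : (N,)  initial state distribution
    """
    alpha = pi * B[:, obs[0]]                    # initialization step
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]   # induction step
        c = alpha.sum()                          # scaling factor prevents underflow
        loglik += np.log(c)                      # sum of log scalings = log-likelihood
        alpha = alpha / c
    return loglik

# Two toy 2-state models over 3 observation symbols (invented parameters)
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])
B1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
pi1 = np.array([0.6, 0.4])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
B2 = np.array([[0.1, 0.1, 0.8], [0.3, 0.4, 0.3]])
pi2 = np.array([0.5, 0.5])

X_new = [0, 0, 1, 0, 2]                          # the new time series X'
ll1 = forward_loglik(X_new, A1, B1, pi1)
ll2 = forward_loglik(X_new, A2, B2, pi2)
label = 1 if ll1 > ll2 else 2                    # Eq. (7.11) in log space
```

Since log is monotonic, comparing log-likelihoods gives the same decision as (7.11) while avoiding the underflow that raw products of probabilities would cause on long sequences.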
Such a comparison is, however, difficult, as the value of the likelihood is unbounded (recall that pdfs are positive functions with no upper bound). The likelihood can take very large values simply because one model has many more parameters than the other. Hence, in practice, one sometimes normalizes the likelihood by the number of parameters of each model. Moreover, since one only compares two values, one has no idea how well either of the two models actually explains the data. It could be that both likelihoods are very low and that the observed time series hence belongs to neither of the two models. In speech processing, this problem is avoided by creating a "garbage" model, which ensures that a time series not well represented by any of the meaningful models is automatically classified as garbage. Building a garbage model requires training it on a large variety of signals that cannot be the pattern one is looking for (e.g. noise from cars, doors slamming, etc.). Generating such a large set of non-class samples is not very practical. An alternative is to retain the average likelihood obtained on the training examples of each model and to use it to check that the likelihood of the new time series is at least as high as that seen during training (or within some bounds).
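The last alternative above can be sketched as a simple rejection rule. This sketch is not from the original text: the function accept, the tolerance of three standard deviations, and the use of log-likelihoods are all assumptions made for the example (in practice one would also normalize by sequence length, since longer sequences have systematically lower likelihoods):

```python
import numpy as np

def accept(loglik_new, train_logliks, n_std=3.0):
    """Hypothetical rejection rule: accept a classification only if the new
    sequence's log-likelihood is not far below those seen during training.

    loglik_new    : log-likelihood of the new sequence under the winning model
    train_logliks : log-likelihoods of that model's own training examples
    n_std         : how many standard deviations below the mean we tolerate
    """
    mu = np.mean(train_logliks)
    sigma = np.std(train_logliks)
    return loglik_new >= mu - n_std * sigma
```

A new sequence whose likelihood falls well below the training range would then be rejected as belonging to neither class, playing the role of an implicit garbage model.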
Prediction: once built, an HMM can be used to make predictions about the evolution of a given time series when provided with part of that series. Clearly, the farther back in time the observed part goes, the better the prediction.
Say that you have built an HMM representing the temporal evolution of the weather from day to day, taking into account the seasons and the particular geographical area; one can then use the model to predict the weather for the following couple of days. The farther into the future the prediction reaches, the less reliable it may be.
To perform prediction, one uses the Viterbi algorithm, which generates the most likely path through the states. Given an HMM with parameters λ = (A, B, π), one looks for the most likely state sequence Q = {q₁, …, q_T} for T time steps in the future, i.e. one optimizes for:

Q = argmax_{Q'} P(Q' | O, λ)    (7.12)
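As a concrete illustration of (7.12), here is a minimal log-space Viterbi decoder for discrete-observation HMMs. This sketch is not from the original text; the toy two-state model below (sticky states, each preferring its own output symbol) is invented for the example, and logarithms are used to avoid numerical underflow:

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state sequence Q = argmax_Q' P(Q' | obs, lambda),
    computed in log space for numerical stability."""
    N, T = A.shape[0], len(obs)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = logpi + logB[:, obs[0]]       # best log-score of paths ending in each state
    psi = np.zeros((T, N), dtype=int)     # back-pointers to the best predecessor
    for t in range(1, T):
        scores = delta[:, None] + logA    # scores[i, j]: best path through i ending in j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    q = np.zeros(T, dtype=int)            # backtrack from the best final state
    q[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q

# Toy "sticky" 2-state model: each state persists and prefers its own symbol
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
path = viterbi([0, 0, 0, 1, 1, 1], A, B, pi)
```

On this toy model the decoded path follows the observations (three steps in state 0, then three in state 1), because each state strongly prefers its own symbol and transitions are rare.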
© A.G.Billard 2004 – Last Update March 2011