quantities are comparable. Since a pdf is a positive but unbounded (above) function, its likelihood may grow arbitrarily large. It usually grows with the number of parameters. A good precaution is then to make sure that the two classifiers have the same number of parameters and are trained on the same number of datapoints. The latter means that one must have at one's disposal as many examples of the positive class as of the negative class. If this cannot be ensured, one may apply a normalization term to the likelihood of each classifier, or one may use a threshold on the likelihood of both models to determine when inference is warranted. Despite these caveats, Bayes classifiers remain quite popular, in part because of their extreme simplicity, but also because they yield very good performance in practice.
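To illustrate the last precaution, here is a minimal sketch of such a thresholded decision rule. The function name and the use of a single log-likelihood threshold are our own illustrative choices, not part of the original method:

```python
def classify_with_threshold(loglik_pos, loglik_neg, log_threshold):
    """Bayes decision between two density models, with a rejection option.

    loglik_pos, loglik_neg: log-likelihoods log p+(x) and log p-(x) of a
    point under the positive and negative models. log_threshold is a
    hypothetical tuning parameter: if neither model exceeds it, we abstain.
    """
    if max(loglik_pos, loglik_neg) < log_threshold:
        return None  # likelihood too low under both models: abstain
    return +1 if loglik_pos > loglik_neg else -1
```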
Note that Bayes classification can easily be extended to multiclass classification. Assume that you wish to classify a set of datapoints $X = \{x^i\}_{i=1}^{M}$ into $K$ classes $C^1, \dots, C^K$. Each label $y^i$ associated to datapoint $x^i$, $i = 1, \dots, M$, can then take any value in $C^1, \dots, C^K$. One can then compute the posterior probability of each class, using:

$$
p(y = C^k \,|\, x) = \frac{p(y = C^k)\, p(x \,|\, y = C^k)}{\sum_{j=1}^{K} p(y = C^j)\, p(x \,|\, y = C^j)}
\tag{3.33}
$$
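In code, (3.33) amounts to weighting each class-conditional density by its prior and normalizing. A minimal sketch, assuming one density model per class with a hypothetical pdf(x) method (any per-class density estimate would do):

```python
import numpy as np

def class_posteriors(x, models, priors):
    """Posterior p(y = C^k | x) for each class, per (3.33).

    models: one density model per class, each exposing a (hypothetical)
            pdf(x) method returning p(x | y = C^k).
    priors: the class priors p(y = C^k), summing to one.
    """
    joint = np.array([prior * m.pdf(x) for m, prior in zip(models, priors)])
    return joint / joint.sum()  # normalize by the denominator of (3.33)
```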
We next show how this can be done using Gaussian Mixture Models. Note that we will revisit multi-class classification using kernel methods in Section 5.10.
3.4 Linear classification with Gaussian Mixture Models<br />
The Gaussian Mixture Model, introduced in Section 3.1.5, can be used in conjunction with the Bayes classifier to perform binary classification. For two known classes of datapoints, one can train two separate GMMs, one for each class, and use the Bayes classifier to determine the label of any given datapoint. In other words, each GMM yields a density, $p_+$ and $p_-$ respectively, which can then be used to determine the label of a new datapoint using (3.32). To ensure that the two models are comparable, one should train the two GMMs with the same number of Gaussians, using the same covariance structure (full, diagonal or isotropic covariance matrix in each case), and using a similar number of training points for each GMM. Figure 3-19 shows one example of such classification using two GMMs with 8 Gaussians each.
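A minimal sketch of this procedure, using scikit-learn's GaussianMixture as a stand-in for the GMM training described above; the synthetic data and all parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, size=(500, 2))   # synthetic positive-class data
X_neg = rng.normal(loc=-2.0, size=(500, 2))   # synthetic negative-class data

# Same number of Gaussians, same covariance structure, and the same number
# of training points for both models, as recommended above.
gmm_pos = GaussianMixture(n_components=8, covariance_type="full").fit(X_pos)
gmm_neg = GaussianMixture(n_components=8, covariance_type="full").fit(X_neg)

def classify(x):
    # score_samples returns the log-density log p(x | class); with equal
    # class priors, comparing the two log-likelihoods implements the
    # Bayes decision rule of (3.32).
    ll_pos = gmm_pos.score_samples(x.reshape(1, -1))[0]
    ll_neg = gmm_neg.score_samples(x.reshape(1, -1))[0]
    return +1 if ll_pos > ll_neg else -1

print(classify(np.array([1.5, 1.0])))   # lies in the positive region: +1
```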