5.10 Gaussian Process Classification

Generative versus Discriminative Approaches
To recall, there are two approaches to classification. One can either follow a so-called generative approach, whereby one first learns the joint distribution p(X, Y) that associates the set of datapoints X = {x^1, ..., x^M} to a set of labels Y = {y^1, ..., y^M}, where each class label denotes the associated class numbering. Using the joint distribution, one can then compute the conditional distribution p(Y | X) to predict the class label for each datapoint. Alternatively, one can follow a discriminative approach, in which one estimates the conditional distribution p(Y | X) directly. In the previous section, we showed how to perform regression using a Gaussian Process; it hence seems intuitive to extend this to classification and, to this end, to take a discriminative approach.
Given that there are the generative and discriminative approaches, which one should we prefer? This is perhaps the biggest question in classification, and we do not believe that there is a right answer, as both ways of writing the joint p(y, x) are correct. However, it is possible to identify some strengths and weaknesses of the two approaches. The discriminative approach is appealing in that it is directly modelling what we want, p(y|x). Also, density estimation for the class-conditional distributions is a hard problem, particularly when x is high dimensional, so if we are just interested in classification then the generative approach may mean that we are trying to solve a harder problem than we need to. However, to deal with missing input values, outliers and unlabelled data points in a principled fashion it is very helpful to have access to p(x), and this can be obtained from marginalizing out the class label y from the joint as p(x) = Σ_y p(y) p(x|y) in the generative approach. A further factor in the choice of a generative or discriminative approach could also be which one is most conducive to the incorporation of any prior information which is available. (Quote from C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.)
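To make the distinction concrete, here is a minimal sketch (not part of the original notes) of the generative route on 1-D toy data: a Gaussian class-conditional p(x|y) is fitted to each class and combined through Bayes' rule, with the marginal p(x) = Σ_y p(y) p(x|y) from the quote above appearing in the denominator. The toy data and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy data: class +1 centred at +2, class -1 at -2.
x_pos = rng.normal(loc=+2.0, scale=1.0, size=100)
x_neg = rng.normal(loc=-2.0, scale=1.0, size=100)

def gaussian_pdf(x, mu, var):
    # Univariate Gaussian density.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Generative step: estimate p(x|y) for each class and the priors p(y).
mu_pos, var_pos = x_pos.mean(), x_pos.var()
mu_neg, var_neg = x_neg.mean(), x_neg.var()
prior_pos = len(x_pos) / (len(x_pos) + len(x_neg))  # p(y = +1)
prior_neg = 1.0 - prior_pos                         # p(y = -1)

def posterior_pos(x):
    # Bayes' rule: p(y=+1|x) = p(y=+1) p(x|y=+1) / p(x),
    # with p(x) = sum over y of p(y) p(x|y).
    joint_pos = prior_pos * gaussian_pdf(x, mu_pos, var_pos)
    joint_neg = prior_neg * gaussian_pdf(x, mu_neg, var_neg)
    return joint_pos / (joint_pos + joint_neg)

print(posterior_pos(1.0))  # close to 1 for x near the +1 class
```

A discriminative model would instead parameterize p(y|x) directly, which is the route taken in the linear case below.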
Linear Case

As for all other kernel methods seen in this class, we will first start with the linear case and then extend it to the non-linear case by exploiting the kernel trick.
One problem when performing multi-class classification is to compare the predictions of each class. It would be easier if the output of the density had a direct probabilistic interpretation. In the simple binary classification problem where the label y^i takes either value +1 or value -1, i.e. y^i ∈ {-1, +1}, i = 1...M, a probabilistic readout of the conditional p(y^i | x^i) can be obtained easily by using, for instance, the logistic regression model and computing:

p(y^i = +1 | x^i) = 1 / (1 + e^{-w^T x^i})        (5.92)

By extension, the probability of the label -1 is p(y^i = -1 | x^i) = 1 - p(y^i = +1 | x^i).
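As a quick numerical illustration of Eq. (5.92), the sketch below evaluates the logistic readout for a hypothetical weight vector w and datapoint x; both values are placeholders, not from the notes.

```python
import numpy as np

def p_pos(w, x):
    # Eq. (5.92): p(y = +1 | x) = 1 / (1 + exp(-w^T x)).
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

# Hypothetical weight vector and datapoint, for illustration only.
w = np.array([0.8, -0.3])
x = np.array([1.5, 2.0])

p_plus = p_pos(w, x)
p_minus = 1.0 - p_plus  # p(y = -1 | x) = 1 - p(y = +1 | x)
print(f"p(y=+1|x) = {p_plus:.3f}, p(y=-1|x) = {p_minus:.3f}")
```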