
5.10 Gaussian Process Classification

Generative versus Discriminative Approaches

To recall, there are two approaches to classification. One can either follow a so-called generative approach, whereby one first learns the joint distribution p(X, Y) that associates the set of datapoints X = {x_1, ..., x_M} with a set of labels Y = {y_1, ..., y_M}, where each class label denotes the associated class numbering. Using the joint distribution, one can then compute the conditional distribution p(Y | X) to predict the class label for each datapoint. Alternatively, one can follow a discriminative approach, in which one estimates the conditional distribution p(Y | X) directly. In the previous section, we showed how to perform regression using a Gaussian Process; it hence seems intuitive to extend this to classification and, to this end, to take a discriminative approach.

Given that there are the generative and discriminative approaches, which one should we prefer? This is perhaps the biggest question in classification, and we do not believe that there is a right answer, as both ways of writing the joint p(y, x) are correct. However, it is possible to identify some strengths and weaknesses of the two approaches. The discriminative approach is appealing in that it is directly modelling what we want, p(y|x). Also, density estimation for the class-conditional distributions is a hard problem, particularly when x is high dimensional, so if we are just interested in classification then the generative approach may mean that we are trying to solve a harder problem than we need to. However, to deal with missing input values, outliers and unlabelled data points in a principled fashion it is very helpful to have access to p(x), and this can be obtained from marginalizing out the class label y from the joint as p(x) = ∑_y p(y) p(x|y) in the generative approach. A further factor in the choice of a generative or discriminative approach could also be which one is most conducive to the incorporation of any prior information which is available. (Quote from C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.)
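To make the generative route concrete, here is a minimal numpy sketch (not part of the original notes) that fits class priors p(y) and class-conditional densities p(x|y) to labelled data, then obtains the input density p(x) by the marginalization quoted above and the posterior p(y|x) by Bayes' rule. The Gaussian form chosen for p(x|y), and the synthetic data, are assumptions made purely for illustration.

```python
import numpy as np

# Toy labelled data: two classes in 2-D (synthetic, for illustration only).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[+2.0, 0.0], scale=1.0, size=(100, 2))
X_neg = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(100), -np.ones(100)])

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

# Generative step: estimate p(y) and Gaussian class-conditionals p(x|y).
classes = [+1, -1]
priors = {c: np.mean(y == c) for c in classes}
means = {c: X[y == c].mean(axis=0) for c in classes}
covs = {c: np.cov(X[y == c].T) for c in classes}

x_query = np.array([1.0, 0.5])

# p(x) = sum_y p(y) p(x|y)  -- the marginalization from the quote.
p_x = sum(priors[c] * gaussian_pdf(x_query, means[c], covs[c]) for c in classes)

# Bayes' rule then gives the posterior p(y|x) used for classification.
posterior = {c: priors[c] * gaussian_pdf(x_query, means[c], covs[c]) / p_x
             for c in classes}
print(p_x, posterior)
```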

Linear Case

As for all other kernel methods seen in this class, we will first start with the linear case and then extend it to the non-linear case by exploiting the kernel trick.

One problem when performing multi-class classification is to compare the predictions of each class. This comparison is easier if each output has a direct probabilistic interpretation.

In the simple binary classification problem, where the label y takes either the value +1 or the value -1, i.e. y_i ∈ {-1, +1}, i = 1...M, a probabilistic readout of the conditional p(y_i | x_i) can be obtained easily by using, for instance, the logistic regression model and computing:

p(y_i = +1 | x_i) = 1 / (1 + e^{-w^T x_i})        (5.92)

By extension, the probability of the label -1 is p(y_i = -1 | x_i) = 1 - p(y_i = +1 | x_i).
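As a quick numerical illustration of Eq. (5.92) — a sketch, not from the notes; the weight vector w and query point x below are arbitrary placeholders, not learned values — the logistic readout and its complement can be computed as:

```python
import numpy as np

def logistic_readout(w, x):
    """p(y = +1 | x) for the linear logistic model of Eq. (5.92)."""
    return 1.0 / (1.0 + np.exp(-w @ x))

w = np.array([0.8, -0.3])          # hypothetical weight vector
x = np.array([1.0, 2.0])           # a query point

p_pos = logistic_readout(w, x)     # p(y = +1 | x)
p_neg = 1.0 - p_pos                # p(y = -1 | x), by complement
print(p_pos, p_neg)                # the two probabilities sum to 1
```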
