MACHINE LEARNING TECHNIQUES - LASA
The weight w determines the slope of the sigmoid function. It is an open parameter which should be chosen so as to ensure optimal classification or, conversely, a minimal penalty due to misclassification. We now have all the tools to proceed to Gaussian Process classification.
As in Gaussian Process regression, we can start by putting a Gaussian prior on the distribution of the parameter w, i.e. $w \sim N(0, \Sigma_w)$, and compute the log of the posterior distribution on the weights given our dataset $X = \{x_1, \ldots, x_M\}$ with associated labels $Y = \{y_1, \ldots, y_M\}$ (up to an additive normalization constant):

$$\log p(w \mid X, Y) = -\frac{1}{2}\, w^T \Sigma_w^{-1} w + \sum_{i=1}^{M} \log \frac{1}{1 + e^{-y_i w^T x_i}} \qquad (5.93)$$
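The quantity in Eq. (5.93) is straightforward to evaluate numerically. The following is a minimal sketch (the function and variable names are illustrative, not from the original text), assuming labels $y_i \in \{-1, +1\}$:

```python
import numpy as np

def log_posterior(w, X, y, Sigma_w):
    """Unnormalized log posterior of Eq. (5.93): Gaussian prior term
    plus logistic log-likelihood, with labels y_i in {-1, +1}."""
    # prior term: -1/2 w^T Sigma_w^{-1} w
    prior = -0.5 * w @ np.linalg.solve(Sigma_w, w)
    # margins m_i = y_i * w^T x_i, one per sample
    margins = y * (X @ w)
    # log(1 / (1 + e^{-m})) = -log(1 + e^{-m}), computed stably
    loglik = -np.logaddexp(0.0, -margins).sum()
    return prior + loglik
```

Using `np.logaddexp` avoids overflow in `exp` for large negative margins, which a naive transcription of the formula would suffer from.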
Unfortunately, the posterior does not have an analytical solution, in contrast to the regression case. One can, however, observe that the posterior is concave and hence its maximum can be found using classical optimization methods, such as Newton's method or conjugate gradient descent. Figure 5-19 illustrates nicely how the original weight distribution is tilted as an effect of fitting the data.
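Because the log posterior of Eq. (5.93) is concave, Newton's method converges reliably to its unique maximum. A hedged sketch of such a MAP search follows (the function names and the choice of a fixed iteration count are assumptions for illustration, not part of the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_weights(X, y, Sigma_w, n_iter=20):
    """Newton iterations for the MAP weights of Eq. (5.93),
    with labels y_i in {-1, +1}."""
    M, d = X.shape
    Sigma_inv = np.linalg.inv(Sigma_w)
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(y * (X @ w))           # p_i = sigma(y_i w^T x_i)
        # gradient: -Sigma_w^{-1} w + sum_i (1 - p_i) y_i x_i
        grad = -Sigma_inv @ w + X.T @ (y * (1.0 - p))
        # Hessian: -Sigma_w^{-1} - X^T diag(p_i (1 - p_i)) X,
        # negative definite, so the posterior is concave
        W = p * (1.0 - p)
        H = -Sigma_inv - (X.T * W) @ X
        w = w - np.linalg.solve(H, grad)   # Newton step
    return w
```

At the returned point the gradient of the log posterior is (numerically) zero, which is exactly the MAP condition the text describes.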
Figure 5-19: (a) Contours of the prior distribution p(w) = N(0, I). (b) Dataset, with circles indicating class +1 and crosses denoting class −1. (c) Contours of the posterior distribution p(w|D). (d) Contours of the predictive distribution p(y* = +1 | x*). Adapted from C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.
© A.G.Billard 2004 – Last Update March 2011