01.11.2014 Views

MACHINE LEARNING TECHNIQUES - LASA



In this case the covariance of the prior becomes:

$$\operatorname{cov}(y) = \operatorname{cov}\big(f(X) + \varepsilon\big) = K(X, X) + \sigma^2 I,$$

where $y$ is a matrix whose columns $y^i$, $i = 1 \ldots M$, correspond to the projection of the associated training point $x^i$ through (5.84).

As done previously, we can express the joint distribution of the prior $y$ (now including noise in the estimate of the prior on the training datapoints) and the testing points given through $f^*$:

$$\begin{bmatrix} y \\ f^* \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K(X, X) + \sigma^2 I & K(X, X^*) \\ K(X^*, X) & K(X^*, X^*) \end{bmatrix}\right) \tag{5.85}$$
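The block structure of the joint covariance in (5.85) can be sketched numerically. This is a minimal illustration, not the notes' own implementation: the squared-exponential kernel, the helper names `rbf_kernel` and `joint_covariance`, and the sample points are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix of shape (len(A), len(B)).

    The kernel choice is an assumption; any valid covariance function works.
    """
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def joint_covariance(X, X_star, noise_var):
    """Covariance of the stacked vector [y; f*] under the noisy GP prior."""
    K_xx = rbf_kernel(X, X) + noise_var * np.eye(len(X))  # noisy training block
    K_xs = rbf_kernel(X, X_star)                          # train/test cross block
    K_ss = rbf_kernel(X_star, X_star)                     # noise-free test block
    return np.block([[K_xx, K_xs], [K_xs.T, K_ss]])

X = np.array([[0.0], [1.0], [2.0]])   # hypothetical training inputs
X_star = np.array([[0.5], [1.5]])     # hypothetical test inputs
C = joint_covariance(X, X_star, noise_var=0.1)
```

Only the training block carries the $\sigma^2 I$ term, since the noise is assumed to affect the observed outputs $y$ and not the latent test values $f^*$.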

Again, one can compute the conditional distribution of $f^*$ given the training datapoints $X$, the testing datapoints $X^*$ and the noisy prior $y$:

$$f^* \mid X^*, X, y \sim \mathcal{N}\big(\bar{f}^*, \operatorname{cov}(f^*)\big)$$

$$\begin{aligned}
\bar{f}^* &= E\{f^* \mid X^*, X, y\} = K(X^*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} y \\
\operatorname{cov}(f^*) &= K(X^*, X^*) - K(X^*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} K(X, X^*)
\end{aligned} \tag{5.86}$$
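The two expressions in (5.86) translate almost directly into code. The following is a sketch under stated assumptions: the squared-exponential kernel, the function names `rbf_kernel` and `gp_posterior`, and the toy sine data are illustrative choices that do not come from the notes.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix (assumed kernel choice)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_posterior(X, y, X_star, noise_var, kernel=rbf_kernel):
    """Posterior mean and covariance of f* given noisy observations y."""
    K_xx = kernel(X, X) + noise_var * np.eye(len(X))  # K(X, X) + sigma^2 I
    K_sx = kernel(X_star, X)                          # K(X*, X)
    K_ss = kernel(X_star, X_star)                     # K(X*, X*)
    # Solve linear systems rather than forming the inverse explicitly.
    mean = K_sx @ np.linalg.solve(K_xx, y)
    cov = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)
    return mean, cov

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # hypothetical training inputs
y = np.sin(X).ravel()                        # hypothetical training outputs
X_star = np.array([[0.5], [1.5]])            # hypothetical test inputs
mean, cov = gp_posterior(X, y, X_star, noise_var=0.01)
```

Using `np.linalg.solve` in place of an explicit matrix inverse gives the same result as (5.86) and is generally the more numerically stable choice.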

We are usually interested in computing solely the response of the model to one query point $x^*$. In this case, the estimate of the associated output $y^*$ is given by the following:

$$y^* \simeq \bar{f}^* = E\{f^* \mid x^*, X, y\} = k(x^*, X)^T \left[K(X, X) + \sigma^2 I\right]^{-1} y \tag{5.87}$$

$k(x^*, X)$ is the vector of covariances $k(x^*, x^i)$ between the query point and the $M$ training data points $x^i$, $i = 1 \ldots M$.

Since all the training pairs $(x^i, y^i)$, $i = 1 \ldots M$, are given, these can be treated as parameters to the system and hence the prediction on $y^*$ from Equation (5.87) can be expressed as a linear combination of kernel functions $k(x^*, x^i)$:

$$y^* \simeq \bar{f}^* = E\{f^* \mid x^*, X, y\} = \sum_{i=1}^{M} \alpha_i \, k(x^*, x^i), \qquad \text{with } \alpha = \left[K(X, X) + \sigma^2 I\right]^{-1} y \tag{5.88}$$

We thus have $M$ kernel functions, one for each of the $M$ training points $x^i$.
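The kernel-expansion form in (5.88) can be illustrated as follows: the weights $\alpha$ are computed once from the training data, and each prediction is then a weighted sum of $M$ kernel evaluations. The scalar squared-exponential kernel `rbf`, the helper names `fit_alpha` and `predict`, and the toy data are assumptions made for this sketch.

```python
import numpy as np

def rbf(x, x_i, length_scale=1.0):
    """Scalar squared-exponential kernel k(x, x_i) (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_i, dtype=float)
    return float(np.exp(-0.5 * np.dot(diff, diff) / length_scale**2))

def fit_alpha(X, y, noise_var):
    """Weights alpha = [K(X, X) + sigma^2 I]^{-1} y, computed once from the data."""
    K = np.array([[rbf(a, b) for b in X] for a in X])
    return np.linalg.solve(K + noise_var * np.eye(len(X)), y)

def predict(x_star, X, alpha):
    """Prediction as a weighted sum of one kernel function per training point."""
    return sum(a_i * rbf(x_star, x_i) for a_i, x_i in zip(alpha, X))

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # hypothetical training inputs
y = np.sin(X).ravel()                        # hypothetical training outputs
alpha = fit_alpha(X, y, noise_var=0.01)
y_star = predict(np.array([1.5]), X, alpha)
```

Because $\alpha$ does not depend on the query point, it can be reused for every new $x^*$, so each prediction costs only $M$ kernel evaluations.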

© A.G.Billard 2004 – Last Update March 2011
