MACHINE LEARNING TECHNIQUES - LASA


One can then use the above expression of the joint distribution over f and f* to compute the posterior distribution of f* given the training and testing sets X, X* and our prior on f, N(0, K(X, X)), which yields:

f* | X*, X, f ~ N( K(X*, X) K(X, X)^{-1} f ,  K(X*, X*) - K(X*, X) K(X, X)^{-1} K(X, X*) )   (5.83)

One can then simply sample from this posterior distribution by evaluating the mean and covariance matrix from (5.83) and generating samples as done previously for the prior distribution on f.
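The procedure just described can be sketched in a few lines of numpy. This is a minimal illustration, not code from the text: the training data (X, f), the test grid X_star, and the small jitter added for numerical stability are all assumptions; the mean and covariance are exactly those of (5.83).

```python
import numpy as np

def sq_exp_kernel(A, B):
    # Squared-exponential covariance k(x, x') = exp(-|x - x'|^2 / 2),
    # the kernel used later in this section.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * d ** 2)

# Hypothetical noise-free 1-D training data and test inputs.
X = np.array([-4.0, -2.0, 0.0, 2.0])
f = np.sin(X)
X_star = np.linspace(-5, 5, 50)

K = sq_exp_kernel(X, X) + 1e-10 * np.eye(len(X))  # jitter for stability
K_s = sq_exp_kernel(X_star, X)                    # K(X*, X)
K_ss = sq_exp_kernel(X_star, X_star)              # K(X*, X*)

K_inv = np.linalg.inv(K)
mean = K_s @ K_inv @ f                 # posterior mean of Eq. (5.83)
cov = K_ss - K_s @ K_inv @ K_s.T       # posterior covariance of Eq. (5.83)

# Draw three sample functions from the posterior, as in Figure 5-17.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov + 1e-10 * np.eye(len(X_star)), size=3)
```

Each row of `samples` is one function drawn from the posterior; plotting them against `X_star` reproduces the kind of curves shown in Figure 5-17.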

Figure 5-17 shows three examples of such sampling. In all three plots the shaded area represents the pointwise mean plus/minus one standard deviation for each input value (corresponding to the ~68% confidence region) of the posterior distribution. We used the squared exponential covariance function given by k(x, x') = exp( -|x - x'|^2 / 2 ). We plot in light grey the area around the regression signal that corresponds to +/- one standard deviation (using the covariance given by (5.83)). This gives a measure of the uncertainty of the inference of the model. Areas with large uncertainty are due to a lack of training points covering that space. From left to right, we see the effect of adding one new point (in red) in areas where previously there were no training points: the variance decreases locally.

Figure 5-17: The confidence of a Gaussian Process is dependent on the amount of data present in a specific region of space (left). Regions of low data density have lower confidence. By adding points in those regions, the confidence increases (center), but the regression function will change to adapt to the new data (right).
[DEMOS\REGRESSION\GPR-CONFIDENCE.ML]
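The effect shown in Figure 5-17 can be checked numerically: the posterior variance at a query point drops once a training point is placed nearby. This is a small sketch (not the book's demo code); the training locations, the query point x = 0, and the added point at x = 0.5 are arbitrary choices for illustration.

```python
import numpy as np

def k(A, B):
    # Squared-exponential kernel from the text: k(x, x') = exp(-|x - x'|^2 / 2).
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * d ** 2)

def posterior_var(X_train, x_query):
    # Pointwise posterior variance, i.e. the diagonal of the covariance
    # in Eq. (5.83): K(X*,X*) - K(X*,X) K(X,X)^{-1} K(X,X*).
    K = k(X_train, X_train) + 1e-10 * np.eye(len(X_train))
    K_s = k(x_query, X_train)
    return np.diag(k(x_query, x_query) - K_s @ np.linalg.solve(K, K_s.T))

X = np.array([-3.0, 3.0])          # sparse training set
q = np.array([0.0])                # query far from any training point
v_before = posterior_var(X, q)[0]  # close to the prior variance k(0,0) = 1
v_after = posterior_var(np.append(X, 0.5), q)[0]  # add a point near the query
```

Here `v_after` is much smaller than `v_before`: the added point shrinks the grey uncertainty band locally, exactly as in the center panel of Figure 5-17.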

The previous model assumed that the process f was noise free. However, when modeling real data, it is usual to assume some additive noise. As in the case of the linear regression model, we can assume that the noisy version of our non-linear regression model follows:

y = f(x) + ε ,   ε ~ N(0, σ²)   (5.84)

where the noise ε follows a zero-mean Gaussian distribution.
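In the standard GP regression treatment, this observation noise changes the posterior of (5.83) only by replacing K(X, X) with K(X, X) + σ²I. A minimal sketch, with synthetic data and a noise level σ = 0.1 chosen purely for illustration:

```python
import numpy as np

def k(A, B):
    # Squared-exponential kernel from the text: k(x, x') = exp(-|x - x'|^2 / 2).
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * d ** 2)

# Noisy observations y = f(x) + eps, eps ~ N(0, sigma^2), per Eq. (5.84).
rng = np.random.default_rng(0)
sigma = 0.1
X = np.linspace(-3, 3, 7)
y = np.sin(X) + sigma * rng.standard_normal(len(X))
X_star = np.linspace(-3, 3, 41)

# Noise adds sigma^2 I to the training covariance; no jitter is needed
# since the noise term already regularizes the matrix.
Ky = k(X, X) + sigma ** 2 * np.eye(len(X))
mean = k(X_star, X) @ np.linalg.solve(Ky, y)
cov = k(X_star, X_star) - k(X_star, X) @ np.linalg.solve(Ky, k(X, X_star))
```

Unlike the noise-free case, the posterior mean no longer interpolates the observations exactly, and the predictive variance stays strictly positive even at the training inputs.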

© A.G.Billard 2004 – Last Update March 2011
