MACHINE LEARNING TECHNIQUES - LASA
One can then use the above expression of the joint distribution over $f$ and $f_*$ to compute the posterior distribution of $f_*$ given the training and testing sets $X$, $X_*$ and our prior on $f \sim \mathcal{N}\left(0, K(X,X)\right)$, which yields:

$$f_* \mid X_*, X, f \;\sim\; \mathcal{N}\Big(K(X_*,X)\,K(X,X)^{-1} f,\;\; K(X_*,X_*) - K(X_*,X)\,K(X,X)^{-1} K(X,X_*)\Big) \tag{5.83}$$
One can then simply sample from this posterior distribution by evaluating the mean and covariance matrix from (5.83) and generating samples as done previously for the prior distribution on $f$.
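As a minimal sketch of this procedure (assuming NumPy, a unit-length-scale squared exponential kernel, and hypothetical 1-D training data not taken from the text), the posterior mean and covariance of (5.83) can be evaluated and then sampled directly:

```python
import numpy as np

def sq_exp_kernel(A, B):
    """Squared exponential covariance k(x, x') = exp(-0.5 * (x - x')^2)."""
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

# Hypothetical noise-free training data (illustrative values only).
X = np.array([-2.0, 0.0, 1.5])
f = np.sin(X)                      # observed values of the process f
Xs = np.linspace(-3.0, 3.0, 50)    # test inputs X*

K   = sq_exp_kernel(X, X)          # K(X, X)
Ks  = sq_exp_kernel(Xs, X)         # K(X*, X)
Kss = sq_exp_kernel(Xs, Xs)        # K(X*, X*)

Kinv = np.linalg.inv(K)
mean = Ks @ Kinv @ f               # posterior mean of (5.83)
cov  = Kss - Ks @ Kinv @ Ks.T      # posterior covariance of (5.83)

# Small jitter on the diagonal keeps the covariance numerically
# positive semi-definite before sampling.
cov += 1e-10 * np.eye(len(Xs))

# Draw three sample functions from the posterior, as in Figure 5-17.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=3)
```

Each row of `samples` is one realization of the posterior process at the test inputs; because the model is noise free, the posterior mean reproduces the training values exactly at the training inputs.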
Figure 5-17 shows three examples of such sampling. In all three plots the shaded area represents the pointwise mean plus/minus one standard deviation at each input value (corresponding to a ~68% confidence region) for the posterior distribution. We used the squared exponential covariance function given by

$$k(x,x') = \exp\left(-\tfrac{1}{2}\,(x-x')^2\right).$$

We plot in light grey the area around the regression signal that corresponds to +/- one standard deviation (using the covariance given by (5.83)). This gives a measure of the uncertainty of the model's inference. Areas with large uncertainty are due to a lack of training points covering that part of the space. From left to right, we see how adding a new point (in red) in an area that previously contained no training points locally decreases the variance.
Figure 5-17: The confidence of a Gaussian Process is dependent on the amount of data present in a specific region of space (left). Regions of low data density have lower confidence. By adding points in those regions, the confidence increases (center), but the regression function will change to adapt to the new data (right).
[DEMOS\REGRESSION\GPR-CONFIDENCE.ML]
The previous model assumed that the process $f$ was noise free. However, when modeling real data, it is usual to assume some superimposed noise. As in the case of the linear regression model, we can assume that the noisy version of our non-linear regression model follows:
$$y = f\left(x\right) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\left(0, \sigma^2\right) \tag{5.84}$$

where the noise $\varepsilon$ follows a zero-mean Gaussian distribution.
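Under this noisy model, the standard Gaussian Process treatment replaces $K(X,X)$ by $K(X,X) + \sigma^2 I$ in the posterior of (5.83), so the predictions no longer interpolate the observations exactly. A minimal sketch (assuming NumPy, the same unit-length-scale kernel as above, and hypothetical data; `sigma2` is an assumed noise variance, not a value from the text):

```python
import numpy as np

def sq_exp_kernel(A, B):
    """Squared exponential covariance k(x, x') = exp(-0.5 * (x - x')^2)."""
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

# Hypothetical noisy observations y = f(x) + eps, per (5.84).
rng = np.random.default_rng(1)
X = np.array([-2.0, 0.0, 1.5])
y = np.sin(X) + 0.1 * rng.standard_normal(len(X))
sigma2 = 0.01                          # assumed noise variance sigma^2

# Noise enters only on the training block: K(X, X) + sigma^2 * I.
Ky = sq_exp_kernel(X, X) + sigma2 * np.eye(len(X))

Xs = np.linspace(-3.0, 3.0, 50)        # test inputs X*
mean = sq_exp_kernel(Xs, X) @ np.linalg.solve(Ky, y)   # posterior mean
```

Unlike the noise-free case, evaluating this posterior mean at the training inputs returns smoothed values rather than the observations $y$ themselves, since part of each observation is attributed to noise.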
© A.G.Billard 2004 – Last Update March 2011