MACHINE LEARNING TECHNIQUES - LASA


Figure 5-16: Increase in the number of support vectors and in the proportion of datapoints lying outside the ε-insensitive tube as ν increases. From top to bottom, ν takes the values 0.08, 0.1 and 0.9. ν-SVR was fitted with C = 50 and a Gaussian kernel of width 0.021.
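As a rough way to reproduce this trend, the sketch below fits ν-SVR for the three values of ν in the caption using scikit-learn's NuSVR. It is only an illustration, not the code behind the figure: the 1-D toy dataset, the random seed, and the conversion of the kernel width 0.021 into the gamma parameter are assumptions; only C = 50 and the values of ν are taken from the caption.

```python
# Illustrative sketch only (assumed toy data, not the dataset of Figure 5-16):
# count support vectors as nu grows in nu-SVR.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 200)).reshape(-1, 1)   # 1-D inputs in [0, 1]
y = np.sin(2.0 * np.pi * X).ravel() + 0.1 * rng.standard_normal(200)

for nu in (0.08, 0.1, 0.9):
    # gamma = 1/(2*width^2) is an assumed translation of "kernel width = 0.021"
    model = NuSVR(nu=nu, C=50, kernel="rbf", gamma=1.0 / (2.0 * 0.021**2))
    model.fit(X, y)
    n_sv = model.support_.size
    print(f"nu={nu:<4}  support vectors: {n_sv:3d}  ({n_sv / len(X):.0%} of the data)")
```

Since ν lower-bounds the fraction of support vectors, the count printed for ν = 0.9 should cover at least 90% of the training points, consistent with the behaviour shown in the figure.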

5.9 Gaussian Process Regression

Adapted from C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.

In Section 4.3, we introduced probabilistic regression, a method by which the standard linear regressive model $y = w^T x + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$, was extended to build a probabilistic estimate of the conditional distribution $p(y \mid x)$. For a new query point $x^*$, one could then compute an estimate $y^*$ by taking the expectation of $y$ given $x^*$, i.e. $\hat{y}^* = E_{p(y \mid x^*)}[y]$. Further, assuming that all training points are i.i.d. and placing a Gaussian prior with covariance $\Sigma_w$ on the parameters $w$ of the model, we found that the predictive distribution is also Gaussian and is given by:

$$ p(y^* \mid x^*, X, y) = N\left( \frac{1}{\sigma^2} {x^*}^T A^{-1} X y,\;\; {x^*}^T A^{-1} x^* \right), \qquad A = \sigma^{-2} X X^T + \Sigma_w^{-1} \qquad (5.77) $$

where $X$ is the $N \times M$ matrix whose columns are the $M$ training inputs and $y$ is the vector of the $M$ corresponding training outputs.

Next, we see how this probabilistic linear regressive model can be extended to perform non-linear regression, exploiting once more the kernel trick.

Non-linear Case

Assume a non-linear transformation into feature space through a function $\phi(x)$ that maps each $N$-dimensional datapoint $x$ into a $D$-dimensional feature space. Substituting $\phi(x)$ for $x$ everywhere in the linear model, the predictive distribution for the non-linear model becomes:

$$ p(y^* \mid x^*, X, y) = N\left( \frac{1}{\sigma^2} \phi(x^*)^T A^{-1} \Phi(X)\, y,\;\; \phi(x^*)^T A^{-1} \phi(x^*) \right), \qquad A = \sigma^{-2} \Phi(X) \Phi(X)^T + \Sigma_w^{-1} \qquad (5.78) $$

$\Phi(X)$ is the matrix whose columns are the projections $\phi(x)$ of the training points $x \in X$. While the expression of this density is quite simple, computing the inverse of the matrix $A$ may in practice be very costly, since $A$ is $D \times D$ and the dimension $D$ of the feature space may be very large.

5.9.1 What is a Gaussian Process

The Bayesian regression model given by (5.78) is one example of a Gaussian Process. In its generic definition, a "Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution". Assume that the real process you wish to describe is governed by a function $f(x)$, where $x$ spans the data space. Then, a Gaussian Process (GP) estimate of the function $f$ is entirely specified by its mean function and its covariance function.
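As a concrete illustration of (5.77) and (5.78), the short NumPy sketch below computes the predictive mean and variance for a toy problem. It is not the course's implementation; the dataset, the polynomial feature map $\phi(x) = (1, x, x^2)$, and the values chosen for $\sigma^2$ and $\Sigma_w$ are assumptions made for the example.

```python
# Sketch of the Gaussian predictive distribution of Eqs. (5.77)/(5.78).
import numpy as np

def bayes_predict(Phi, y, phi_star, sigma2, Sigma_w):
    """Predictive mean and variance at a query point, Eq. (5.78).

    Phi      : D x M matrix whose columns are the feature vectors phi(x) of the M training points
    y        : length-M vector of training outputs
    phi_star : length-D feature vector phi(x*) of the query point
    """
    A = Phi @ Phi.T / sigma2 + np.linalg.inv(Sigma_w)    # A = sigma^-2 Phi(X) Phi(X)^T + Sigma_w^-1
    A_inv = np.linalg.inv(A)
    mean = phi_star @ A_inv @ Phi @ y / sigma2           # (1/sigma^2) phi(x*)^T A^-1 Phi(X) y
    var = phi_star @ A_inv @ phi_star                    # phi(x*)^T A^-1 phi(x*)
    return mean, var

# Assumed toy example: quadratic feature map phi(x) = (1, x, x^2), so the
# non-linear model of Eq. (5.78) reduces to Bayesian polynomial regression.
phi = lambda x: np.array([np.ones_like(x), x, x**2])

rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, 30)
y_train = 0.5 * x_train**2 - x_train + 0.05 * rng.standard_normal(30)

sigma2 = 0.05**2               # assumed noise variance
Sigma_w = np.eye(3)            # assumed prior covariance on w
m, v = bayes_predict(phi(x_train), y_train, phi(np.array([0.3])).ravel(), sigma2, Sigma_w)
print(f"predictive mean at x*=0.3: {m:.3f}, predictive variance: {v:.5f}")
```

Here D = 3, so inverting A is trivial; the difficulty mentioned after (5.78) arises when the feature space, and hence A, becomes very high-dimensional, which is what motivates the kernel-based Gaussian Process view.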
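The defining property quoted in Section 5.9.1, namely that any finite collection of the random variables $f(x_1), \dots, f(x_n)$ is jointly Gaussian, can be visualised by drawing samples from a GP prior at a finite set of inputs. The sketch below is again only an illustration; the zero mean, the squared-exponential covariance function and its length-scale are assumptions, not choices made in the text.

```python
# Sketch: finite-dimensional samples from a zero-mean GP prior.
import numpy as np

def squared_exp_kernel(xa, xb, length_scale=0.2):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length_scale**2)

x_grid = np.linspace(0.0, 1.0, 50)                 # any finite set of inputs
K = squared_exp_kernel(x_grid, x_grid)             # 50 x 50 covariance matrix
K += 1e-9 * np.eye(x_grid.size)                    # jitter for numerical stability

# By definition, (f(x_1), ..., f(x_50)) ~ N(0, K): each row below is one sample
# of the function f evaluated on x_grid.
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros_like(x_grid), K, size=3)
print(samples.shape)                               # (3, 50)
```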

