MACHINE LEARNING TECHNIQUES - LASA


A Gaussian process is defined by its mean function m(x) and its covariance function k(x, x'), where k is defined for each pair of points x, x' that span the data space:

$$
m(x) = E\{ f(x) \},
$$
$$
k(x, x') = E\big\{ \big(f(x) - m(x)\big)\big(f(x') - m(x')\big) \big\}
\qquad (5.79)
$$

For simplicity, most GP formulations assume a zero-mean process, i.e. m(x) = 0. While the above description may be multidimensional, most regression techniques based on GPs assume that the output f(x) is unidimensional. This is a limitation in the application of the process for regression, as it allows making inference solely on a single dimension, say y = f(x) with y ∈ ℝ. For multidimensional inference, one may run one GP per output variable.

Using (5.79) and (5.78), and assuming a zero-mean distribution, the Bayesian regression model can be rewritten as a Gaussian process defined by:

$$
E\{ f(x) \} = \phi(x)^T E\{ w \} = 0,
$$
$$
k(x, x') = E\{ f(x) f(x') \} = \phi(x)^T E\{ w w^T \} \phi(x') = \phi(x)^T \Sigma_w \phi(x')
\qquad (5.80)
$$
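As a concrete illustration of (5.80), the sketch below (Python/NumPy, not part of the original notes) computes k(x, x') = φ(x)^T Σ_w φ(x') for an assumed polynomial feature map φ(x) = (1, x, x²) and an isotropic weight prior Σ_w = σ_w² I; both choices are illustrative assumptions rather than the ones used in the text.

```python
import numpy as np

def phi(x):
    """Illustrative feature map phi(x) = (1, x, x^2) for a scalar input x."""
    return np.array([1.0, x, x**2])

def kernel(x, x_prime, sigma_w=1.0):
    """Covariance k(x, x') = phi(x)^T Sigma_w phi(x'), as in (5.80),
    with the assumed weight prior Sigma_w = sigma_w^2 * I."""
    Sigma_w = sigma_w**2 * np.eye(3)
    return phi(x) @ Sigma_w @ phi(x_prime)

# Covariance between the function values at two inputs:
print(kernel(0.5, 1.0))   # phi(0.5)^T phi(1.0) = 1 + 0.5 + 0.25 = 1.75
```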

We are now endowed with a probabilistic representation of our real process f. The value taken by f at any given pair of inputs x, x' is jointly Gaussian with zero mean and covariance given by k(x, x'). This means that an estimate of f can be drawn only by looking conjointly at the distribution of f across two or more input variables. In practice, to visualize the process, one may sample a set $X^* = \{x^{*i}\}_{i=1}^{M^*}$ of $M^*$ data points and compute $f^*$, an $M^*$-dimensional vector of estimates of f, such that:

$$
f^* \sim N\big(0, K(X^*, X^*)\big)
\qquad (5.81)
$$

where $K(X^*, X^*)$ is an $M^* \times M^*$ covariance matrix whose elements are computed using $K(X^*, X^*)_{ij} = k(x^{*i}, x^{*j})$, $\forall i, j = 1 \dots M^*$. Note that if $M^* > D$, i.e. the number of datapoints exceeds the dimension of the feature space, the matrix is singular, as the rank of K is D.
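Continuing the same sketch (and reusing the illustrative kernel() helper assumed after (5.80)), one can draw sample vectors f* from the prior N(0, K(X*, X*)) of (5.81). A small jitter is added to the diagonal before sampling, since K(X*, X*) is singular whenever M* exceeds the feature-space dimension D (here D = 3).

```python
# Draw from the prior f* ~ N(0, K(X*, X*)) as in (5.81).
X_star = np.linspace(-1.0, 1.0, 50)                       # M* = 50 query points
K_star = np.array([[kernel(xi, xj) for xj in X_star] for xi in X_star])
K_star += 1e-8 * np.eye(len(X_star))                      # jitter: rank(K) = D = 3 < M*

rng = np.random.default_rng(seed=0)
f_star = rng.multivariate_normal(np.zeros(len(X_star)), K_star, size=3)
# f_star now holds 3 independent draws from the prior over the query points.
```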

Generating such a vector is called drawing from the prior distribution of f, as it uses solely information on the query datapoints themselves and the prior assumption that the underlying process is jointly Gaussian and zero-mean, as given by (5.81). A better inference can be made if one can make use of prior information in the form of a set of training points. Consider the set $X = \{x^i\}_{i=1}^{M}$ of training datapoints; one can then express the joint distribution of the estimates f and f* associated with the training and testing points, respectively, as:

$$
\begin{bmatrix} f \\ f^* \end{bmatrix} \sim N\left(0,\;
\begin{bmatrix}
K(X, X) & K(X, X^*) \\
K(X^*, X) & K(X^*, X^*)
\end{bmatrix}\right)
\qquad (5.82)
$$
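Under the same illustrative assumptions as above, the block covariance in (5.82) can be assembled from the four kernel matrices over training and test inputs (the training inputs below are placeholders chosen for the example):

```python
# Assemble the joint covariance of [f, f*] as in (5.82).
X_train = np.array([-0.8, -0.2, 0.3, 0.9])        # illustrative training inputs (M = 4)

def K_mat(A, B):
    """Kernel matrix with entries K(A, B)_ij = k(a_i, b_j)."""
    return np.array([[kernel(a, b) for b in B] for a in A])

joint_cov = np.block([
    [K_mat(X_train, X_train), K_mat(X_train, X_star)],
    [K_mat(X_star, X_train),  K_mat(X_star, X_star)],
])
# joint_cov has shape (M + M*, M + M*) = (54, 54).
```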

© A.G.Billard 2004 – Last Update March 2011
