
4.3 Probabilistic Regression

Probabilistic regression is a statistical approach to classical linear regression: it assumes that the observed instances of $x$ and $y$ have been generated by an underlying probabilistic process, of which it tries to build an estimate. The rationale goes as follows:

If one knows the joint distribution $p(x, y)$ of $x$ and $y$, then an estimate $\hat{y}$ of the output $y$ can be built from knowing the input $x$ by computing the expectation of $y$ given $x$:

$\hat{y} = E_{p(y|x)}\{y\}$    (4.6)
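As a minimal numerical sketch of (4.6), in Python with a hypothetical discrete joint distribution (all probability values below are made up for illustration), the estimate $\hat{y}$ is the expectation of $y$ under the conditional $p(y|x)$:

```python
import numpy as np

# Hypothetical discrete joint distribution p(x, y):
# rows index x in {0, 1}, columns index y in {0, 1, 2}.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.15, 0.40]])
y_values = np.array([0.0, 1.0, 2.0])

# Marginal p(x) and conditional p(y|x) = p(x, y) / p(x).
p_x = p_xy.sum(axis=1, keepdims=True)
p_y_given_x = p_xy / p_x

# Estimate y_hat = E[y | x] for each value of x (eq. 4.6).
y_hat = p_y_given_x @ y_values
print(y_hat)  # expected y for x = 0 and for x = 1
```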

Note that, in many cases, if one is solely interested in constructing a regression model, one does not need to build a model of the joint density but only to estimate the conditional $p(y|x)$.

Probabilistic regression extends the concept of linear regression by assuming that the observed values of $y$ differ from $f(x)$ by an additive random noise $\varepsilon$ (the noise is usually assumed to be independent of the observable $x$):

$y = f(x, w) + \varepsilon$    (4.7)

Usually, to simplify computation, one further assumes that the noise follows a zero-mean Gaussian distribution with uncorrelated isotropic variance $\sigma^2$. The covariance matrix is diagonal with all elements equal to $\sigma^2$, so we write simply $\varepsilon \sim N(0, \sigma^2)$. Such an assumption is called putting a prior distribution over the noise.
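This assumption can be checked empirically; the sketch below (the value of $\sigma$ and the dimension are arbitrary choices for illustration) draws samples from $N(0, \sigma^2)$ per dimension and verifies that the empirical covariance is approximately the isotropic diagonal matrix $\sigma^2 I$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5          # hypothetical noise standard deviation
dim, n = 3, 100_000  # noise dimension and number of samples

# Zero-mean Gaussian noise with isotropic variance sigma^2.
eps = rng.normal(loc=0.0, scale=sigma, size=(n, dim))

# The empirical covariance is close to the diagonal matrix sigma^2 * I.
print(np.cov(eps, rowvar=False).round(3))
```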

Let us first consider the probabilistic solution to the linear regression problem described before, that is:

$y = x^T w + N(0, \sigma^2)$    (4.8)
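Model (4.8) is generative, so one can simulate observations from it directly; a minimal sketch, with hypothetical weights $w$ and noise level $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])   # hypothetical true weights
sigma = 0.3                      # hypothetical noise standard deviation
M = 200                          # number of observations

# Draw inputs x and generate y = x^T w + eps, with eps ~ N(0, sigma^2).
X = rng.uniform(-1.0, 1.0, size=(M, 2))
y = X @ w_true + rng.normal(0.0, sigma, size=M)
```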

We now have one more variable to estimate, namely the variance $\sigma^2$ of the noise.

Assuming that all pairs of observables are i.i.d. (independent and identically distributed), we can construct an estimate of the conditional probability of $y$ given $x$ and a choice of parameters $w, \sigma$ as:

$p(y \,|\, x, w, \sigma) = \prod_{i=1}^{M} p(y^i \,|\, x^i, w, \sigma)$    (4.9)

Exploiting the fact that only the noise model is probabilistic and that it follows a zero-mean Gaussian distribution, we obtain:

$p(y \,|\, x, w, \sigma) = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y^i - x^{iT} w)^2}{2\sigma^2}\right)$    (4.10)
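As an illustration of where this derivation leads, the sketch below (hypothetical data, same linear-Gaussian model as in the previous snippet) evaluates the logarithm of the factorized Gaussian likelihood and maximizes it over $w$, which under these assumptions reduces to least squares; the maximum-likelihood variance is then the mean squared residual:

```python
import numpy as np

# Hypothetical data from the linear-Gaussian model (4.8).
rng = np.random.default_rng(1)
w_true, sigma, M = np.array([2.0, -1.0]), 0.3, 200
X = rng.uniform(-1.0, 1.0, size=(M, 2))
y = X @ w_true + rng.normal(0.0, sigma, size=M)

def gaussian_log_likelihood(w, sigma, X, y):
    """Log of the factorized likelihood (4.9) under zero-mean Gaussian noise."""
    residuals = y - X @ w
    return (-0.5 * len(y) * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * residuals @ residuals / sigma**2)

# Maximizing the log-likelihood over w is equivalent to least squares,
# and the maximum-likelihood variance is the mean squared residual.
w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_ml = np.mean((y - X @ w_ml) ** 2)

print(w_ml, sigma2_ml)
print(gaussian_log_likelihood(w_ml, np.sqrt(sigma2_ml), X, y))
```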

