MACHINE LEARNING TECHNIQUES - LASA
MACHINE LEARNING TECHNIQUES - LASA
MACHINE LEARNING TECHNIQUES - LASA
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
66<br />
4.3 Probabilistic Regression<br />
Probabilistic regression is a statistical approach to classical linear regression and assumes that<br />
the observed instances of x and y have been generated by an underlying probabilistic process of<br />
which it will try to build an estimate. The rationale goes as follows:<br />
If one knows the joint distribution ( , )<br />
p x y of x and y , then an estimate y ) of the output y can be<br />
built from knowing the input x by computing the expectation of y given x ,<br />
{ ( )}<br />
yˆ E p y|<br />
x<br />
= (4.6)<br />
Note that in many cases, if one is solely interested in constructing a regressive model, one does<br />
not need to build a model of the joint density but needs solely to estimate the<br />
conditional ( | )<br />
p y x .<br />
Probabilistic regression extends the concept of linear regression by assuming that the observed<br />
values of y differ from f ( x ) by an additive random noiseε (noise is usually assumed to be<br />
independent of the observable x ):<br />
( , )<br />
y= f x w + ε<br />
(4.7)<br />
Usually, to simplify computation, one further assumes that the noise follows a zero-mean<br />
2<br />
Gaussian distribution with uncorrelated isotropic varianceσ . The covariance matrix is diagonal<br />
with all elements equal to 2<br />
2<br />
ε = N 0, σ . Such an assumption is called<br />
putting a prior distribution over the noise.<br />
σ ; so we write simply ( )<br />
Let us first consider the probabilistic solution to the linear regression problem described before,<br />
that is:<br />
2<br />
( 0, )<br />
T<br />
y xw N σ<br />
= + (4.8)<br />
We have now one more variable to estimate, namely the variance of the noiseσ .<br />
Assuming that all pairs of observables are i.i.d (identically independently distributed), we can<br />
construct an estimate of the conditional probability of y given x and a choice of parameters<br />
w,<br />
σ as:<br />
M<br />
i<br />
( | , , σ) p( y| x , w,<br />
σ)<br />
p y x w<br />
= ∏ (4.9)<br />
i=<br />
1<br />
Solving for the fact that sole the noise model is probabilistic and that it follows a Gaussian<br />
distribution with zero mean, we obtain:<br />
© A.G.Billard 2004 – Last Update March 2011