

$$\tilde{\alpha} \approx \frac{1}{\prod_{i=1}^{m}\tilde{K}^{i}}
\begin{bmatrix}
\left[\tilde{K}^{1}\right]^{-1} & 0 & \cdots & 0\\
\vdots & & \ddots & \vdots\\
0 & \cdots & & \left[\tilde{K}^{m}\right]^{-1}
\end{bmatrix}
\begin{bmatrix}
\tilde{y}^{1}\\ \vdots\\ \tilde{y}^{m}
\end{bmatrix},
\qquad
\alpha^{l} = \frac{1}{\prod_{i=1}^{m}\tilde{K}^{i}}\left[\tilde{K}^{l}\right]^{-1}\tilde{y}^{l},\quad l = 1\dots m
\tag{5.90}$$

where $\tilde{y}^{l}$ is composed of the output values associated with the datapoints $X^{l}$.
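To make the block-diagonal structure of (5.90) concrete, here is a minimal Python/NumPy sketch that solves $\alpha^{l} = \left[\tilde{K}^{l}\right]^{-1}\tilde{y}^{l}$ cluster by cluster. The RBF kernel, its width `sigma`, and the jitter regularization are illustrative assumptions, not part of the notation above:

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j) for an RBF kernel (assumed kernel choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def local_alphas(X_clusters, y_clusters, sigma=1.0, jitter=1e-8):
    """Solve alpha^l = [K^l]^{-1} y^l independently per cluster, as in Eq. (5.90)."""
    alphas = []
    for X_l, y_l in zip(X_clusters, y_clusters):
        K_l = rbf_gram(X_l, sigma) + jitter * np.eye(len(X_l))  # local Gram matrix
        alphas.append(np.linalg.solve(K_l, y_l))                # [K^l]^{-1} y^l
    return alphas
```

Because the matrix in (5.90) is block-diagonal, each solve involves only one cluster's datapoints, which is the computational appeal of the local formulation.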

For each cluster $C^{l}$, one can then compute the centroid $\mu^{l} = \{\mu_{x}^{l}, \mu_{y}^{l}\}$ of the cluster and a measure of the dispersion of the datapoints associated with this cluster around the centroid, given by the covariance matrix $\Sigma_{xx}^{l} = E\left[\left(X^{l}-\mu_{x}^{l}I\right)\left(X^{l}-\mu_{x}^{l}I\right)^{T}\right]$ of the matrix $X^{l}$ of datapoints associated with the cluster. Further, for each set of datapoints $X^{l}$, one can use the associated set of output values $\tilde{y}^{l}$ and compute the cross-covariance matrix $\Sigma_{yx}^{l} = \left(\tilde{y}^{l}-\mu_{y}^{l}I\right)\left(X^{l}-\mu_{x}^{l}I\right)^{T}$.
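These per-cluster statistics are straightforward to compute; a short sketch, assuming each cluster is stored as an array `X_l` of shape `(n_l, d)` with outputs `y_l` of shape `(n_l,)` (names and layout are assumptions for illustration):

```python
import numpy as np

def cluster_statistics(X_l, y_l):
    """Centroid, covariance Sigma_xx, and cross-covariance Sigma_yx of one cluster."""
    mu_x = X_l.mean(axis=0)      # input part of the centroid
    mu_y = y_l.mean()            # output part of the centroid
    Xc = X_l - mu_x              # centered inputs, shape (n_l, d)
    yc = y_l - mu_y              # centered outputs, shape (n_l,)
    n_l = len(X_l)
    Sigma_xx = Xc.T @ Xc / n_l   # empirical E[(X - mu_x I)(X - mu_x I)^T]
    Sigma_yx = yc @ Xc / n_l     # empirical (y - mu_y I)(X - mu_x I)^T
    return (mu_x, mu_y), Sigma_xx, Sigma_yx
```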

If we further assume that each local kernel matrix is approximately a measure of the covariance, i.e. $\left[\tilde{K}^{l}\right]^{-1} \approx \left[\left(X^{l}-\mu_{x}^{l}I\right)^{T}\left(X^{l}-\mu_{x}^{l}I\right)\right]^{-1}$ and $k\left(x_{*}, X^{l}\right) \approx \left(X^{l}-\mu_{x}^{l}I\right)^{T}\left(x_{*}-\mu_{x}^{l}\right)$, replacing in Equation (5.88) yields:

$$f_{*} = \frac{1}{\sum_{i=1}^{m}\tilde{K}^{i}} \sum_{l=1}^{m}\left[\tilde{K}^{l}\right]^{-1}\tilde{y}^{l}\,k\left(x_{*}, X^{l}\right) \tag{5.91}$$

Observe that our prediction is now a non-linear combination of $m$ linear projections of the datapoints through $\left[\tilde{K}^{l}\right]^{-1}\tilde{y}^{l}$. If each of the $m$ clusters is composed of a single datapoint, we obtain a degenerate Gaussian Mixture Model with a unitary covariance matrix for each Gaussian. Similarly, when the clusters are disjoint, the prediction $f_{*}$ can be expressed as the product of the conditionals on each cluster separately, and the equivalence with GMM is immediate. In the more general case, when the clusters contain an arbitrary number of datapoints and are not disjoint, the full Gaussian Process takes into account the influence of all datapoints. In a GMM, the interactions across all datapoints are conveyed in part through the weighting of the effect of each Gaussian. Note also that the centers of the Gaussians are chosen so as to best balance the effect of the different datapoints through E-M.
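Putting the pieces together, the following sketch evaluates the prediction rule (5.91) from the per-cluster projections computed earlier. It reads the normalizing term $\sum_{i=1}^{m}\tilde{K}^{i}$ as the summed local kernel evaluations at the query point; that reading, the RBF kernel, and the helper names are assumptions rather than part of the original text:

```python
import numpy as np

def rbf_vec(x_star, X, sigma=1.0):
    """Kernel vector k(x*, X): one RBF evaluation per row of X (assumed kernel)."""
    sq_dists = np.sum((X - x_star) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def predict(x_star, X_clusters, alphas, sigma=1.0):
    """Prediction f* per Eq. (5.91): a normalized sum of per-cluster projections."""
    num, den = 0.0, 0.0
    for X_l, alpha_l in zip(X_clusters, alphas):
        k_l = rbf_vec(x_star, X_l, sigma)   # k(x*, X^l)
        num += k_l @ alpha_l                # projection of [K^l]^{-1} y^l onto k(x*, X^l)
        den += k_l.sum()                    # local kernel mass used as the normalizer
    return num / den
```

With `alphas = local_alphas(X_clusters, y_clusters, sigma)` from the earlier sketch, `predict(x_star, X_clusters, alphas, sigma)` returns the scalar estimate $f_{*}$.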

