MACHINE LEARNING TECHNIQUES - LASA
\tilde{\alpha} =
\begin{bmatrix}
\left[\tilde{K}^1\right]^{-1} & 0 & \cdots & 0 \\
\vdots & \ddots & & \vdots \\
0 & \cdots & & \left[\tilde{K}^m\right]^{-1}
\end{bmatrix}
\begin{bmatrix}
\tilde{y}^1 \\ \vdots \\ \tilde{y}^m
\end{bmatrix},
\qquad
\alpha^l = \left[\tilde{K}^l\right]^{-1} \tilde{y}^l, \quad l = 1 \dots m \qquad (5.90)
where $\tilde{y}^l$ is composed of the output values associated with the datapoints $X^l$.
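As an illustrative sketch (not from the text), the per-cluster weights $\alpha^l$ of Equation (5.90) can be computed by inverting each local kernel matrix separately, which is equivalent to the single block-diagonal inverse but much cheaper. The function names `rbf_kernel` and `local_alphas` and the squared-exponential kernel choice are assumptions for the example; the text stores datapoints as columns of $X^l$, while the code uses the more common row convention.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Squared-exponential kernel matrix between row-wise datapoint sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def local_alphas(clusters_X, clusters_y, noise=1e-6):
    """Per-cluster weights alpha^l = [K^l]^{-1} y^l, as in Eq. (5.90).

    Solving one small system per cluster equals applying the inverse of
    the block-diagonal matrix of Eq. (5.90) in one shot.
    """
    alphas = []
    for Xl, yl in zip(clusters_X, clusters_y):
        Kl = rbf_kernel(Xl, Xl) + noise * np.eye(len(Xl))  # jitter for stability
        alphas.append(np.linalg.solve(Kl, yl))             # avoids explicit inverse
    return alphas
```

Using `np.linalg.solve` instead of forming the explicit inverse is the standard numerically stable choice here.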
For each cluster $C^l$, one can then compute the centroid $\mu^l = \{\mu_x^l, \mu_y^l\}$ of the cluster and a measure of the dispersion of the datapoints associated with this cluster around the centroid, given by the covariance matrix $\Sigma_{xx}^l = E\left[\left(X^l - \mu_x^l I\right)\left(X^l - \mu_x^l I\right)^T\right]$ of the matrix $X^l$ of datapoints associated with the cluster. Further, for each set of datapoints $X^l$, one can use the associated set of output values $\tilde{y}^l$ and compute the cross-covariance matrix $\Sigma_{yx}^l = \left(\tilde{y}^l - \mu_y^l I\right)\left(X^l - \mu_x^l I\right)^T$.
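The per-cluster statistics above can be sketched in a few lines of numpy. This is an assumed implementation, not the text's own code: the helper name `cluster_statistics` and the $1/N$ normalization are choices made here, and datapoints are stored as rows rather than as columns of $X^l$.

```python
import numpy as np

def cluster_statistics(Xl, yl):
    """Centroid and (cross-)covariance of one cluster.

    Xl: (N, d) matrix of input datapoints, yl: (N,) associated outputs.
    Returns mu_x, mu_y, Sigma_xx, Sigma_yx (empirical, 1/N normalization).
    """
    mu_x = Xl.mean(axis=0)                  # centroid of the inputs
    mu_y = yl.mean()                        # centroid of the outputs
    Xc = Xl - mu_x                          # centered datapoints
    yc = yl - mu_y                          # centered outputs
    Sigma_xx = Xc.T @ Xc / len(Xl)          # input covariance, (d, d)
    Sigma_yx = yc[None, :] @ Xc / len(Xl)   # output/input cross-covariance, (1, d)
    return mu_x, mu_y, Sigma_xx, Sigma_yx
```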
If we further assume that each local kernel matrix is approximately a measure of the covariance, i.e. $\left[\tilde{K}^l\right]^{-1} \sim \left[\left(X^l - \mu_x^l I\right)^T \left(X^l - \mu_x^l I\right)\right]^{-1}$ and $k\left(x^*, X^l\right) \sim \left(X^l - \mu_x^l I\right)^T \left(x^* - \mu_x^l\right)$, then replacing in Equation (5.88) yields:
f^* = \sum_{l=1}^{m} k\left(x^*, X^l\right) \left[\tilde{K}^l\right]^{-1} \tilde{y}^l \qquad (5.91)
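The prediction of Equation (5.91) can be sketched as a sum of per-cluster GP predictions. This is an illustrative sketch under assumed choices (squared-exponential kernel, hypothetical helper names `rbf_kernel` and `local_gp_predict`, datapoints as rows), not the text's reference implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Squared-exponential kernel matrix between row-wise datapoint sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def local_gp_predict(x_star, clusters_X, clusters_y, noise=1e-6):
    """Eq. (5.91): f* = sum_l k(x*, X^l) [K^l]^{-1} y^l."""
    f_star = 0.0
    for Xl, yl in zip(clusters_X, clusters_y):
        Kl = rbf_kernel(Xl, Xl) + noise * np.eye(len(Xl))  # local kernel matrix
        k_star = rbf_kernel(x_star[None, :], Xl)           # 1 x N_l cross-kernel
        f_star += float(k_star @ np.linalg.solve(Kl, yl))  # this cluster's projection
    return f_star
```

With a single cluster ($m = 1$) this reduces to standard GP regression, so the prediction at a training point nearly interpolates its training target.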
Observe that our prediction is now a non-linear combination of $m$ linear projections of the datapoints through $\left[\tilde{K}^l\right]^{-1}\tilde{y}^l$. If each of the $m$ clusters is composed of a single datapoint, we obtain a degenerate Gaussian Mixture Model with a unitary covariance matrix for each Gaussian. Similarly, when the clusters are disjoint, the prediction $f^*$ can be expressed as the product of the conditionals on each cluster separately, and the equivalence with GMM is immediate. In the more general case, when the clusters contain an arbitrary number of datapoints and are not disjoint, the full Gaussian Process takes into account the influence of all datapoints. In a GMM, the interactions across all datapoints are conveyed in part through the weighting of the effect of each Gaussian. Note also that the centers of the Gaussians are chosen so as to best balance the effect of the different datapoints through E-M.
© A.G.Billard 2004 – Last Update March 2011