MACHINE LEARNING TECHNIQUES - LASA
5.6 Kernel K-Means
Kernel K-means is one attempt at using the kernel trick to improve the properties of one of the simplest clustering techniques to date, the so-called K-means clustering technique; see Section 3.1.2.
K-means partitions the data into a finite set of $K$ clusters $C_i$, $i = 1, \dots, K$ (here, do not confuse the scalar $K$ with the Gram matrix seen previously). K-means relies on a measure of distance across datapoints, usually the Euclidean distance. It proceeds iteratively by updating, at each iteration, the centers $\mu_1, \dots, \mu_K$ of the clusters until no update is required. Given a set of $M$ datapoints $X = \{x^j\}_{j=1}^{M}$, the K-means process consists in minimizing the following objective function:

$$J\left(\mu_1, \dots, \mu_K\right) = \sum_{i=1}^{K} \sum_{x^j \in C_i} \left\| x^j - \mu_i \right\|^2 \qquad (5.25)$$

with $\mu_i = \frac{1}{m_i} \sum_{x^j \in C_i} x^j$, where $m_i$ is the number of datapoints in cluster $C_i$.
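The iteration just described (assign each point to its nearest center, then recompute each center as the mean of its cluster, until no update is required) can be sketched in Python as follows. The function name, the random initialization from K datapoints, and the iteration cap are illustrative choices, not prescribed by the text:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's iteration minimizing the objective of Eq. (5.25):
    sum over clusters i of sum over x^j in C_i of ||x^j - mu_i||^2."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with K distinct datapoints (one common choice).
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its
        # nearest center in Euclidean distance.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mu_i becomes the mean of the points in C_i
        # (an empty cluster keeps its previous center).
        new_mu = np.array([X[labels == i].mean(axis=0)
                           if np.any(labels == i) else mu[i]
                           for i in range(K)])
        if np.allclose(new_mu, mu):  # no update required: converged
            break
        mu = new_mu
    return labels, mu
```

Note that the assignment and update steps each decrease (or leave unchanged) the objective (5.25), which is why the iteration terminates.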
Since each cluster relies on a common distance measure, each cluster is separated from the others by a linear hyperplane, as illustrated below:
[Figure: datapoints x^j partitioned into three clusters with centers µ1, µ2, µ3, separated by linear boundaries.]
To counter this disadvantage, kernel K-means first maps the datapoints into a higher-dimensional feature space through a non-linear map $\phi$. It then proceeds as classical K-means and searches for separating hyperplanes in the feature space. To do this, kernel K-means exploits once more the kernel trick and sets the kernel $k\left(x, x'\right) = \left\langle \phi(x), \phi(x') \right\rangle$ as the dot product in feature space. Using the observation that the kernel K-means objective function can be expanded into a sum of inner products across datapoints yields:
© A.G.Billard 2004 – Last Update March 2011
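The expansion alluded to above is the standard one: since $\mu_i = \frac{1}{m_i} \sum_{x^l \in C_i} \phi(x^l)$, the feature-space distance $\|\phi(x^j) - \mu_i\|^2$ unfolds into kernel evaluations only, so the centers never need to be computed explicitly. A minimal sketch under that assumption (function and variable names are illustrative):

```python
import numpy as np

def kernel_kmeans(G, K, n_iter=100):
    """Kernel K-means on a precomputed Gram matrix G[j, l] = k(x^j, x^l);
    K here is the number of clusters, not the Gram matrix.

    The centers mu_i = (1/m_i) * sum_{x^l in C_i} phi(x^l) are never formed
    explicitly; the feature-space distance expands into kernel terms:
      ||phi(x^j) - mu_i||^2 = G[j, j]
                              - (2/m_i)   * sum_{l in C_i} G[j, l]
                              + (1/m_i^2) * sum_{l, l' in C_i} G[l, l']
    """
    M = G.shape[0]
    # Simple deterministic initial partition into contiguous blocks;
    # random initialization is equally common.
    block = (M + K - 1) // K
    labels = np.repeat(np.arange(K), block)[:M]
    for _ in range(n_iter):
        dist = np.full((M, K), np.inf)
        for i in range(K):
            idx = np.flatnonzero(labels == i)
            if idx.size == 0:
                continue  # empty cluster: leave its distances infinite
            m_i = idx.size
            dist[:, i] = (np.diag(G)
                          - (2.0 / m_i) * G[:, idx].sum(axis=1)
                          + G[np.ix_(idx, idx)].sum() / m_i ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With a non-linear kernel (e.g. a Gaussian/RBF kernel), the linear boundaries found in feature space correspond to non-linear cluster boundaries in the original input space.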