
Unsupervised Learning

K-means clustering is very popular because it is fast, simple, and robust. It has some disadvantages, however. The biggest is that the user has to specify the number of clusters in advance. Second, the algorithm does not guarantee a globally optimal solution; the results can change if the initial randomly chosen centroids change. Third, it is very sensitive to outliers.
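These properties are easy to demonstrate. Below is a minimal sketch using scikit-learn on synthetic data (the library choice and the data are assumptions for illustration): n_clusters must be supplied by the user, and n_init re-runs the algorithm from different random centroids to guard against a bad initialization.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# The number of clusters must be chosen up front by the user.
# n_init=10 repeats the algorithm with different random centroids
# and keeps the best run, since a single run may hit a local optimum.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.inertia_)  # within-cluster sum of squared distances
```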

Variations in k-means

In the original k-means algorithm, each point belongs to exactly one cluster (centroid); this is called hard clustering. However, we can instead have each point belong to all the clusters, with a membership function defining how strongly it belongs to each particular cluster (centroid). This is called fuzzy clustering or soft clustering. This variation was proposed by J. C. Dunn in 1973 and later improved upon by J. C. Bezdek in 1981. Though soft clustering takes longer to converge, it can be useful when a point can belong to multiple classes, or when we want to know how similar a given point is to different clusters.
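As a rough illustration of what such a membership function looks like, the NumPy sketch below computes the standard fuzzy c-means memberships for fixed centroids (the fuzzifier m = 2 and the toy data are assumptions for this example; this is not Bezdek's complete algorithm, which also iteratively updates the centroids):

```python
import numpy as np

def fuzzy_memberships(X, centroids, m=2.0):
    """Degree to which each point belongs to each centroid; rows sum to 1."""
    # Pairwise Euclidean distances, shape (n_points, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against division by zero
    # Fuzzy c-means membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
centroids = np.array([[0.5, 0.0], [5.0, 5.0]])
print(fuzzy_memberships(X, centroids))  # soft assignment of each point
```

Each row of the result gives one point's degree of membership in every cluster, which is exactly the information that hard clustering throws away.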

The accelerated k-means algorithm was created by Charles Elkan in 2003. It exploits the triangle inequality (that is, the fact that a straight line is the shortest distance between two points): instead of performing every point-to-centroid distance calculation at each iteration, it keeps track of lower and upper bounds on those distances and skips the calculations that the bounds show cannot change a point's assignment.
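If you happen to use scikit-learn (an assumption of this sketch, not part of Elkan's paper), this variant is available through a single argument:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=8, random_state=0)

# algorithm="elkan" maintains triangle-inequality bounds and skips
# the distance computations that cannot change an assignment.
km = KMeans(n_clusters=8, algorithm="elkan", n_init=10, random_state=0)
km.fit(X)
```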

In 2006, David Arthur and Sergei Vassilvitskii proposed the k-means++ algorithm. The major change they proposed was in the initialization of the centroids. They showed that if we choose centroids that are distant from each other, the k-means algorithm is less likely to converge on a suboptimal solution.
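The seeding rule itself is short: pick the first centroid uniformly at random, then pick each subsequent centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The simplified NumPy sketch below is for illustration only, not the authors' reference implementation:

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Simplified k-means++ seeding: spread the initial centroids apart."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform choice
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Sample proportionally to d2, so distant points are favored.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```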

Another alternative is, at each iteration, to use mini-batches instead of the entire dataset. This modification was proposed by David Sculley in 2010.
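scikit-learn exposes this variant as MiniBatchKMeans; a minimal sketch (the batch size here is an arbitrary illustrative choice):

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=5, random_state=0)

# Each iteration updates the centroids from a small random batch,
# trading a little accuracy for a large speedup on big datasets.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=10, random_state=0)
mbk.fit(X)
```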

Self-organizing maps

Both k-means and PCA can cluster the input data; however, they do not maintain topological relationships. In this section we will consider Self-Organizing Maps (SOMs), sometimes known as Kohonen networks or winner-take-all units (WTUs), which do maintain topological relationships. SOMs are a very special kind of neural network, inspired by a distinctive feature of the human brain: in our brain, different sensory inputs are represented in a topologically ordered manner. Unlike in other neural networks, the neurons are not all connected to each other via weights; instead, they influence each other's learning. The most important aspect of a SOM is that its neurons represent the learned inputs in a topographic manner. SOMs were proposed by Teuvo Kohonen in 1989 [2].
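To make the winner-take-all idea concrete before going further, here is a minimal NumPy sketch of a single SOM update step (the grid size, learning rate, and neighborhood radius are illustrative assumptions; a full SOM also shrinks the learning rate and the neighborhood over time):

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 10, 10, 3  # a 10x10 map of 3-D weight vectors
weights = rng.random((grid_h, grid_w, dim))

def som_step(x, weights, lr=0.5, sigma=2.0):
    # Winner-take-all: find the best matching unit (BMU) for input x.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood: units close to the BMU on the grid learn
    # the most, which is what preserves the topological ordering.
    rows, cols = np.indices((grid_h, grid_w))
    grid_d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2 * sigma**2))
    weights += lr * h[:, :, None] * (x - weights)
    return weights

weights = som_step(rng.random(dim), weights)  # one training step
```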
