24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Clustering News Articles<br />

The algorithm then iterates between the assignment step and the updating step; each<br />

time the updating step occurs, each of the centroids moves a small amount. This causes<br />

the assignments to change slightly, causing the centroids to move a small amount in<br />

the next iteration. This repeats until some stopping criterion is reached. It is <strong>com</strong>mon to<br />

stop after a certain number of iterations, or when the total movement of the centroids<br />

is very low. The algorithm can also <strong>com</strong>plete in some scenarios, which means that the<br />

clusters are stable—the assignments do not change and neither do the centroids.<br />

In the following figure, k-means was performed over a dataset created randomly,<br />

but with three clusters in the data. The stars represent the starting location of the<br />

centroids, which were chosen randomly by picking a random sample from the<br />

dataset. Over 5 iterations of the k-means algorithm, the centroids move to the<br />

locations represented by the triangles.<br />

[ 224 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!