Learning Data Mining with Python


Discovering Accounts to Follow Using Graph Mining

The aim of this analysis was to recommend users, and our use of cluster
analysis allowed us to find clusters of similar users. To do this, we found
connected components on a weighted graph that we created based on this similarity
metric. We used the NetworkX package for creating the graphs, working with them,
and finding these connected components.
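
As a reminder of how this works, here is a minimal sketch that builds a weighted graph from pairwise similarities and finds its connected components with NetworkX. The user names, similarity values, the threshold of 0.1, and the `build_graph` helper are all made up for illustration and are not taken from the chapter's dataset.

```python
import networkx as nx

# Hypothetical pairwise similarity scores between users (illustrative data only)
similarities = {
    ("alice", "bob"): 0.8,
    ("bob", "carol"): 0.6,
    ("carol", "dave"): 0.05,
    ("dave", "erin"): 0.7,
}


def build_graph(similarities, threshold):
    """Add every user as a node; keep only edges at or above the threshold."""
    G = nx.Graph()
    for (u, v), weight in similarities.items():
        G.add_node(u)
        G.add_node(v)
        if weight >= threshold:
            G.add_edge(u, v, weight=weight)
    return G


G = build_graph(similarities, threshold=0.1)

# Each connected component is a set of users that we treat as one cluster
clusters = sorted(sorted(component) for component in nx.connected_components(G))
print(clusters)
```

With this threshold the weak carol-dave link is dropped, so the graph splits into two clusters. Raising or lowering the threshold changes how many components appear.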

We then used the Silhouette Coefficient, which is a metric that evaluates how good
a clustering solution is. Higher scores indicate a better clustering, according to the
concepts of intra-cluster and inter-cluster distance. SciPy's optimize module was
used to find the solution that maximises this value.
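
To make the metric concrete, the following sketch computes the Silhouette Coefficient by hand for a handful of one-dimensional points. For each sample, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the sample's score is (b - a) / max(a, b). The points, the candidate labellings, and the function names are hypothetical, and picking the best of two candidates merely stands in for a real optimiser. Because minimisers look for the lowest value, the score is negated before it is handed over.

```python
from collections import defaultdict


def silhouette(points, labels):
    """Mean Silhouette Coefficient for 1-D points, using |p - q| as distance.

    Assumes at least two clusters and that not all points are identical.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a sample alone in its cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(point - q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(abs(point - q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [0.0, 1.0, 10.0, 11.0]
candidates = {
    "tight": [0, 0, 1, 1],  # nearby points share a cluster
    "mixed": [0, 1, 0, 1],  # distant points share a cluster
}


def inverted_silhouette(labels):
    """Negated score, so that a minimiser ends up maximising the silhouette."""
    return -silhouette(points, labels)


best = min(candidates, key=lambda name: inverted_silhouette(candidates[name]))
print(best, silhouette(points, candidates[best]))
```

The "tight" labelling wins because its members are close to each other and far from the other cluster, which is exactly what a high score rewards.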

In this chapter, we also compared a few pairs of opposites. Similarity is a measure
between two objects, where higher values indicate more similarity between those
objects. In contrast, distance is a measure where lower values indicate more
similarity. Another contrast we saw was between a loss function, where lower scores
are considered better (that is, we lost less), and its opposite, the score function,
where higher scores are considered better.
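
The relationship between similarity and distance can be shown with a small sketch. It assumes a Jaccard-style similarity between two sets of friend IDs, a measure bounded between 0 and 1, so that the matching distance is simply one minus the similarity. The friend sets and function names are hypothetical, and treating two empty sets as having similarity 0 is a choice made here for illustration.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union (higher = more alike)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets are treated as dissimilar
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """The opposite view of the same measure (lower = more alike)."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical friend IDs for two users
friends_a = {1, 2, 3, 4}
friends_b = {3, 4, 5, 6}

similarity = jaccard_similarity(friends_a, friends_b)
distance = jaccard_distance(friends_a, friends_b)
print(similarity, distance)
```

The same flip applies to score and loss functions: negating a score turns "higher is better" into "lower is better", which is what a minimising routine expects.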

In the next chapter, we will see how to extract features from another new type of
data: images. We will discuss how to use neural networks to identify numbers in
images and develop a program to automatically beat CAPTCHA images.
