
Learning Data Mining with Python


Classifying with scikit-learn Estimators

Most scikit-learn estimators use NumPy arrays or a related format for input and output.

There are a large number of estimators in scikit-learn. These include support vector machines (SVM), random forests, and neural networks. Many of these algorithms will be used in later chapters. In this chapter, we will use a different estimator from scikit-learn: nearest neighbor.
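As a rough illustration of the array-in, array-out convention, the sketch below fits scikit-learn's `KNeighborsClassifier` on a tiny dataset. The data values and variable names are invented for this example and do not come from the book; only the `fit` and `predict` calls reflect the estimator interface.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data, made up for illustration: each row is a sample, each column a feature.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
# One class label per training sample.
y_train = np.array([0, 0, 0, 1, 1, 1])

# Use the three closest training samples when predicting.
estimator = KNeighborsClassifier(n_neighbors=3)
estimator.fit(X_train, y_train)

# Predictions for new samples also come back as a NumPy array.
X_new = np.array([[1.1, 0.9], [5.1, 5.0]])
y_pred = estimator.predict(X_new)
print(type(y_pred), y_pred)
```

Both the training inputs and the predictions are NumPy arrays, which is what makes it easy to pass data between estimators and other numerical code.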

For this chapter, you will need to install a new library called matplotlib. The easiest way to install it is to use pip3, as you did in Chapter 1, Getting Started with Data Mining, to install scikit-learn:

$ pip3 install matplotlib

If you have any difficulty installing matplotlib, consult the official installation instructions at http://matplotlib.org/users/installing.html.

Nearest neighbors

Nearest neighbors is perhaps one of the most intuitive algorithms in the set of standard data mining algorithms. To predict the class of a new sample, we look through the training dataset for the samples that are most similar to our new sample. We take the most similar samples and predict the class that the majority of those samples have.

As an example, we wish to predict the class of the triangle, based on which class it is more similar to (represented here by having similar objects closer together). We seek the three nearest neighbors, which are two diamonds and one square. Diamonds are in the majority among these three neighbors, so the predicted class for the triangle is a diamond.
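The procedure described above can be sketched in plain Python. This is a minimal illustration, not the book's implementation: the function name `predict_nearest_neighbors`, the use of Euclidean distance as the similarity measure, and all coordinates are assumptions chosen so that the triangle's three nearest neighbors are two diamonds and one square.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def predict_nearest_neighbors(training_samples, training_classes, new_sample, k=3):
    """Predict a class by majority vote among the k closest training samples."""
    # Pair each training sample's distance to the new sample with its class.
    distances = [(dist(sample, new_sample), label)
                 for sample, label in zip(training_samples, training_classes)]
    # Sort by distance only and keep the classes of the k closest samples.
    distances.sort(key=lambda pair: pair[0])
    nearest_classes = [label for _, label in distances[:k]]
    # The predicted class is the most frequent one among those neighbors.
    return Counter(nearest_classes).most_common(1)[0][0]


# Hypothetical coordinates: three diamonds and three squares.
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 3.5),
          (2.5, 1.5), (4.0, 4.0), (5.0, 3.0)]
classes = ["diamond", "diamond", "diamond",
           "square", "square", "square"]

# The sample whose class we want to predict.
triangle = (1.8, 1.4)

prediction = predict_nearest_neighbors(points, classes, triangle, k=3)
print(prediction)
```

With these made-up positions, the three closest points are two diamonds and one square, so the vote goes to the diamond class. Choosing an odd `k` avoids ties when there are only two classes.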
