24.07.2016 Views

www.allitebooks.com

Learning Data Mining with Python


Classifying with scikit-learn Estimators

In figure (a), on the left-hand side, we would usually expect the test sample
(the triangle) to be classified as a circle. However, if n_neighbors is 1, the single
red diamond in this area (likely a noisy sample) causes the sample to be predicted
as a diamond, even though it lies in a region otherwise dominated by circles.
In figure (b), on the right-hand side, we would usually expect the test sample to be
classified as a diamond. However, if n_neighbors is 7, the three nearest neighbors
(which are all diamonds) are outvoted by the larger number of circle samples
farther away.
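The two failure modes described above can be reproduced on a few hand-placed points. The following is a minimal sketch, not the data behind the figure: the coordinates, the `CIRCLE`/`DIAMOND` labels, and the helper `predict_with_k` are all made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

CIRCLE, DIAMOND = 0, 1  # hypothetical class labels for this sketch


def predict_with_k(X_train, y_train, sample, k):
    """Fit a k-nearest-neighbors model and return the predicted class of one sample."""
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    return int(model.predict([sample])[0])


# Case (a): six circles with one stray (noisy) diamond in their midst.
X_a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 1.2], [1.2, 0.5],
                [0.5, 0.5]])
y_a = np.array([CIRCLE] * 6 + [DIAMOND])
sample_a = [0.55, 0.55]  # sits right next to the stray diamond

pred_a_k1 = predict_with_k(X_a, y_a, sample_a, k=1)  # only the stray diamond votes
pred_a_k5 = predict_with_k(X_a, y_a, sample_a, k=5)  # surrounding circles outvote it

# Case (b): three diamonds close together, ringed by six circles farther out.
X_b = np.array([[5, 5], [5.2, 5], [5, 5.2],
                [6, 6], [6, 5], [5, 6], [4, 4], [4, 5], [5, 4]])
y_b = np.array([DIAMOND] * 3 + [CIRCLE] * 6)
sample_b = [5.1, 5.1]  # sits inside the small group of diamonds

pred_b_k3 = predict_with_k(X_b, y_b, sample_b, k=3)  # the three diamonds decide
pred_b_k7 = predict_with_k(X_b, y_b, sample_b, k=7)  # four circles join the vote

print("case (a):", pred_a_k1, pred_a_k5)
print("case (b):", pred_b_k3, pred_b_k7)
```

In this toy setup a value that is too small lets one noisy sample decide, while a value that is too large pulls in distant samples from the other class.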

If we want to test a number of values for the n_neighbors parameter, for example,
each of the values from 1 to 20, we can rerun the experiment once per value, setting
n_neighbors each time and recording the result:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

avg_scores = []
all_scores = []
parameter_values = list(range(1, 21))  # Include 20
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy')

Still inside the loop, we compute the average and store it in our list of scores.
We also store the full set of scores for later analysis:

    avg_scores.append(np.mean(scores))
    all_scores.append(scores)
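The fragments above form a single loop. Assembled into one runnable piece, the sweep might look like the sketch below. It assumes a synthetic dataset from `make_classification` as a stand-in for `X` and `y`, and an explicit `cv=5`; neither comes from the text. The final lines, which pick the value with the highest average score, are likewise an illustrative addition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (an assumption of this sketch); substitute your own X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=14)

parameter_values = list(range(1, 21))  # 1 to 20 inclusive
avg_scores = []
all_scores = []
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    # One accuracy value per cross-validation fold.
    scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=5)
    avg_scores.append(np.mean(scores))
    all_scores.append(scores)

# Report the parameter value with the highest average accuracy.
best_index = int(np.argmax(avg_scores))
best_n_neighbors = parameter_values[best_index]
print("best n_neighbors:", best_n_neighbors)
print("average accuracy: {:.3f}".format(avg_scores[best_index]))
```

Because the differences between neighboring values are often small, the single best value is usually less informative than the overall trend across the range.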

We can then plot the relationship between the value of n_neighbors and the
accuracy. First, we tell the IPython Notebook that we want to show plots inline
in the notebook itself:

%matplotlib inline

We then import pyplot from the matplotlib library and plot the parameter values
alongside average scores:

from matplotlib import pyplot as plt
plt.plot(parameter_values, avg_scores, '-o')
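Since the full set of per-fold scores was kept in all_scores, one possible follow-up is to show the spread around each average as error bars. The sketch below is one way to do that, not the book's own plot: the helper name `plot_score_spread` is hypothetical, and the scores it is called with are random placeholders rather than real results.

```python
import numpy as np
from matplotlib import pyplot as plt


def plot_score_spread(parameter_values, all_scores):
    """Plot the mean accuracy per parameter value with +/- one standard deviation."""
    scores = np.asarray(all_scores)  # shape: (n_parameter_values, n_folds)
    means = scores.mean(axis=1)
    stds = scores.std(axis=1)
    fig, ax = plt.subplots()
    ax.errorbar(parameter_values, means, yerr=stds, fmt='-o', capsize=3)
    ax.set_xlabel('n_neighbors')
    ax.set_ylabel('Cross-validated accuracy')
    return ax


# Placeholder scores for illustration only: 20 parameter values, 5 folds each.
rng = np.random.default_rng(0)
example_values = list(range(1, 21))
example_scores = rng.uniform(0.7, 0.9, size=(20, 5))
ax = plot_score_spread(example_values, example_scores)
```

In a notebook with inline plotting enabled, the figure appears below the cell; passing the real parameter_values and all_scores in place of the placeholders shows how stable each setting is across folds.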
