

Learning Data Mining with Python


Chapter 3

This results in an accuracy of 60.6 percent, an immediate gain of 0.6 points from just swapping the classifier.
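The swap itself can be sketched as follows. This is a minimal illustration, assuming synthetic stand-in data from `make_classification` rather than the chapter's dataset; the names `X_demo`, `y_demo`, and `results` are hypothetical, and the accuracies it prints will not match the figures quoted in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the chapter's dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=14)

# The evaluation code stays the same; only the estimator changes.
results = {}
for name, clf in [
    ("decision tree", DecisionTreeClassifier(random_state=14)),
    ("random forest", RandomForestClassifier(n_estimators=50, random_state=14)),
]:
    scores = cross_val_score(clf, X_demo, y_demo, scoring="accuracy")
    results[name] = scores.mean()
    print("{0}: {1:.1f}%".format(name, results[name] * 100))
```

Because both estimators share the same fit/predict interface, `cross_val_score` needs no other changes when one is exchanged for the other.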

Because each tree in a random forest works with only a subset of the features, random forests should be able to learn more effectively from a larger feature set than a normal decision tree can. We can test this by giving the algorithm more features and seeing how it performs:

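import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Join the two feature matrices built earlier in the chapter, column-wise
X_all = np.hstack([X_home_higher, X_teams])
clf = RandomForestClassifier(random_state=14)
scores = cross_val_score(clf, X_all, y_true, scoring='accuracy')
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))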

This results in 61.1 percent, which is even better! We can also try some other parameters using the GridSearchCV class, as introduced in Chapter 2, Classifying with scikit-learn Estimators:

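from sklearn.model_selection import GridSearchCV

parameter_space = {
    # "sqrt" replaces the older "auto" option, which meant the same thing
    # for classifiers and has been removed from recent scikit-learn releases
    "max_features": [2, 10, "sqrt"],
    "n_estimators": [100],
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": [2, 4, 6],
}
clf = RandomForestClassifier(random_state=14)
# Try every combination of the values above, scoring each by cross-validation
grid = GridSearchCV(clf, parameter_space)
grid.fit(X_all, y_true)
print("Accuracy: {0:.1f}%".format(grid.best_score_ * 100))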

This has a much better accuracy of 64.2 percent!<br />
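To get a feel for how much work the search performs, the number of parameter combinations can be counted with scikit-learn's `ParameterGrid`. The sketch below mirrors the grid used in this section, with `"sqrt"` standing in for the third `max_features` option; the name `n_candidates` is hypothetical.

```python
from sklearn.model_selection import ParameterGrid

# Mirrors the search space from this section (values are only counted
# here, not validated or fitted).
parameter_space = {
    "max_features": [2, 10, "sqrt"],
    "n_estimators": [100],
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": [2, 4, 6],
}

# One candidate per combination: 3 * 1 * 2 * 3
n_candidates = len(ParameterGrid(parameter_space))
print(n_candidates)  # 18
```

Each candidate is fitted once per cross-validation fold, so the total number of forests trained is this count multiplied by the number of folds.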

If we want to see the parameters used, we can print out the best model that was found in the grid search. The code is as follows:

print(grid.best_estimator_)

The result shows the parameters that were used in the best scoring model:

RandomForestClassifier(bootstrap=True, compute_importances=None,
            criterion='entropy', max_depth=None, max_features=2,
            max_leaf_nodes=None, min_density=None, min_samples_leaf=6,
            min_samples_split=2, n_estimators=100, n_jobs=1,
            oob_score=False, random_state=14, verbose=0)
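When only the tuned values are of interest, the fitted search object also exposes them as a plain dictionary through `best_params_`. The following is a minimal sketch, assuming synthetic data and a deliberately small grid so that it runs quickly; `X_demo`, `y_demo`, `small_space`, and `grid_demo` are hypothetical names, not part of the chapter's code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the chapter's dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=14)

# A reduced search space: 2 * 2 = 4 candidates.
small_space = {
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": [2, 6],
}

grid_demo = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=14),
    small_space,
    scoring="accuracy",
)
grid_demo.fit(X_demo, y_demo)

# Only the searched parameters, as a dictionary
print(grid_demo.best_params_)
print("Accuracy: {0:.1f}%".format(grid_demo.best_score_ * 100))
```

Unlike the full estimator printout, `best_params_` lists only the parameters that were part of the search, which makes it easier to record or reuse the winning settings.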
