24.07.2016 Views


Learning Data Mining with Python


Chapter 5

Next, we create our transformer using the chi2 function and a SelectKBest transformer:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

transformer = SelectKBest(score_func=chi2, k=3)

Running fit_transform will call fit and then transform with the same dataset. The result is a new dataset that contains only the best three features. Let's look at the code:

Xt_chi2 = transformer.fit_transform(X, y)

The resulting matrix now contains only three features. We can also get the scores for each column, allowing us to find out which features were used. Let's look at the code:

print(transformer.scores_)

The printed results give us these scores:

[  8.60061182e+03   2.40142178e+03   8.21924671e+07   1.37214589e+06
   6.47640900e+03]

The highest values are for the first, third, and fourth columns, which correspond to the Age, Capital-Gain, and Capital-Loss features. Based on univariate feature selection, these are the best features to choose.
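Reading the selected columns off the raw scores by eye is error-prone, so the following sketch shows one way to ask the transformer directly. It does not use the Adult dataset: the X and y below are synthetic stand-ins (an assumption made so the example runs on its own), built so that columns 0, 2, and 3 depend on the class. It also checks that calling fit and then transform gives the same result as fit_transform.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in data (an assumption, not the Adult dataset):
# 200 samples, 5 non-negative features. chi2 requires non-negative values.
rng = np.random.RandomState(14)
y = rng.randint(0, 2, size=200)
X = rng.randint(0, 10, size=(200, 5)).astype(float)

# Make columns 0, 2 and 3 depend on the class so that they score highly.
X[:, 0] += 5 * y
X[:, 2] += 50 * y
X[:, 3] += 20 * y

transformer = SelectKBest(score_func=chi2, k=3)
Xt = transformer.fit_transform(X, y)

# Integer indices of the columns that were kept.
selected = transformer.get_support(indices=True)
print(transformer.scores_)
print(selected)

# Calling fit and then transform on the same data matches fit_transform.
Xt_two_step = SelectKBest(score_func=chi2, k=3).fit(X, y).transform(X)
same = np.array_equal(Xt, Xt_two_step)
print(same)
```

With real data, the indices returned by get_support can be used to look up the matching column names instead of counting positions in the printed scores.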

If you'd like to find out more about the features in the Adult dataset, take a look at the adult.names file that comes with the dataset and the academic paper it references.

We could also implement other correlations, such as Pearson's correlation coefficient. This is implemented in SciPy, a library used for scientific computing (scikit-learn uses it as a base).

If scikit-learn is working on your computer, so is SciPy. You do not need to install anything further to get this sample working.

First, we import the pearsonr function from SciPy:

from scipy.stats import pearsonr
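As a quick, hedged illustration of what pearsonr returns, the sketch below calls it on two pairs of one-dimensional arrays. The arrays are made up for this example (they are not taken from the Adult dataset). Note that pearsonr compares two one-dimensional arrays at a time, unlike chi2, which scores every column of X in one call.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up data for illustration only.
rng = np.random.RandomState(14)
x = rng.rand(100)
y_linear = 3 * x + 1       # an exact linear function of x
y_noise = rng.rand(100)    # generated independently of x

# pearsonr returns the correlation coefficient and a p-value.
r_linear, p_linear = pearsonr(x, y_linear)
r_noise, p_noise = pearsonr(x, y_noise)

print(r_linear, p_linear)
print(r_noise, p_noise)
```

The coefficient for the linear pair should be very close to 1, while the coefficient for the independent pair should be much closer to 0.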
