
Learning Data Mining with Python

24.07.2016


Chapter 3

If you have trouble extracting features of these types, check the pandas documentation at http://pandas.pydata.org/pandas-docs/stable/ for help. Alternatively, you can ask on an online forum such as Stack Overflow.

More extreme examples could use player data to estimate the strength of each side and predict who won. Complex features of this kind are used every day by gamblers and sports betting agencies trying to turn a profit by predicting the outcome of sports matches.

Summary

In this chapter, we extended our use of scikit-learn's classifiers and introduced the pandas library to manage our data. We analyzed real-world data on basketball results from the NBA, saw some of the problems that even well-curated data introduces, and created new features for our analysis.

We saw the effect that good features have on performance and used an ensemble algorithm, random forests, to further improve the accuracy.
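The workflow recapped above can be sketched in a few lines. This is only a minimal illustration on randomly generated data, not the chapter's NBA dataset: the column names (HomeLastWin, VisitorLastWin, HomePts, VisitorPts, HomeWin) and the parameter values are assumptions chosen for the example, and it imports from sklearn.model_selection, which is where recent scikit-learn releases keep cross_val_score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a table of game results (hypothetical columns).
rng = np.random.default_rng(14)
n_games = 200
games = pd.DataFrame({
    "HomeLastWin": rng.integers(0, 2, n_games).astype(bool),
    "VisitorLastWin": rng.integers(0, 2, n_games).astype(bool),
    "HomePts": rng.integers(80, 121, n_games),
    "VisitorPts": rng.integers(80, 121, n_games),
})

# Derive the class label with pandas: did the home team score more points?
games["HomeWin"] = games["HomePts"] > games["VisitorPts"]

# Feature matrix and target vector for scikit-learn.
X = games[["HomeLastWin", "VisitorLastWin"]].to_numpy()
y = games["HomeWin"].to_numpy()

# Random forest evaluated with 5-fold cross-validation.
clf = RandomForestClassifier(n_estimators=50, random_state=14)
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
mean_accuracy = scores.mean()
print(f"Mean accuracy: {mean_accuracy:.3f}")
```

Because the data here is random, the accuracy should hover around chance; the point is the shape of the pipeline (build features in a DataFrame, then cross-validate an ensemble classifier), not the score.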

In the next chapter, we will extend the affinity analysis that we performed in the first chapter to create a program to find similar books. We will see how to use algorithms for ranking and also use approximation to improve the scalability of data mining.
