24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 3<br />

You can change those indices to look at other parts of the data, as there are over 1000<br />

games in our dataset!<br />

Currently, this gives a false value to all teams (including the previous year's<br />

champion!) when they are first seen. We could improve this feature using the<br />

previous year's data, but will not do that in this chapter.<br />

Decision trees<br />

Decision trees are a class of supervised learning algorithm like a flow chart that<br />

consists of a sequence of nodes, where the values for a sample are used to make a<br />

decision on the next node to go to.<br />

As with most classification algorithms, there are two <strong>com</strong>ponents:<br />

• The first is the training stage, where a tree is built using training data.<br />

While the nearest neighbor algorithm from the previous chapter did not<br />

have a training phase, it is needed for decision trees. In this way, the nearest<br />

neighbor algorithm is a lazy learner, only doing any work when it needs<br />

to make a prediction. In contrast, decision trees, like most classification<br />

methods, are eager learners, undertaking work at the training stage.<br />

• The second is the predicting stage, where the trained tree is used to<br />

predict the classification of new samples. Using the previous example<br />

tree, a data point of ["is raining", "very windy"] would be classed<br />

as "bad weather".<br />

[ 47 ]<br />

<strong>www</strong>.<strong>allitebooks</strong>.<strong>com</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!