24.07.2016 Views


Learning Data Mining with Python


Predicting Sports Winners with Decision Trees

There are many possible features we could use, but we will start with features that answer the following questions:

• Which team is considered better generally?

• Which team won their last encounter?

We will also try putting the raw teams into the algorithm to check whether the algorithm can learn a model that checks how different teams play against each other.

Putting it all together

For the first feature, we will create a feature that tells us whether the home team is generally better than the visitors. To do this, we will load the NBA standings (also called a ladder in some sports) from the previous season. A team will be considered better if it ranked higher in 2013 than the other team.
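The comparison itself is a simple lookup. The sketch below is illustrative only: it uses a plain dictionary of made-up ladder positions and placeholder team names rather than the real 2013 standings, and the function name `home_team_ranks_higher` is hypothetical.

```python
# Hypothetical ladder positions (1 = best); placeholder team names.
ladder = {"Team A": 1, "Team B": 2, "Team C": 3}


def home_team_ranks_higher(home, visitor, ladder):
    """Return True if the home team finished above the visitor."""
    # A smaller position number means a higher rank on the ladder.
    return ladder[home] < ladder[visitor]


# Each game is a (home team, visitor team) pair.
toy_games = [("Team A", "Team C"), ("Team C", "Team B")]
features = [home_team_ranks_higher(h, v, ladder) for h, v in toy_games]
print(features)  # [True, False]
```

With the standings loaded into a DataFrame, the same idea would apply, with the lookup made against whichever columns hold the team name and its rank.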

To obtain the standings data, perform the following steps:

1. Navigate to http://www.basketball-reference.com/leagues/NBA_2013_standings.html in your web browser.

2. Select Expanded Standings to get a single list for the entire league.

3. Click on the Export link.

4. Save the downloaded file in your data folder.

Back in your IPython Notebook, enter the following lines into a new cell. You'll need to ensure that the file was saved into the location pointed to by the data_folder variable. The code is as follows:

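# Assumes os, pandas (as pd), and data_folder were set up earlier in the notebook.
standings_filename = os.path.join(data_folder,
    "leagues_NBA_2013_standings_expanded-standings.csv")
# skiprows=[0, 1] skips the first two lines of the exported file.
standings = pd.read_csv(standings_filename, skiprows=[0, 1])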
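If the file cannot be found, a quick check along the following lines can confirm where the notebook is looking. This is a sketch under stated assumptions: the helper name `resolve_standings_path` is hypothetical, and the demonstration uses a temporary folder with an empty placeholder file standing in for your real data_folder and the real export.

```python
import os
import tempfile

STANDINGS_NAME = "leagues_NBA_2013_standings_expanded-standings.csv"


def resolve_standings_path(data_folder, name=STANDINGS_NAME):
    """Return the full path to the standings file, or raise a clear error."""
    path = os.path.join(data_folder, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Expected the standings export at {!r}".format(path))
    return path


# Demonstration with a throwaway folder standing in for data_folder.
with tempfile.TemporaryDirectory() as demo_folder:
    try:
        resolve_standings_path(demo_folder)
        found_before = True
    except FileNotFoundError:
        found_before = False
    # Create an empty placeholder file (not real standings data).
    open(os.path.join(demo_folder, STANDINGS_NAME), "w").close()
    found_after = os.path.isfile(resolve_standings_path(demo_folder))
```

In the notebook, the returned path could be passed to pd.read_csv in place of standings_filename.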

You can view the ladder by typing standings into a new cell and running it:

standings
