
Learning Data Mining with Python

24.07.2016 Views

Chapter 3

The output is as follows:

Next, we create a new feature using a similar pattern to the previous feature. We iterate over the rows, looking up the standings for the home team and visitor team. The code is as follows:

    dataset["HomeTeamRanksHigher"] = 0
    for index, row in dataset.iterrows():
        home_team = row["Home Team"]
        visitor_team = row["Visitor Team"]

One adjustment to the data is important: a team was renamed between the 2013 and 2014 seasons (but it was still the same team). This is an example of one of the many things that can happen when integrating data from different sources. We need to adjust the team lookup so that we get the correct team's ranking:

        # The standings use the old name, so map the new name back to it
        if home_team == "New Orleans Pelicans":
            home_team = "New Orleans Hornets"
        elif visitor_team == "New Orleans Pelicans":
            visitor_team = "New Orleans Hornets"

Now we can get the rankings for each team. We then compare them and update the feature in the row:

        home_rank = standings[standings["Team"] == home_team]["Rk"].values[0]
        visitor_rank = standings[standings["Team"] == visitor_team]["Rk"].values[0]
        # A smaller "Rk" is a better standing, so the home team ranks
        # higher when its rank number is the smaller of the two
        row["HomeTeamRanksHigher"] = int(home_rank < visitor_rank)
        # .loc replaces .ix, which was removed in pandas 1.0
        dataset.loc[index] = row
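The row-by-row lookup above can also be written without an explicit loop. The following is a minimal sketch, not the book's code: the two small data frames are invented stand-ins for `dataset` and `standings`, the `renamed` dictionary and `rank_by_team` series are hypothetical helper names, and it assumes that a smaller `Rk` value means a better standing.

```python
import pandas as pd

# Toy stand-ins for the chapter's data; the rankings are invented.
standings = pd.DataFrame({
    "Rk": [1, 2, 3],
    "Team": ["Miami Heat", "New Orleans Hornets", "Chicago Bulls"],
})
dataset = pd.DataFrame({
    "Home Team": ["New Orleans Pelicans", "Miami Heat", "Chicago Bulls"],
    "Visitor Team": ["Chicago Bulls", "New Orleans Pelicans", "Miami Heat"],
})

# Map the new team name back to the name used in the older standings.
renamed = {"New Orleans Pelicans": "New Orleans Hornets"}

# Series indexed by team name whose values are the ranks.
rank_by_team = standings.set_index("Team")["Rk"]

# Look up both ranks for every game in one step.
home_rank = dataset["Home Team"].replace(renamed).map(rank_by_team)
visitor_rank = dataset["Visitor Team"].replace(renamed).map(rank_by_team)

# Assumption: rank 1 is the best team, so "ranks higher" means a smaller number.
dataset["HomeTeamRanksHigher"] = (home_rank < visitor_rank).astype(int)
print(dataset)
```

Keeping the renames in a dictionary handles the case where either side of a game needs correcting, and further renames can be added without more branches.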

Predicting Sports Winners with Decision Trees

Next, we use the cross_val_score function to test the result. First, we extract the dataset:

    X_homehigher = dataset[["HomeLastWin", "VisitorLastWin",
                            "HomeTeamRanksHigher"]].values

Then, we create a new DecisionTreeClassifier and run the evaluation:

    clf = DecisionTreeClassifier(random_state=14)
    scores = cross_val_score(clf, X_homehigher, y_true,
                             scoring='accuracy')
    print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

This now scores 60.3 percent, even better than our previous result. Can we do better?

Next, let's test which of the two teams won their last match against each other. While rankings give some hint about who will win (the higher-ranked team is more likely to win), some teams play better against particular opponents. There are many reasons for this; for example, a team may have a strategy that works especially well against one opponent. Following our previous pattern, we create a dictionary to store the winner of the last game between each pair of teams and create a new feature in our data frame. The code is as follows:

    last_match_winner = defaultdict(int)
    dataset["HomeTeamWonLast"] = 0

Then, we iterate over each row and get the home team and visitor team:

    for index, row in dataset.iterrows():
        home_team = row["Home Team"]
        visitor_team = row["Visitor Team"]

We want to see who won the last game between these two teams regardless of which team was playing at home. Therefore, we sort the team names alphabetically, giving us a consistent key for those two teams:

        teams = tuple(sorted([home_team, visitor_team]))

We look up in our dictionary to see who won the last encounter between the two teams. Then, we update the row in the dataset data frame:

        # Unseen pairs default to 0, which never equals a team name
        row["HomeTeamWonLast"] = 1 if last_match_winner[teams] == row["Home Team"] else 0
        # .loc replaces .ix, which was removed in pandas 1.0
        dataset.loc[index] = row
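The loop on this page ends before the dictionary is updated with the result of the current game. Here is a minimal sketch of the complete pattern, under stated assumptions: the games are invented and listed in chronological order, a plain list of dictionaries stands in for the data frame so that the example runs on its own, and the `HomeWin` field is an assumed result column that does not appear in this excerpt.

```python
from collections import defaultdict

# Invented games in chronological order; "HomeWin" is an assumed result field.
games = [
    {"Home Team": "Chicago Bulls", "Visitor Team": "Miami Heat", "HomeWin": True},
    {"Home Team": "Miami Heat", "Visitor Team": "Chicago Bulls", "HomeWin": False},
    {"Home Team": "Chicago Bulls", "Visitor Team": "Miami Heat", "HomeWin": False},
    {"Home Team": "Miami Heat", "Visitor Team": "Chicago Bulls", "HomeWin": True},
]

# Unseen pairs default to 0, which never equals a team name,
# so the feature is 0 for the first meeting of any two teams.
last_match_winner = defaultdict(int)

for game in games:
    home_team = game["Home Team"]
    visitor_team = game["Visitor Team"]
    # Sorting gives the same key no matter which team is at home.
    teams = tuple(sorted([home_team, visitor_team]))
    # Set the feature from the previous meeting only.
    game["HomeTeamWonLast"] = 1 if last_match_winner[teams] == home_team else 0
    # Record this game's winner afterwards, so the feature never
    # sees the result it is meant to help predict.
    last_match_winner[teams] = home_team if game["HomeWin"] else visitor_team

home_team_won_last = [game["HomeTeamWonLast"] for game in games]
print(home_team_won_last)
```

The order of the last two statements in the loop matters: updating the dictionary before setting the feature would leak the current game's outcome into its own input.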

