Learning Data Mining with Python
Chapter 3

Next, we create a new feature using a similar pattern to the previous one. We iterate over the rows, looking up the standings for the home team and the visitor team. The code is as follows:

    dataset["HomeTeamRanksHigher"] = 0
    for index, row in dataset.iterrows():
        home_team = row["Home Team"]
        visitor_team = row["Visitor Team"]

As an important adjustment to the data, one team was renamed between the 2013 and 2014 seasons (but it was still the same team). This is an example of one of the many issues that can arise when integrating data! We need to adjust the team lookup to ensure we get the correct team's ranking:

        if home_team == "New Orleans Pelicans":
            home_team = "New Orleans Hornets"
        elif visitor_team == "New Orleans Pelicans":
            visitor_team = "New Orleans Hornets"

Now we can get the rankings for each team. We then compare them and update the feature in the row:

        home_rank = standings[standings["Team"] == home_team]["Rk"].values[0]
        visitor_rank = standings[standings["Team"] == visitor_team]["Rk"].values[0]
        row["HomeTeamRanksHigher"] = int(home_rank > visitor_rank)
        dataset.loc[index] = row
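To see the lookup in isolation, here is a minimal sketch on a toy standings table. The team pairing and the Rk values are invented for illustration; the chapter's real standings frame is loaded from the season's ladder, and the comparison follows the chapter's expression:

```python
import pandas as pd

# Toy standings table; the ranks here are hypothetical examples
standings = pd.DataFrame({
    "Team": ["Chicago Bulls", "New Orleans Hornets"],
    "Rk": [3, 10],
})

home_team = "New Orleans Pelicans"
visitor_team = "Chicago Bulls"

# Map the renamed franchise back to the name used in the standings
if home_team == "New Orleans Pelicans":
    home_team = "New Orleans Hornets"
elif visitor_team == "New Orleans Pelicans":
    visitor_team = "New Orleans Hornets"

# Boolean mask selects the matching row; .values[0] pulls the scalar rank
home_rank = standings[standings["Team"] == home_team]["Rk"].values[0]
visitor_rank = standings[standings["Team"] == visitor_team]["Rk"].values[0]
home_ranks_higher = int(home_rank > visitor_rank)
```

With these toy numbers the renamed home team is found under its old name and its rank (10) is compared against the visitor's (3).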
Predicting Sports Winners with Decision Trees

Next, we use the cross_val_score function to test the result. First, we extract the dataset:

    X_homehigher = dataset[["HomeLastWin", "VisitorLastWin",
                            "HomeTeamRanksHigher"]].values

Then, we create a new DecisionTreeClassifier and run the evaluation:

    clf = DecisionTreeClassifier(random_state=14)
    scores = cross_val_score(clf, X_homehigher, y_true,
                             scoring='accuracy')
    print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

This now scores 60.3 percent, even better than our previous result. Can we do better?
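The averaging-and-formatting step at the end is plain Python; for instance, with three hypothetical per-fold accuracies (cross_val_score returns one score per fold):

```python
from statistics import mean

# Three hypothetical per-fold accuracy scores from cross-validation
scores = [0.58, 0.61, 0.62]
report = "Accuracy: {0:.1f}%".format(mean(scores) * 100)
print(report)  # Accuracy: 60.3%
```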
Next, let's test which of the two teams won their last match. While rankings give some hints about who will win (the higher-ranked team is more likely to win), teams sometimes play better against particular opponents. There are many reasons for this; for example, some teams may have strategies that work very well against certain other teams. Following our previous pattern, we create a dictionary to store the winner of the past game and create a new feature in our data frame. The code is as follows:

    last_match_winner = defaultdict(int)
    dataset["HomeTeamWonLast"] = 0
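Using a defaultdict(int) means the first meeting of any pair of teams looks up as 0 rather than raising a KeyError, and 0 compares unequal to any team name, so "no previous winner recorded" naturally produces a 0 feature. A quick check with a hypothetical pairing:

```python
from collections import defaultdict

last_match_winner = defaultdict(int)

# An unseen key yields the int default, 0, instead of a KeyError
first_lookup = last_match_winner[("Boston Celtics", "Chicago Bulls")]
print(first_lookup)  # 0, which compares unequal to any team name
```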
Then, we iterate over each row and get the home team and visitor team:

    for index, row in dataset.iterrows():
        home_team = row["Home Team"]
        visitor_team = row["Visitor Team"]
We want to see who won the last game between these two teams, regardless of which team was playing at home. Therefore, we sort the team names alphabetically, giving us a consistent key for those two teams:

        teams = tuple(sorted([home_team, visitor_team]))
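A quick check (with hypothetical team names) that the sorted tuple gives the same key whichever side is at home:

```python
# The same matchup, seen from each team's home court
key_a = tuple(sorted(["Chicago Bulls", "Boston Celtics"]))
key_b = tuple(sorted(["Boston Celtics", "Chicago Bulls"]))
print(key_a == key_b)  # True: both are ("Boston Celtics", "Chicago Bulls")
```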
We look in our dictionary to see who won the last encounter between the two teams. Then, we update the row in the dataset data frame:

        row["HomeTeamWonLast"] = 1 if last_match_winner[teams] == row["Home Team"] else 0
        dataset.loc[index] = row
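Note that for this feature to carry any information, last_match_winner must also be updated with each game's winner as the loop proceeds; the excerpt stops before that step. The following end-to-end sketch, in plain Python with invented games and team names, adds the update on the assumption that it happens at the end of each iteration:

```python
from collections import defaultdict

# Hypothetical sequence of games, oldest first: (home, visitor, home_won)
games = [
    ("Bulls", "Celtics", True),
    ("Celtics", "Bulls", False),
    ("Bulls", "Celtics", False),
]

last_match_winner = defaultdict(int)
features = []
for home, visitor, home_won in games:
    # Order-independent key for the pairing
    teams = tuple(sorted([home, visitor]))
    # 0 on the first meeting, since defaultdict(int) yields 0 for unseen keys
    features.append(1 if last_match_winner[teams] == home else 0)
    # Record this game's winner so the *next* meeting between these
    # teams can use it (assumed update step, not shown in the excerpt)
    last_match_winner[teams] = home if home_won else visitor

print(features)  # [0, 0, 1]: only the third game has a recorded home-team win
```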