The process starts by creating dictionaries to store how many times we see the premise leading to the conclusion (a correct example of the rule) and how many times it doesn't (an incorrect example). Let's look at the code:

    correct_counts = defaultdict(int)
    incorrect_counts = defaultdict(int)

We iterate over all of the users, their favorable reviews, and over each candidate association rule:

    for user, reviews in favorable_reviews_by_users.items():
        for candidate_rule in candidate_rules:
            premise, conclusion = candidate_rule

We then test to see if the premise is applicable to this user. In other words, did the user favorably review all of the movies in the premise? Let's look at the code:

            if premise.issubset(reviews):

If the premise applies, we see if the conclusion movie was also rated favorably. If so, the rule is correct in this instance; if not, it is incorrect. Let's look at the code:

            if premise.issubset(reviews):
                if conclusion in reviews:
                    correct_counts[candidate_rule] += 1
                else:
                    incorrect_counts[candidate_rule] += 1

We then compute the confidence for each rule by dividing the correct count by the total number of times the rule was seen:

    rule_confidence = {candidate_rule: correct_counts[candidate_rule]
                       / float(correct_counts[candidate_rule] +
                               incorrect_counts[candidate_rule])
                       for candidate_rule in candidate_rules}

Now we can print the top five rules by sorting this confidence dictionary and printing the results:

    from operator import itemgetter
    sorted_confidence = sorted(rule_confidence.items(),
                               key=itemgetter(1), reverse=True)
    for index in range(5):
        print("Rule #{0}".format(index + 1))
        (premise, conclusion) = sorted_confidence[index][0]
        print("Rule: If a person recommends {0} they will also "
              "recommend {1}".format(premise, conclusion))
        print(" - Confidence: {0:.3f}".format(
            rule_confidence[(premise, conclusion)]))
        print("")

The result is as follows:

    Rule #1
    Rule: If a person recommends frozenset({64, 56, 98, 50, 7}) they will also recommend 174
     - Confidence: 1.000
    Rule #2
    Rule: If a person recommends frozenset({98, 100, 172, 79, 50, 56}) they will also recommend 7
     - Confidence: 1.000
    Rule #3
    Rule: If a person recommends frozenset({98, 172, 181, 174, 7}) they will also recommend 50
     - Confidence: 1.000
    Rule #4
    Rule: If a person recommends frozenset({64, 98, 100, 7, 172, 50}) they will also recommend 174
     - Confidence: 1.000
    Rule #5
    Rule: If a person recommends frozenset({64, 1, 7, 172, 79, 50}) they will also recommend 181
     - Confidence: 1.000

The resulting printout shows only the movie IDs, which isn't very helpful without the names of the movies as well. The dataset comes with a file called u.item, which stores the movie names and their corresponding MovieID (as well as other information, such as the genre).

We can load the titles from this file using pandas. Additional information about the file and its categories is available in the README that came with the dataset. The data in the file is in CSV format, but with fields separated by the | symbol; it has no header row, and setting the encoding is important. The column names were found in the README file.

    movie_name_filename = os.path.join(data_folder, "u.item")
    movie_name_data = pd.read_csv(movie_name_filename, delimiter="|",
                                  header=None, encoding="mac-roman")
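The counting-and-confidence logic above can be run end to end on a small hand-made dataset. The following is a minimal, self-contained sketch; the three users and two candidate rules are invented for illustration and are not drawn from the MovieLens data:

```python
from collections import defaultdict
from operator import itemgetter

# Invented stand-ins for the chapter's variables: each user's set of
# favorably reviewed movie IDs, and two candidate (premise, conclusion)
# rules. In the chapter these are built from the MovieLens ratings.
favorable_reviews_by_users = {
    1: frozenset({7, 50, 174}),
    2: frozenset({7, 50}),
    3: frozenset({50, 174}),
}
candidate_rules = [(frozenset({7, 50}), 174), (frozenset({50}), 7)]

correct_counts = defaultdict(int)
incorrect_counts = defaultdict(int)
for user, reviews in favorable_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        # Count the rule only for users whose reviews cover the premise.
        if premise.issubset(reviews):
            if conclusion in reviews:
                correct_counts[candidate_rule] += 1
            else:
                incorrect_counts[candidate_rule] += 1

# Confidence = correct applications / total applications of each rule.
rule_confidence = {rule: correct_counts[rule]
                   / float(correct_counts[rule] + incorrect_counts[rule])
                   for rule in candidate_rules}
sorted_confidence = sorted(rule_confidence.items(), key=itemgetter(1),
                           reverse=True)
```

With this toy data, {50} -> 7 applies to all three users and holds for two of them (confidence 2/3), while {7, 50} -> 174 applies to two users and holds for one (confidence 0.5), so the first rule sorts to the top. Note that the chapter's confidence expression divides by the number of times a rule's premise was seen, so a candidate rule whose premise matches no user would raise a ZeroDivisionError; the frequent-itemset generation step earlier in the chapter prevents that case.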
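To see the loading and lookup steps work without the dataset on disk, here is a sketch that parses a couple of |-separated rows shaped like the start of u.item. The sample rows and the three column names used here are illustrative; the real file has further columns (genre flags and so on) documented in the dataset's README:

```python
import io
import pandas as pd

# Stand-in for the first few columns of u.item: |-separated, no header.
# The two rows below are illustrative samples, not the full file.
sample = io.StringIO("1|Toy Story (1995)|01-Jan-1995\n"
                     "2|GoldenEye (1995)|01-Jan-1995\n")
movie_name_data = pd.read_csv(sample, delimiter="|", header=None)
movie_name_data.columns = ["MovieID", "Title", "ReleaseDate"]

def get_movie_name(movie_id):
    # Select the row whose MovieID matches, then pull out its title value.
    title = movie_name_data[movie_name_data["MovieID"] == movie_id]["Title"]
    return title.values[0]

print(get_movie_name(1))  # prints "Toy Story (1995)"
```

A helper like get_movie_name makes it straightforward to replace the raw IDs in the rule printout with readable titles.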