
Recommending Movies Using Affinity Analysis

Extracting association rules

After the Apriori algorithm has completed, we have a list of frequent itemsets. These aren't exactly association rules, but they are similar to them. A frequent itemset is a set of items with a minimum support, while an association rule has a premise and a conclusion.

We can make an association rule from a frequent itemset by taking one of the movies in the itemset and denoting it as the conclusion. The other movies in the itemset form the premise. This gives rules of the following form: if a reviewer recommends all of the movies in the premise, they will also recommend the conclusion.

For each itemset, we can generate a number of association rules by setting each movie to be the conclusion and the remaining movies as the premise.

In code, we first generate a list of all of the rules from each of the frequent itemsets, by iterating over each of the discovered frequent itemsets of each length:

candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    for itemset in itemset_counts.keys():

We then iterate over every movie in this itemset, using it as our conclusion. The remaining movies in the itemset are the premise. We save the premise and conclusion as our candidate rule:

        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))
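Putting the pieces above together, the rule generation can be run end to end. The following is a minimal, self-contained sketch using a small hypothetical frequent_itemsets dictionary (the values stand in for the chapter's movie data, which maps each itemset length to a dictionary of itemsets and their support counts):

```python
# Hypothetical frequent itemsets: {itemset_length: {itemset: support_count}}
frequent_itemsets = {
    2: {frozenset((79, 258)): 50, frozenset((50, 64)): 45},
    3: {frozenset((79, 258, 127)): 20},
}

candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    for itemset in itemset_counts.keys():
        # Each movie in the itemset takes a turn as the conclusion;
        # the remaining movies form the premise
        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))

# Each itemset of size n yields n rules: 2 + 2 + 3 here
print(len(candidate_rules))  # 7
```

Note that because each itemset is a frozenset, the set difference `itemset - set((conclusion,))` leaves the premise as a frozenset, which keeps the rules hashable for the counting dictionaries used later.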

This returns a very large number of candidate rules. We can see some by printing out the first few rules in the list:

print(candidate_rules[:5])

The resulting output shows the rules that were obtained:

[(frozenset({79}), 258), (frozenset({258}), 79), (frozenset({50}), 64), (frozenset({64}), 50), (frozenset({127}), 181)]

In these rules, the first part (the frozenset) is the set of movies in the premise, while the number after it is the conclusion. In the first case, if a reviewer recommends movie 79, they are also likely to recommend movie 258.

Next, we compute the confidence of each of these rules. This is performed much like in Chapter 1, Getting Started with Data Mining, with the only changes being those necessary for computing using the new data format.
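To make the confidence step concrete, here is a sketch of that computation. It assumes the reviews are stored as a dictionary mapping each user to the frozenset of movies they reviewed favorably (the variable names and the tiny dataset are illustrative, not the chapter's actual data):

```python
from collections import defaultdict

# Hypothetical data: each user mapped to the movies they reviewed favorably
favorable_reviews_by_users = {
    1: frozenset((79, 258, 50)),
    2: frozenset((79, 258)),
    3: frozenset((79, 64)),
}
candidate_rules = [(frozenset((79,)), 258), (frozenset((50,)), 64)]

correct_counts = defaultdict(int)
incorrect_counts = defaultdict(int)
for user, reviews in favorable_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            # The premise applies; did the user also review the conclusion?
            if conclusion in reviews:
                correct_counts[candidate_rule] += 1
            else:
                incorrect_counts[candidate_rule] += 1

# Confidence = times the rule held / times the premise applied
# (every premise here appears at least once, so no division by zero)
rule_confidence = {
    rule: correct_counts[rule] / (correct_counts[rule] + incorrect_counts[rule])
    for rule in candidate_rules
}
```

With this toy data, the rule "79 implies 258" holds for two of the three users whose reviews include movie 79, giving a confidence of about 0.67.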
