
Learning Data Mining with Python


Recommending Movies Using Affinity Analysis

The format given here represents the full matrix, but in a more compact way. The first row indicates that user #196 reviewed movie #242, giving it a rating of 3 (out of five) on December 4, 1997. Any combination of user and movie that isn't in this database is assumed not to exist, meaning that the user has not rated that movie.

This saves significant space, as opposed to storing a bunch of zeroes in memory. This type of format is called a sparse matrix format. As a rule of thumb, if you expect about 60 percent or more of your dataset to be empty or zero, a sparse format will take less space to store.
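To make the storage argument concrete, here is a minimal sketch in plain Python (not the chapter's own code) that keeps only the observed (user, movie) pairs in a dictionary and treats any absent pair as 0. The six rating triples are the rows quoted in this section; the helper name `get_rating` is hypothetical.

```python
# A toy sparse store: only (user, movie) pairs that actually have a rating are kept.
# The triples below are the rows quoted in this section.
ratings = [
    (196, 242, 3),
    (62, 257, 2),
    (286, 1014, 5),
    (200, 222, 5),
    (210, 40, 3),
    (224, 29, 3),
]

# Key each rating by its (user, movie) coordinate; nothing is stored for missing pairs.
sparse = {(user, movie): rating for user, movie, rating in ratings}


def get_rating(store, user, movie):
    """Hypothetical helper: return the stored rating, or 0 if the pair is absent."""
    return store.get((user, movie), 0)


# Size of the equivalent dense matrix, assuming IDs run from 1 up to the largest ID seen.
n_users = max(user for user, _, _ in ratings)
n_movies = max(movie for _, movie, _ in ratings)
dense_cells = n_users * n_movies
stored_cells = len(sparse)
fraction_empty = 1 - stored_cells / dense_cells

print(f"dense cells: {dense_cells}, stored cells: {stored_cells}")
print(f"fraction empty: {fraction_empty:.5f}")
```

With only six stored ratings, a dense matrix indexed up to user 286 and movie 1014 would need 290,004 cells, almost all of them zero.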

When computing on sparse matrices, the focus isn't usually on the data we don't have, so we avoid comparing all of the zeroes. Instead, we usually focus on the data we have and compare those values.
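A sparse dot product shows what focusing on the data we have means in practice. In this sketch, each user's ratings are a hypothetical `{movie_id: rating}` dictionary, and only movies present in both dictionaries are visited; the function name `sparse_dot` and the example ratings are illustrative assumptions, not part of the chapter.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts.

    Only indices present in both dicts can produce a non-zero product,
    so the implicit zeroes are never visited.
    """
    # Iterate over the smaller dict and look keys up in the larger one.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[key] for key, value in a.items() if key in b)


# Hypothetical ratings for two users: {movie_id: rating}
user_a = {242: 3, 257: 2, 1014: 5}
user_b = {242: 4, 1014: 1, 29: 5}

print(sparse_dot(user_a, user_b))  # only movies 242 and 1014 contribute
```

Iterating over the smaller dictionary keeps the cost proportional to the number of stored ratings rather than to the size of the full user-by-movie matrix.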

The Apriori implementation

The goal of this chapter is to produce rules of the following form: if a person recommends these movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie.
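This excerpt does not show how candidate rules are scored. As an assumption for illustration, the sketch below uses confidence, a standard affinity-analysis measure: among users who favor every movie in the premise, the fraction who also favor the conclusion. The function `rule_confidence` and the user data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable


def rule_confidence(favorable_by_user: Dict[int, FrozenSet[int]],
                    premise: Iterable[int], conclusion: int) -> float:
    """Confidence of the rule "premise -> conclusion" (hypothetical helper)."""
    premise = frozenset(premise)
    # Users the rule applies to: they favor every movie in the premise.
    applicable = [movies for movies in favorable_by_user.values() if premise <= movies]
    if not applicable:
        return 0.0
    # Of those, the users for whom the rule is also correct.
    correct = sum(1 for movies in applicable if conclusion in movies)
    return correct / len(applicable)


# Hypothetical data: user ID -> set of movie IDs that user rated favorably.
favorable_by_user = {
    1: frozenset({50, 100, 181}),
    2: frozenset({50, 181}),
    3: frozenset({50, 100}),
    4: frozenset({100, 181}),
}

print(rule_confidence(favorable_by_user, {50}, 181))
print(rule_confidence(favorable_by_user, {50, 100}, 181))
```

Here the rule "if a user favors movie 50, they also favor movie 181" applies to three users and holds for two of them, giving a confidence of 2/3. A multi-movie premise such as {50, 100} is the "set of movies" extension mentioned above.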

To do this, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave a favorable review to a movie (that is, a rating greater than 3):

all_ratings["Favorable"] = all_ratings["Rating"] > 3

We can see the new feature by viewing the dataset:

all_ratings[10:15]

    UserID  MovieID  Rating             Datetime  Favorable
10      62      257       2  1997-11-12 22:07:14      False
11     286     1014       5  1997-11-17 15:38:45       True
12     200      222       5  1997-10-05 09:05:40       True
13     210       40       3  1998-03-27 21:59:54      False
14     224       29       3  1998-02-21 23:40:57      False
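To experiment without loading the full ratings dataset, the following sketch rebuilds just the five rows shown above as a pandas DataFrame and applies the same Favorable threshold. It assumes pandas is installed; only the row values come from the output above, and the hand-built frame stands in for the chapter's full all_ratings.

```python
import pandas as pd

# The five rows shown above (indices 10-14), rebuilt by hand so the
# Favorable step can be run without the full ratings file.
all_ratings = pd.DataFrame(
    {
        "UserID": [62, 286, 200, 210, 224],
        "MovieID": [257, 1014, 222, 40, 29],
        "Rating": [2, 5, 5, 3, 3],
        "Datetime": pd.to_datetime([
            "1997-11-12 22:07:14",
            "1997-11-17 15:38:45",
            "1997-10-05 09:05:40",
            "1998-03-27 21:59:54",
            "1998-02-21 23:40:57",
        ]),
    },
    index=range(10, 15),
)

# Ratings of 4 or 5 count as favorable; a rating of 3 or below does not.
all_ratings["Favorable"] = all_ratings["Rating"] > 3

print(all_ratings)
```

Because the comparison is vectorized, the new column is a boolean Series aligned with the original index, so rows 11 and 12 (both rated 5) are the only True entries.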
