CMPT-741 Fall 2009 Data Mining Martin Ester Course Project


Total marks: 30% of the class

Due date: November 30, 2009

Introduction

This course project is to be conducted in small groups of two to three students. You will be using the public domain data mining tool WEKA, to be downloaded from http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html.

Dataset

You will be using reviews from Epinions.com. Each review consists of the full text, the pros and cons (a summary of the review) and the numerical rating (1, 2, 3, 4, or 5). The texts are provided both as raw texts and in a version with part-of-speech tags. More specifically, here are the attributes of a product review:

PKey|ProductRate|Pros|Cons|FullReview|TaggedPros|TaggedCons|TaggedFullReview
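As an illustration of the record layout above, here is a minimal Python sketch that splits one record into named fields. It assumes one review per line and that the text fields never contain a literal `|`; neither assumption is confirmed by the handout. The names `Review` and `parse_review` and the sample record are invented for this example, and the tagged fields are placeholders because the tag format is not shown here.

```python
from typing import NamedTuple

# Field order taken from the attribute list above.
FIELDS = ("PKey", "ProductRate", "Pros", "Cons", "FullReview",
          "TaggedPros", "TaggedCons", "TaggedFullReview")


class Review(NamedTuple):
    pkey: int
    rating: int
    pros: str
    cons: str
    full_review: str
    tagged_pros: str
    tagged_cons: str
    tagged_full_review: str


def parse_review(line: str) -> Review:
    """Split one '|'-separated record into a Review.

    Assumption: no text field contains a literal '|'.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    # The first two fields are numeric; the remaining six are kept as text.
    return Review(int(parts[0]), int(parts[1]), *parts[2:])


# Invented sample record, for illustration only.
sample = ("1|4|Small and light|Short battery life|Full text here|"
          "tagged pros|tagged cons|tagged full text")
review = parse_review(sample)
print(review.pkey, review.rating, review.pros)
```

If the real files quote or escape the separator, a proper CSV reader with `delimiter="|"` would be a safer choice than a plain `split`.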

Our dataset contains reviews for each of the following product categories and products:

Camcorder

Pkey  Product
1     Pure Digital Flip Video Ultra (2 GB) Camcorder
2     Canon Elura 100 Mini DV Camcorder
3     Canon ZR40 Mini DV Camcorder
4     Canon GL2 Mini DV Camcorder
5     Samsung SC-D23 Mini DV Camcorder
6     Sony Handycam DCR-TRV250 Digital-8 Camcorder
7     Panasonic Palmcorder® PV-DV203 Mini DV Camcorder
8     Canon Elura 50 Mini DV Camcorder

Cellular Phone

Pkey  Product
1     Apple iPhone (8 GB) Smartphone
2     Nokia 6210 Cell Phone
3     Samsung SPH-A500 Cell Phone
4     RIM BlackBerry Pearl 8100 Smartphone
5     Palm Treo 650 Smartphone
6     Sony Ericsson T68IS Cell Phone
7     Motorola V400 Cell Phone
8     Motorola Q™ Smartphone

Digital Camera

Pkey  Product
1     Sony Mavica MVC-FD83 Digital Camera
2     Olympus Camedia D-360L Digital Camera
3     Nikon D40 Digital Camera with 18-55mm Lens
4     Canon PowerShot® S5 IS Digital Camera
5     Kodak EasyShare LS443 Digital Camera
6     Hewlett Packard Photosmart 435 Digital Camera
7     BenQ DC1500 Digital Camera
8     Agfa ePhoto Smile Digital Camera

DVD Player

Pkey  Product
1     Panasonic DVD-LV50 5 in. Portable DVD Player
2     Toshiba SD-1200 DVD Player
3     Panasonic DMR-E85H (120 GB) DVD Recorder
4     Toshiba RD-XS32SU DVD Recorder / HDD Recorder
5     Apex Digital AD-1200 DVD Player
6     Initial IDM-1731 7 in. Portable DVD Player
7     Cyberhome DVR 1600 DVD Recorder
8     Cyberhome CH-LDV 712 7 in. Portable DVD Player

MP3 Player

Pkey  Product
1     Apple iPod touch 2nd Generation (16 GB) MP3 Player
2     Intel Pocket Concert (128 MB) MP3 Player
3     Apple iPod classic 5th Generation White (30 GB) MP3 Player
4     Apple iPod classic 4th Generation (20 GB) MP3 Player
5     Apple iPod shuffle 1st Generation White (512 MB, M9724LL/A) MP3 Player
6     SanDisk Sansa m240 (1 GB) MP3 Player
7     Dell DJ (20 GB) MP3 Player
8     RCA Lyra RD2840 (40 GB) MP3 Player

The dataset contains multiple reviews for each product.

Tasks

Your task is sentiment classification. We create class labels for reviews as follows:

Rating = 1 or 2: negative, rating = 3: neutral, rating = 4 or 5: positive.
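The labelling rule above can be written as a small function. This is only a sketch of the stated mapping; the function name `rating_to_label` is chosen for illustration, and raising an error for ratings outside 1 to 5 is an assumption about how bad input should be handled.

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-5 rating to the sentiment class defined in the handout."""
    if rating in (1, 2):
        return "negative"
    if rating == 3:
        return "neutral"
    if rating in (4, 5):
        return "positive"
    # Assumption: anything outside 1..5 is treated as a data error.
    raise ValueError(f"rating must be between 1 and 5, got {rating!r}")


for r in range(1, 6):
    print(r, rating_to_label(r))
```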

Build a sentiment classifier using any of the methods implemented in WEKA. Submit a hard copy report addressing the following issues:

1) Describe the features that you used for your classifier, as well as why and how you selected them.

2) Discuss your choice of a classification method. You may have to experiment with several methods and may report some of the preliminary results to support your choice.

3) Report the classification accuracy, as well as the precision, recall and F-measure (for the positive class and for the negative class) of your sentiment classifier. You need to perform 5-fold cross-validation to compute your performance measures.

4) Design and implement a method that explains the results of your sentiment classifier. Discuss your design and provide the actual explanations produced for the classification of the first reviews of each product category.
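To make the measures named in issue 3 concrete, the following Python sketch computes accuracy and per-class precision, recall and F-measure from gold and predicted labels, and shows one simple way to split instances into five folds. It is an illustration only, not the required procedure: the project asks for WEKA, the function names are invented here, and the fold assignment is a plain round-robin split without shuffling or stratification, which is a simplifying assumption.

```python
from typing import Dict, List, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_scores(gold: Sequence[str], pred: Sequence[str],
                     label: str) -> Dict[str, float]:
    """Precision, recall and F-measure for one class (e.g. 'positive')."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


def fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Round-robin split: instance i goes to fold i % k (no shuffling)."""
    return [list(range(f, n, k)) for f in range(k)]


# Invented toy labels, for illustration only.
gold = ["positive", "positive", "negative", "neutral", "negative"]
pred = ["positive", "negative", "negative", "positive", "negative"]
print(accuracy(gold, pred))
print(per_class_scores(gold, pred, "positive"))
print(per_class_scores(gold, pred, "negative"))
print(fold_indices(7))
```

In each of the five rounds, one fold serves as the test set and the remaining four as the training set; the reported measures are then computed over the predictions collected from all five test folds.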

Marking scheme

Each of the four issues listed above is worth 25% of the course project marks.
