
Learning Data Mining with Python

24.07.2016 Views

Index

A

access keys 107
accuracy
  improving, dictionary used 180
activation function 162
Adult dataset
  URL 83
Advertisements dataset
  URL 93
affinity analysis
  about 61, 62
  algorithms 62, 63
  dataset, loading with NumPy 8-10
  defining 7
  example 7
  parameters, selecting 63
  product recommendations 8
  ranking of rules, implementing 10-13
  ranking, to find best rules 13-15
Amazon S3 console
  URL 293
API endpoint
  URL 213
application
  about 263, 279
  data, obtaining 263-282
  defining 126
  dictionaries, converting to matrix 127
  Naive Bayes algorithm 282
  Naive Bayes classifier, training 127
  neural network, creating 266
  neural network, training with training dataset 267, 268
  word counts, extracting 126
apps, Twitter account
  URL 107
Apriori algorithm 68
Apriori implementation
  about 66, 67
  Apriori algorithm 68
  defining 69-71
arbitrary websites
  data mining, using 220
  HTML file, parsing 221, 222
  nodes, ignoring 221
  stories, finding 218-220
  text, extracting from 218-221
Artificial Neural Networks 162, 163
association rules
  evaluating 76-79
  extracting 72-76
authorship analysis
  about 187
  applications 186, 187
  authorship, attributing 187, 188
  data, obtaining 189-191
  defining 186
  use cases 186, 187
authorship analysis, problems
  authorship clustering 186
  authorship profiling 186
  authorship verification 186

authorship, attributing 185-188
AWS CLI
  installing 294
AWS console
  URL 260

B

back propagation (backprop) algorithm 173, 174
bagging 55
BatchIterator instance
  creating 265
Bayes' theorem
  about 122, 123
  equation 122
bias 55
big data
  about 272
  use cases 273, 274
Bleeding Edge code
  installing 299
  URL 299
blog posts
  extracting 283, 284
blogs dataset 304

C

CAPTCHA
  creating 167
  defining 303
  references 303
CART (Classification and Regression Trees) 48
character n-grams
  about 198, 199
  extracting 199, 200
CIFAR-10
  about 243
  URL 243
classification
  about 16
  algorithm, testing 20-22
  dataset, loading 16, 17
  dataset, preparing 16, 17
  examples 16
  OneR algorithm, implementing 18, 19
classifiers
  comparing 299
closed problem 187
cluster evaluation
  URL 226
clustering 222
coassociation matrix
  defining 230, 231
complex algorithms
  references 303
complex features
  references 300
confidence
  about 11
  computing 11
connected components 151-153
Cosine distance 28
Coursera
  about 308
  references 308
CPU
  defining 258
cross-fold validation framework
  defining 32
CSV (Comma Separated Values) 42

D

data, blogging
  URL 280
data, Corpus
  URL 280
data mining
  defining 2, 3
dataset
  about 2
  CAPTCHAs, drawing 166, 167
  classifying, with existing model 137-139
  cleaning up 44
  creating 165
  data, collecting 42

