Learning Data Mining with Python
Index<br />
A<br />
access keys 107<br />
accuracy<br />
improving, dictionary used 180<br />
activation function 162<br />
Adult dataset<br />
URL 83<br />
Advertisements dataset<br />
URL 93<br />
affinity analysis<br />
about 61, 62<br />
algorithms 62, 63<br />
dataset, loading with NumPy 8-10<br />
defining 7<br />
example 7<br />
parameters, selecting 63<br />
product recommendations 8<br />
ranking of rules, implementing 10-13<br />
ranking, to find best rules 13-15<br />
Amazon S3 console<br />
URL 293<br />
API endpoint<br />
URL 213<br />
application<br />
about 263, 279<br />
data, obtaining 263-282<br />
defining 126<br />
dictionaries, converting to matrix 127<br />
Naive Bayes algorithm 282<br />
Naive Bayes classifier, training 127<br />
neural network, creating 266<br />
neural network, training with training dataset 267, 268<br />
word counts, extracting 126<br />
apps, Twitter account<br />
URL 107<br />
Apriori algorithm 68<br />
Apriori implementation<br />
about 66, 67<br />
Apriori algorithm 68<br />
defining 69-71<br />
arbitrary websites<br />
data mining, using 220<br />
HTML file, parsing 221, 222<br />
nodes, ignoring 221<br />
stories, finding 218-220<br />
text, extracting from 218-221<br />
Artificial Neural Networks 162, 163<br />
association rules<br />
evaluating 76-79<br />
extracting 72-76<br />
authorship analysis<br />
about 187<br />
applications 186, 187<br />
authorship, attributing 187, 188<br />
data, obtaining 189-191<br />
defining 186<br />
use cases 186, 187<br />
authorship analysis, problems<br />
authorship clustering 186<br />
authorship profiling 186<br />
authorship verification 186<br />
authorship, attributing 185-188<br />
AWS CLI<br />
installing 294<br />
AWS console<br />
URL 260<br />
B<br />
back propagation (backprop) algorithm 173, 174<br />
bagging 55<br />
BatchIterator instance<br />
creating 265<br />
Bayes' theorem<br />
about 122, 123<br />
equation 122<br />
bias 55<br />
big data<br />
about 272<br />
use cases 273, 274<br />
Bleeding Edge code<br />
installing 299<br />
URL 299<br />
blog posts<br />
extracting 283, 284<br />
blogs dataset 304<br />
C<br />
CAPTCHA<br />
creating 167<br />
defining 303<br />
references 303<br />
CART (Classification and Regression Trees) 48<br />
character n-grams<br />
about 198, 199<br />
extracting 199, 200<br />
CIFAR-10<br />
about 243<br />
URL 243<br />
classification<br />
about 16<br />
algorithm, testing 20-22<br />
dataset, loading 16, 17<br />
dataset, preparing 16, 17<br />
examples 16<br />
OneR algorithm, implementing 18, 19<br />
classifiers<br />
comparing 299<br />
closed problem 187<br />
cluster evaluation<br />
URL 226<br />
clustering 222<br />
coassociation matrix<br />
defining 230, 231<br />
complex algorithms<br />
references 303<br />
complex features<br />
references 300<br />
confidence<br />
about 11<br />
computing 11<br />
connected components 151-153<br />
Cosine distance 28<br />
Coursera<br />
about 308<br />
references 308<br />
CPU<br />
defining 258<br />
cross-fold validation framework<br />
defining 32<br />
CSV (Comma Separated Values) 42<br />
D<br />
data, blogging<br />
URL 280<br />
data, Corpus<br />
URL 280<br />
data mining<br />
defining 2, 3<br />
dataset<br />
about 2<br />
CAPTCHAs, drawing 166, 167<br />
classifying, with existing model 137-139<br />
cleaning up 44<br />
creating 165<br />
data, collecting 42<br />