Learning Data Mining with Python
scikit-learn package<br />
references 305<br />
sepal length 16<br />
sepal width 16<br />
shapes adding, CAPTCHAs<br />
URL 303<br />
Silhouette Coefficient<br />
about 155<br />
computing 155<br />
parameters 158<br />
similarity graph<br />
creating 147-151<br />
SNAP<br />
URL 303<br />
softmax nonlinearity 252<br />
Spam detection<br />
references 302<br />
spam filter 129<br />
sparse matrix 27<br />
sparse matrix format 66<br />
sports outcome prediction<br />
about 49<br />
features 50<br />
stacking 53<br />
standings<br />
loading 50-54<br />
standings data<br />
obtaining 50<br />
URL 50<br />
Stratified K Fold 32<br />
style sheets 218<br />
stylometry 186<br />
subgraphs<br />
connected components 151-153<br />
criteria, optimizing 155-159<br />
finding 151<br />
subreddits 212, 215<br />
SVMs<br />
about 196<br />
classifying with 197<br />
kernels 198<br />
URL 197<br />
system<br />
building, for taking image as input 242-244<br />
T<br />
temporal analysis 305<br />
text<br />
about 106<br />
extracting, from arbitrary websites 218<br />
text-based datasets 105<br />
text transformers<br />
bag-of-words model 118-120<br />
defining 118<br />
features 121<br />
n-grams 120<br />
word, counting in dataset 118-120<br />
tf-idf 120<br />
Theano<br />
about 248, 249<br />
URL 260<br />
using 249<br />
Torch<br />
URL 306<br />
train_feature_value() function 20<br />
transformer<br />
creating 98<br />
implementing 99, 100<br />
transformer API 99<br />
unit testing 101, 102<br />
tutorial, Google<br />
URL 307<br />
tutorial, Yahoo<br />
URL 307<br />
tweets<br />
about 106<br />
F1-score, used for evaluation 129, 130<br />
features, obtaining from models 130, 132<br />
loading 128, 129<br />
Twitter<br />
follower information, obtaining from 140, 141<br />
Twitter account<br />
URL 107<br />
twitter documentation<br />
URL 107<br />
preprocessing, using pipelines<br />
about 35, 36<br />
example 36, 37<br />
features 35<br />
features, of animal 35<br />
standard preprocessing 37<br />
workflow, creating 38<br />
pricing alerts<br />
URL 295<br />
Principal Component Analysis (PCA) 96, 97<br />
prior belief 122<br />
probabilistic graphical models<br />
URL 308<br />
probabilities<br />
computing 290<br />
programmers, for Python language<br />
URL 4<br />
Project Gutenberg<br />
URL 189<br />
Pydoop<br />
about 307<br />
URL 307<br />
Pylearn2<br />
about 306<br />
URL 306<br />
Python<br />
defining 106<br />
installing 3, 4<br />
URL 3<br />
using 3<br />
Python 3.4 3<br />
Q<br />
quotequail package 205<br />
R<br />
RandomForestClassifier 56<br />
random forests<br />
about 26<br />
applying 56, 57<br />
defining 54, 55<br />
ensembles, working 55<br />
new features, engineering 58<br />
parameters 56<br />
reasons, feature selection<br />
complexity, reducing 88<br />
noise, reducing 88<br />
readable models, creating 88<br />
recall 129<br />
recommendation engine<br />
building 307<br />
URL 307<br />
reddit<br />
about 212-215<br />
references 213<br />
URL 215<br />
regularization<br />
URL 97<br />
reinforcement learning<br />
URL 304<br />
RESTful interface (Representational State Transfer) 213<br />
rules<br />
confidence 10<br />
finding 13<br />
support 10<br />
S<br />
sample size<br />
increasing 304<br />
scikit-learn<br />
installing 6<br />
URL 7<br />
scikit-learn estimators<br />
algorithm, running 32, 33<br />
dataset, loading 29-31<br />
defining 25, 26<br />
distance metrics 27, 28<br />
fit() function 31<br />
Nearest neighbors 26, 27<br />
parameters, setting 33-35<br />
predict() 25<br />
predict() function 31<br />
standard workflow, defining 31<br />
scikit-learn library<br />
about 25<br />
estimators 25<br />
pipelines 25<br />
transformers 25<br />