
Learning Data Mining with Python

24.07.2016 Views

preprocessing, using pipelines
  about 35, 36
  example 36, 37
  features 35
  features, of animal 35
  standard preprocessing 37
  workflow, creating 38
pricing alerts
  URL 295
Principal Component Analysis (PCA) 96, 97
prior belief 122
probabilistic graphical models
  URL 308
probabilities
  computing 290
programmers, for Python language
  URL 4
Project Gutenberg
  URL 189
Pydoop
  about 307
  URL 307
Pylearn2
  about 306
  URL 306
Python
  defining 106
  installing 3, 4
  URL 3
  using 3
Python 3.4 3

Q
quotequail package 205

R
RandomForestClassifier 56
random forests
  about 26
  applying 56, 57
  defining 54, 55
  ensembles, working 55
  new features, engineering 58
  parameters 56
reasons, feature selection
  complexity, reducing 88
  noise, reducing 88
  readable models, creating 88
recall 129
recommendation engine
  building 307
  URL 307
reddit
  about 212-215
  references 213
  URL 215
regularization
  URL 97
reinforcement learning
  URL 304
RESTful interface (Representational State Transfer) 213
rules
  confidence 10
  finding 13
  support 10

S
sample size
  increasing 304
scikit-learn
  installing 6
  URL 7
scikit-learn estimators
  algorithm, running 32, 33
  dataset, loading 29-31
  defining 25, 26
  distance metrics 27, 28
  fit() function 31
  Nearest neighbors 26, 27
  parameters, setting 33-35
  predict() 25
  predict() function 31
  standard workflow, defining 31
scikit-learn library
  about 25
  estimators 25
  pipelines 25
  transformers 25

[ 315 ]

scikit-learn package
  references 305
sepal length 16
sepal width 16
shapes adding, CAPTCHAs
  URL 303
Silhouette Coefficient
  about 155
  computing 155
  parameters 158
similarity graph
  creating 147-151
SNAP
  URL 303
softmax nonlinearity 252
Spam detection
  references 302
spam filter 129
sparse matrix 27
sparse matrix format 66
sports outcome prediction
  about 49
  features 50
stacking 53
standings
  loading 50-54
standings data
  obtaining 50
  URL 50
Stratified K Fold 32
style sheets 218
stylometry 186
subgraphs
  connected components 151-153
  criteria, optimizing 155-159
  finding 151
subreddits 212, 215
SVMs
  about 196
  classifying with 197
  kernels 198
  URL 197
system
  building, for taking image as input 242-244

T
temporal analysis 305
text
  about 106
  extracting, from arbitrary websites 218
text-based datasets 105
text transformers
  bag-of-words model 118-120
  defining 118
  features 121
  n-grams 120
  word, counting in dataset 118-120
tf-idf 120
Theano
  about 248, 249
  URL 260
  using 249
Torch
  URL 306
train_feature_value() function 20
transformer
  creating 98
  implementing 99, 100
  transformer API 99
  unit testing 101, 102
tutorial, Google
  URL 307
tutorial, Yahoo
  URL 307
tweets
  about 106
  F1-score, used for evaluation 129, 130
  features, obtaining from models 130, 132
  loading 128, 129
Twitter
  follower information, obtaining from 140, 141
Twitter account
  URL 107
twitter documentation
  URL 107

[ 316 ]

