24.07.2016 Views

www.allitebooks.com

Learning Data Mining with Python


Next Steps…

Vowpal Wabbit

http://hunch.net/~vw/

Vowpal Wabbit is a great project, providing very fast feature extraction for text-based problems. It comes with a Python wrapper, allowing you to call it from within Python code. Test it out on large datasets, such as the one we used in Chapter 12, Working with Big Data.

Chapter 6 – Social Media Insight Using Naive Bayes

Spam detection

http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Using the concepts in this chapter, you can create a spam detection method that is able to view a social media post and determine whether it is spam or not. Try this out by first creating a dataset of spam/not-spam posts, implementing the text mining algorithms, and then evaluating them.

One important consideration with spam detection is the false-positive/false-negative ratio. Many people would prefer to have a couple of spam messages slip through, rather than miss out on a legitimate message because the filter was too aggressive in stopping the spam. In order to tune your method for this, you can use a grid search with the F1-score as the evaluation criterion. See the link above for information on how to do this.
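As a rough illustration of the grid search described above, the following is a minimal sketch using scikit-learn's `GridSearchCV` with `scoring="f1"`. The toy posts, the labels, the choice of `CountVectorizer` with `BernoulliNB`, and the parameter grid are all assumptions made for this example; they are not the chapter's own pipeline, and a real dataset would need to be far larger.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Hypothetical toy dataset: 1 = spam, 0 = not spam.
posts = [
    "win a free phone now", "free money click here",
    "claim your free prize today", "click now to win cash",
    "lunch at noon tomorrow?", "great talk at the meetup",
    "reading a book on data mining", "see you at the conference",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Bag-of-words features feeding a Naive Bayes classifier.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(binary=True)),
    ("classifier", BernoulliNB()),
])

# Assumed grid: smoothing strength and the n-gram range of the features.
param_grid = {
    "classifier__alpha": [0.1, 1.0],
    "vectorizer__ngram_range": [(1, 1), (1, 2)],
}

# scoring="f1" selects the parameters with the best mean F1-score
# for the positive (spam) class across the cross-validation folds.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(posts, labels)

print(search.best_params_)
print(search.best_score_)
```

The F1-score weights precision and recall equally. If false positives matter more to you, one option is to pass a custom scorer instead, for example `make_scorer(fbeta_score, beta=0.5)` from `sklearn.metrics`, which favors precision.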

Natural language processing and part-of-speech tagging

http://www.nltk.org/book/ch05.html

The techniques we used in this chapter were quite lightweight compared to some of the linguistic models employed in other areas. For example, part-of-speech tagging can help disambiguate word forms, allowing for higher accuracy. The book that comes with NLTK has a chapter on this, linked above. The whole book is well worth reading too.
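To give a feel for how tagging can disambiguate word forms, here is a small sketch using NLTK's `DefaultTagger`, `UnigramTagger`, and `BigramTagger` chained with backoff. The three hand-tagged training sentences and the simplified tag names are hypothetical toy data invented for this example; in practice you would train on a tagged corpus or use a pretrained tagger.

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Hypothetical hand-tagged toy corpus; "book" appears as both verb and noun.
train_sents = [
    [("I", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("I", "PRON"), ("read", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("she", "PRON"), ("liked", "VERB"), ("a", "DET"), ("book", "NOUN")],
]

# Fall back to NOUN for anything the trained taggers cannot decide.
default = DefaultTagger("NOUN")
# The unigram tagger picks each word's most frequent tag, ignoring context.
unigram = UnigramTagger(train_sents, backoff=default)
# The bigram tagger also looks at the tag assigned to the previous word.
bigram = BigramTagger(train_sents, backoff=unigram)

sentence = ["I", "book", "the", "flight"]
unigram_tags = unigram.tag(sentence)
bigram_tags = bigram.tag(sentence)

print(unigram_tags)
print(bigram_tags)
```

The unigram tagger can only ever give "book" its single most common tag, while the bigram tagger can tell the verb from the noun by the preceding tag. Tags like these could then be added as extra features alongside the words themselves.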
