24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Authorship Attribution<br />

Support vector machines<br />

Support vector machines (SVMs) are classification algorithms based on a simple<br />

and intuitive idea. It performs classification between only two classes (although we<br />

can extend it to more classes). Suppose that our two classes can be separated by a<br />

line such that any points above the line belong to one class and any below the line<br />

belong to the other class. SVMs find this line and use it for prediction, much the same<br />

way as linear regression works. SVMs, however, find the best line for separating the<br />

dataset. In the following figure, we have three lines that separate the dataset: blue,<br />

black, and green. Which would you say is the best option?<br />

Intuitively, a person would normally choose the blue line as the best option,<br />

as this separates the data the most. That is, it has the maximum distance from<br />

any point in each class.<br />

Finding this line is an optimization problem, based on finding the lines of margin<br />

with the maximum distance between them.<br />

[ 196 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!