24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Authorship studies alone cannot prove authorship, but can provide evidence for<br />

or against a given theory. For example, we can analyze Shakespeare's plays to<br />

determine his writing style, before testing whether a given sonnet actually does<br />

originate from him.<br />

A more modern use case is that of linking social network accounts. For example,<br />

a malicious online user could set up accounts on multiple online social networks.<br />

Being able to link them allows authorities to track down the user of a given<br />

account—for example, if it is harassing other online users.<br />

Chapter 9<br />

Another example used in the past is to be a backbone to provide expert testimony<br />

in court to determine whether a given person wrote a document. For instance, the<br />

suspect could be accused of writing an e-mail harassing another person. The use of<br />

authorship analysis could determine whether it is likely that person did in fact write<br />

the document. Another court-based use is to settle claims of stolen authorship. For<br />

example, two authors may claim to have written a book, and authorship analysis<br />

could provide evidence on which is the likely author.<br />

Authorship analysis is not foolproof though. A recent study found that attributing<br />

documents to authors can be made considerably harder by simply asking people,<br />

who are otherwise untrained, to hide their writing style. This study also looked at<br />

a framing exercise where people were asked to write in the style of another person.<br />

This framing of another person proved quite reliable, with the faked document<br />

<strong>com</strong>monly attributed to the person being framed.<br />

Despite these issues, authorship analysis is proving useful in a growing number<br />

of areas and is an interesting data mining problem to investigate.<br />

Attributing authorship<br />

Authorship attribution is a classification task by which we have a set of candidate<br />

authors, a set of documents from each of those authors (the training set), and a set<br />

of documents of unknown authorship (the test set). If the documents of unknown<br />

authorship definitely belong to one of the candidates, we call this a closed problem.<br />

[ 187 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!