

Learning Data Mining with Python



Chapter 8

The combination of an appropriately sized network and well-trained weights determines how accurate the neural network can be when making classifications. Note that "appropriately" doesn't necessarily mean bigger: neural networks that are too large can take a long time to train and can more easily overfit the training data.

Weights are normally set randomly to start with, but are then updated during the training phase.
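To make the "random start, then update" idea concrete, here is a minimal sketch that uses a single sigmoid neuron rather than a full network, trained by plain gradient descent on squared error for one made-up sample. The helper names (`predict`, `train_step`), the learning rate, the seed and the sample are illustrative assumptions, not the chapter's own code.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    # Weighted sum of the inputs plus a bias, squashed into (0, 1)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_step(weights, bias, x, target, learning_rate=0.5):
    """One gradient-descent step on 0.5 * (output - target)**2 for one sigmoid neuron."""
    output = predict(weights, bias, x)
    # Chain rule: dLoss/dz = (output - target) * sigmoid'(z), with sigmoid'(z) = output * (1 - output)
    delta = (output - target) * output * (1.0 - output)
    new_weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


# Weights start out random...
random.seed(14)
weights = [random.uniform(-1, 1) for _ in range(3)]
bias = random.uniform(-1, 1)
x, target = [1.0, 0.0, 1.0], 1.0

# ...and are then updated during training
error_before = abs(predict(weights, bias, x) - target)
for _ in range(100):
    weights, bias = train_step(weights, bias, x, target)
error_after = abs(predict(weights, bias, x) - target)
print(f"error before: {error_before:.3f}, after: {error_after:.3f}")
```

Because the target here is 1, each step pushes the neuron's weighted sum upward, so the error on this sample shrinks after every update; real training repeats this over many samples and layers.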

We now have a classifier that has initial parameters to set (the size of the network) and parameters to train from the dataset. The classifier can then be used to predict the target of a data sample based on the inputs, much like the classification algorithms we have used in previous chapters. But first, we need a dataset to train and test with.
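The split between a size chosen up front and weights learned from data can be sketched as a tiny one-hidden-layer network. This is an illustrative sketch only: the `TinyNetwork` class, its sigmoid activations, the absence of bias terms and the "highest output wins" prediction rule are assumptions, and the weights are left untrained.

```python
import math
import random


class TinyNetwork:
    """Hypothetical one-hidden-layer feed-forward network (sketch, no biases, untrained)."""

    def __init__(self, n_inputs, hidden_size, n_outputs, seed=None):
        rng = random.Random(seed)
        # hidden_size is the "size of the network", fixed before training starts;
        # the randomly initialized weights below are what training would adjust
        self.hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                               for _ in range(hidden_size)]
        self.output_weights = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
                               for _ in range(n_outputs)]

    @staticmethod
    def _layer(weights, inputs):
        # Each neuron: sigmoid of the weighted sum of the previous layer's outputs
        return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
                for row in weights]

    def predict(self, sample):
        hidden = self._layer(self.hidden_weights, sample)
        outputs = self._layer(self.output_weights, hidden)
        # The predicted class is the index of the most strongly activated output neuron
        return max(range(len(outputs)), key=outputs.__getitem__)


net = TinyNetwork(n_inputs=4, hidden_size=3, n_outputs=2, seed=0)
print(net.predict([0.2, 0.9, 0.1, 0.5]))
```

With random weights the prediction is essentially arbitrary; only after training on a dataset does `predict` become a useful classifier, used the same way as the classifiers from earlier chapters.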

Creating the dataset

In this chapter, we will take on the role of the bad guy. We want to create a program that can beat CAPTCHAs, allowing our comment spam program to advertise on someone's website. It should be noted that our CAPTCHAs will be a little easier than those used on the web today, and that spamming isn't a very nice thing to do.

Our CAPTCHAs will be individual English words of four letters only, as shown in the following image:

Our goal will be to create a program that can recover the word from images like this. To do this, we will use four steps:

1. Break the image into individual letters.
2. Classify each individual letter.
3. Recombine the letters to form a word.
4. Rank words with a dictionary to try to fix errors.
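The four steps can be sketched end to end on a toy binary "image". This is a simplified sketch under loud assumptions: the image is a list of rows of 0/1 values, letters are separated by fully blank columns, the hypothetical `GLYPHS` bitmaps and the template-lookup `classify_letter` stand in for real rendered letters and the neural network, and Levenshtein (edit) distance is one common choice for the dictionary ranking, not necessarily the chapter's.

```python
# Hypothetical 3x2 bitmaps standing in for rendered letters
GLYPHS = {
    "C": ((1, 1), (1, 0), (1, 1)),
    "O": ((1, 1), (1, 1), (1, 1)),
    "D": ((1, 0), (1, 1), (1, 0)),
    "E": ((1, 1), (1, 1), (1, 0)),
}
TEMPLATES = {bitmap: letter for letter, bitmap in GLYPHS.items()}


def render(glyphs):
    """Place glyph bitmaps side by side, with one blank column after each."""
    rows = []
    for r in range(len(glyphs[0])):
        row = []
        for glyph in glyphs:
            row.extend(glyph[r])
            row.append(0)
        rows.append(row)
    return rows


def segment_letters(image):
    """Step 1: split the image wherever a column contains no ink."""
    segments, start = [], None
    for col in range(len(image[0])):
        has_ink = any(row[col] for row in image)
        if has_ink and start is None:
            start = col
        elif not has_ink and start is not None:
            segments.append([row[start:col] for row in image])
            start = None
    if start is not None:
        segments.append([row[start:] for row in image])
    return segments


def classify_letter(segment):
    """Step 2 (stand-in for the neural network): exact template lookup, '?' if unknown."""
    return TEMPLATES.get(tuple(tuple(row) for row in segment), "?")


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


def solve_captcha(image, dictionary):
    letters = [classify_letter(seg) for seg in segment_letters(image)]  # steps 1 and 2
    guess = "".join(letters)                                            # step 3
    # Step 4: pick the dictionary word closest to the guess (ties broken alphabetically)
    best = min(dictionary, key=lambda word: (levenshtein(guess, word), word))
    return guess, best


noisy_d = ((1, 0), (1, 0), (1, 1))  # a distorted "D" that matches no template
image = render([GLYPHS["C"], GLYPHS["O"], noisy_d, GLYPHS["E"]])
print(solve_captcha(image, ["CODE", "COLD", "NODE"]))  # ('CO?E', 'CODE')
```

The misread third letter produces the guess `CO?E`, and the dictionary step repairs it because `CODE` is only one edit away while the other candidates are two edits away.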

