24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Beating CAPTCHAs with Neural Networks<br />

Our CAPTCHA-busting algorithm will make the following assumptions. First, the<br />

word will be a whole and valid four-character English word (in fact, we use the same<br />

dictionary for creating and busting CAPTCHAs). Second, the word will only contain<br />

uppercase letters. No symbols, numbers, or spaces will be used. We are going to<br />

make the problem slightly harder: we are going to perform a shear transform to the<br />

text, along with varying rates of shearing.<br />

Drawing basic CAPTCHAs<br />

Next, we develop our function for creating our CAPTCHA. Our goal here is to<br />

draw an image with a word on it, along with a shear transform. We are going to<br />

use the PIL library to draw our CAPTCHAs and the scikit-image library to<br />

perform the shear transform. The scikit-image library can read images in a<br />

NumPy array format that PIL can export to, allowing us to use both libraries.<br />

Both PIL and scikit-image can be installed via pip:<br />

pip install PIL<br />

pip install scikit-image<br />

First, we import the necessary libraries and modules. We import NumPy and the<br />

Image drawing functions as follows:<br />

import numpy as np<br />

from PIL import Image, ImageDraw, ImageFont<br />

from skimage import transform as tf<br />

Then we create our base function for generating CAPTCHAs. This function takes a<br />

word and a shear value (which is normally between 0 and 0.5) to return an image in<br />

a NumPy array format. We allow the user to set the size of the resulting image, as we<br />

will use this function for single-letter training samples as well. The code is as follows:<br />

def create_captcha(text, shear=0, size=(100, 24)):<br />

We create a new image using L for the format, which means black-and-white pixels<br />

only, and create an instance of the ImageDraw class. This allows us to draw on this<br />

image using PIL. The code is as follows:<br />

im = Image.new("L", size, "black")<br />

draw = ImageDraw.Draw(im)<br />

[ 166 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!