24.07.2016 Views

www.allitebooks.com

Learning%20Data%20Mining%20with%20Python

Learning%20Data%20Mining%20with%20Python

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 8<br />

Next we set the font of the CAPTCHA we will use. You will need a font file and the<br />

filename in the following code (Coval.otf) should point to it (I just placed the file in<br />

the Notebook's directory.<br />

font = ImageFont.truetype(r"Coval.otf", 22)<br />

draw.text((2, 2), text, fill=1, font=font)<br />

You can get the Coval font I used from the Open Font Library at<br />

http://openfontlibrary.org/en/font/bretan.<br />

We convert the PIL image to a NumPy array, which allows us to use scikit-image<br />

to perform a shear on it. The scikit-image library tends to use NumPy arrays for<br />

most of its <strong>com</strong>putation. The code is as follows:<br />

image = np.array(im)<br />

We then apply the shear transform and return the image:<br />

affine_tf = tf.AffineTransform(shear=shear)<br />

image = tf.warp(image, affine_tf)<br />

return image / image.max()<br />

In the last line, we normalize by dividing by the maximum value, ensuring our<br />

feature values are in the range 0 to 1. This normalization can happen in the data<br />

preprocessing stage, the classification stage, or somewhere else.<br />

From here, we can now generate images quite easily and use pyplot to display<br />

them. First, we use our inline display for the matplotlib graphs and import pyplot.<br />

The code is as follows:<br />

%matplotlib inline<br />

from matplotlib import pyplot as plt<br />

Then we create our first CAPTCHA and show it:<br />

image = create_captcha("GENE", shear=0.5)<br />

plt.imshow(image, cmap='Greys')<br />

The result is the image shown at the start of this section: our CAPTCHA.<br />

Splitting the image into individual letters<br />

Our CAPTCHAs are words. Instead of building a classifier that can identify the<br />

thousands and thousands of possible words, we will break the problem down into<br />

a smaller problem: predicting letters.<br />

[ 167 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!