
Learning Data Mining with Python


Chapter 6

Here's an excerpt from The Lord of the Rings by J.R.R. Tolkien:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in halls of stone,
Nine for Mortal Men, doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
In the Land of Mordor where the Shadows lie.

- J.R.R. Tolkien's epigraph to The Lord of the Rings

Ignoring case, the word "the" appears nine times in this quote, while the words "in", "for", "to", and "one" each appear four times. The word "ring" appears three times, as does the word "of". We can create a dataset from this by choosing a subset of words and counting the frequency of each:

Word       the  one  ring  to
Frequency  9    4    3     4

We can use the Counter class from Python's collections module to do a simple count of the words in a given string. When counting words, it is normal to convert all letters to lowercase, which we do when creating the string. The code is as follows:

from collections import Counter

# Lowercase the whole quote so that "One" and "one" are counted together.
s = """Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in halls of stone,
Nine for Mortal Men, doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
In the Land of Mordor where the Shadows lie. """.lower()

# Split on whitespace to get the individual words, then count them.
words = s.split()
c = Counter(words)
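As a minimal sketch of how the resulting counts might be inspected, the following rebuilds the counter and looks up the subset of words used in the table. The names quote, counts, chosen, and frequencies are illustrative assumptions and do not come from the original code.

```python
from collections import Counter

# The same quote as above, lowercased so that case is ignored.
quote = """Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in halls of stone,
Nine for Mortal Men, doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
In the Land of Mordor where the Shadows lie.""".lower()

counts = Counter(quote.split())

# Look up only the chosen subset of words (an illustrative choice).
chosen = ["the", "one", "ring", "to"]
frequencies = {word: counts[word] for word in chosen}
print(frequencies)  # {'the': 9, 'one': 4, 'ring': 3, 'to': 4}

# most_common(n) returns the n highest counts as (word, count) pairs.
top_word, top_count = counts.most_common(1)[0]
print(top_word, top_count)  # the 9

# A Counter returns 0 for a word it has never seen, rather than raising KeyError.
print(counts["hobbit"])  # 0
```

Note that split() separates on whitespace only, so punctuation stays attached to the words. Here "them," and "them." are counted separately from "them", which is why counts["them"] is 2 even though the word occurs four times in the quote.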
