24.07.2016 Views


Learning Data Mining with Python


Working with Big Data

The amount of data is increasing at an exponential rate. Today's systems generate and record information on customer behavior, distributed systems, network analysis, sensors, and many more sources. While the current trend toward mobile data is driving today's growth, the next big thing, the Internet of Things (IoT), is going to increase the rate of growth further.

What this means for data mining is a new way of thinking. Complex algorithms with high run times need to be improved or discarded, while simpler algorithms that can deal with more samples are becoming more popular. As an example, while support vector machines are great classifiers, some variants are difficult to use on very large datasets. In contrast, simpler algorithms such as logistic regression can manage more easily in these scenarios.
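To make the scaling argument concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent on a stream of samples. It is an illustration under stated assumptions, not code from this chapter: the synthetic data generator `sample_stream`, the helper names, and the learning rate are all hypothetical choices. The point it shows is that each sample is used once and then discarded, so memory use depends on the number of features rather than the number of samples.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sample_stream(n_samples, seed=0):
    """Yield (features, label) pairs one at a time.

    Hypothetical synthetic data: the label is 1 when x[0] + x[1] > 0.
    A real stream could be lines read from a very large file.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 1 if x[0] + x[1] > 0 else 0
        yield x, y


def train_logistic_sgd(stream, n_features, learning_rate=0.1):
    """Fit logistic regression with one SGD update per sample.

    Only the weights and bias are kept in memory, so the cost per
    sample is O(n_features) however long the stream is.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in stream:
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        error = p - y  # gradient of the log loss w.r.t. the linear score
        for i, xi in enumerate(x):
            weights[i] -= learning_rate * error * xi
        bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0


# Train on a stream of 20,000 samples, then evaluate on fresh samples.
weights, bias = train_logistic_sgd(sample_stream(20000, seed=0), n_features=2)
test_data = list(sample_stream(2000, seed=1))
accuracy = sum(predict(weights, bias, x) == y for x, y in test_data) / len(test_data)
print(f"held-out accuracy: {accuracy:.3f}")
```

A kernel support vector machine, by contrast, typically needs pairwise comparisons between training samples, which is one reason some variants become impractical as the dataset grows.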

In this chapter, we will investigate the following:

• Big data challenges and applications
• The MapReduce paradigm
• Hadoop MapReduce
• mrjob, a Python library to run MapReduce programs on Amazon's infrastructure

