
BABOK_Guide_v3_member_copy


Data Mining

Techniques

Data mining can be utilized in either supervised or unsupervised investigations. In a supervised investigation, users can pose a question and expect an answer that can drive their decision making. An unsupervised investigation is a pure pattern discovery exercise in which patterns are allowed to emerge and are then considered for applicability to business decisions.
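As a rough illustration of the distinction, the Python sketch below applies two textbook algorithms to the same kind of data. They are choices made here for brevity, not techniques prescribed by this guide. A 1-nearest-neighbour classifier stands in for a supervised investigation: labelled examples answer a posed question ("will this customer churn?"). k-means clustering stands in for an unsupervised one: there are no labels, and groups simply emerge. The function names, customer features, and all data values are hypothetical.

```python
import math
import random

Point = tuple[float, float]


def nearest_neighbour_label(train: list[tuple[Point, str]], query: Point) -> str:
    """Supervised: answer a posed question from labelled examples (1-nearest neighbour)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def k_means(points: list[Point], k: int, iterations: int = 20, seed: int = 0) -> list[int]:
    """Unsupervised: let groups emerge from unlabelled points (Lloyd's k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its closest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average spend).
    labelled = [((1, 20), "churned"), ((2, 25), "churned"),
                ((9, 80), "stayed"), ((10, 90), "stayed")]
    print(nearest_neighbour_label(labelled, (8, 70)))  # stayed
    unlabelled = [(1, 20), (2, 25), (1.5, 22), (9, 80), (10, 90), (8, 70)]
    # Two groups emerge; the cluster numbers themselves are arbitrary.
    print(k_means(unlabelled, k=2))
```

In the supervised case the analyst supplies the question through the labels; in the unsupervised case the emerging groups still have to be interpreted and judged for their relevance to a business decision.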

Data mining is a general term that covers descriptive, diagnostic, and predictive techniques:

• Descriptive techniques, such as clustering, make it easier to see the patterns in a set of data, for example similarities between customers.
• Diagnostic techniques, such as decision trees or segmentation, can show why a pattern exists, for example the characteristics of an organization's most profitable customers.
• Predictive techniques, such as regression or neural networks, can show how likely something is to be true in the future, for example the probability that a particular claim is fraudulent.

In all cases it is important to consider the goal of the data mining exercise and to be prepared for considerable effort in securing the right type, volume, and quality of data with which to work.

Complimentary IIBA® Member Copy. Not for Distribution or Resale.

10.14.3 Elements

.1 Requirements Elicitation

The goal and scope of data mining are established either in terms of decision requirements for an identified, important business decision, or in terms of a functional area where relevant data will be mined for domain-specific pattern discovery. Choosing between this top-down and bottom-up mining strategy allows analysts to pick the correct set of data mining techniques.

Formal decision modelling techniques (see Decision Modelling (p. 265)) are used to define requirements for top-down data mining exercises. For bottom-up pattern discovery exercises, it is useful if the discovered insight can be placed on existing decision models, allowing rapid use and deployment of the insight.
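One way to picture placing a bottom-up insight on an existing decision model is a decision table whose rules are checked in order. The sketch below is loosely modelled on a first-hit decision table of the kind used in decision modelling notations such as DMN; it is not an implementation of any standard. The decision, rule names, field names, and the 30-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Claim = dict[str, float]


@dataclass
class Rule:
    """One row of a simple decision table: a condition and the outcome it yields."""
    name: str
    condition: Callable[[Claim], bool]
    outcome: str


def decide(rules: list[Rule], claim: Claim, default: str) -> str:
    """Return the outcome of the first matching rule (a 'first hit' policy)."""
    for rule in rules:
        if rule.condition(claim):
            return rule.outcome
    return default


# Existing (hypothetical) decision model for handling an insurance claim.
claim_handling = [
    Rule("large claim", lambda c: c["amount"] > 50_000, "refer to investigator"),
]

# Suppose a bottom-up mining exercise surfaced a pattern: claims filed soon after a
# policy starts are disproportionately fraudulent. The 30-day cut-off is an
# illustrative assumption, not a mined result. Deploying the insight means adding
# a rule to the model that already drives the decision.
claim_handling.append(
    Rule("early claim (mined insight)",
         lambda c: c["days_since_policy_start"] < 30,
         "refer to investigator")
)

if __name__ == "__main__":
    claim = {"amount": 1_200, "days_since_policy_start": 12}
    print(decide(claim_handling, claim, default="fast-track payment"))  # refer to investigator
```

Because the decision model already exists and is already wired into operations, the new insight only has to be expressed as one more rule rather than as a new system.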

Data mining exercises are productive when managed in an agile way: agile management supports rapid iteration, confirmation, and deployment while still providing project controls.

.2 Data Preparation: Analytical Dataset

Data mining tools work on an analytical dataset. This is generally formed by merging records from multiple tables or sources into a single, wide dataset.
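A minimal sketch of such a merge, assuming two hypothetical in-memory sources: a customer table and an order table in which a customer can have several rows (a repeating group). Each customer becomes one wide row, with the orders flattened into numbered field sets. The function name, field names, and the `max_orders` cap are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def build_analytical_dataset(customers: list[Row], orders: list[Row],
                             max_orders: int = 3) -> list[Row]:
    """Merge customers with their orders into one wide row per customer.

    The repeating group of order rows is collapsed into numbered field sets
    (order_1_amount, order_2_amount, ...). max_orders is an assumed cap: orders
    beyond it get no fields of their own, although order_count still counts them.
    """
    amounts_by_customer: dict[Any, list[float]] = defaultdict(list)
    for order in orders:
        amounts_by_customer[order["customer_id"]].append(order["amount"])

    rows = []
    for customer in customers:
        row = dict(customer)  # copy so the source record is left untouched
        amounts = amounts_by_customer.get(customer["customer_id"], [])
        for i in range(max_orders):
            row[f"order_{i + 1}_amount"] = amounts[i] if i < len(amounts) else None
        row["order_count"] = len(amounts)
        rows.append(row)
    return rows


if __name__ == "__main__":
    customers = [{"customer_id": 1, "region": "North"},
                 {"customer_id": 2, "region": "South"}]
    orders = [{"customer_id": 1, "amount": 120.0},  # several rows per customer:
              {"customer_id": 1, "amount": 80.0},   # a repeating group
              {"customer_id": 2, "amount": 45.0}]
    for row in build_analytical_dataset(customers, orders, max_orders=2):
        print(row)
```

In practice the same reshaping is usually done in SQL or a dataframe library against the database or data warehouse; plain Python is used here only to keep the example self-contained.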

Repeating groups are typically collapsed into multiple sets of fields. The data may be physically extracted into an actual file, or it may be a virtual file that is left in the database or data warehouse and analyzed in place. Analytical datasets are split into a set to be used for analysis, a completely independent set for confirming that the model developed works on data not used to develop it, and a validation set for final confirmation. Data volumes can be very large, sometimes resulting in

