
3.2.3 C4.5 Decision Tree

A decision tree is used as a classifier for determining an appropriate action or decision among a predetermined set of actions for a given case. A decision tree helps you to effectively identify the factors to consider and how each factor has historically been associated with different outcomes of the decision. A decision tree uses a tree-like structure of conditions and their possible consequences. Each node of a decision tree is either a leaf node or a decision node:

● Leaf node: indicates the value of the dependent (target) variable.
● Decision node: contains one condition that specifies a test on an attribute value. The outcome of the condition is further divided into branches with sub-trees or leaf nodes.

As a classification algorithm, C4.5 builds decision trees from a set of training data, using the concept of information entropy. The training data is a set of already classified samples. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits the samples into subsets belonging to one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute to split the data. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then proceeds recursively on the resulting subsets until a stopping criterion is met, such as the minimum number of cases in a leaf node.
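For reference, the normalized information gain described above corresponds, in standard formulations of C4.5, to the gain ratio. The following is the textbook definition rather than a formula quoted from this guide; H(S) denotes the entropy of the class distribution in sample set S, and S_v is the subset of S whose value of attribute A is v:

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}, \qquad \mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, \log_2 \frac{|S_v|}{|S|}

The attribute (and, for a continuous attribute, the threshold) with the highest gain ratio is selected for the split.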

The C4.5 decision tree functions implemented in PAL support both discrete and continuous values. In the PAL implementation, the REP (Reduced Error Pruning) algorithm is used as the pruning method.

Prerequisites

● The column order and column number of the predicted data are the same as those used in tree model building.
● The last column of the training data is used as the predicted field and is of discrete type. The predicted data set has an ID column.
● The table used to store the tree model is a column table.
● The target column of the training data must not have null values, and the other columns should have at least one valid (non-null) value.

Note

C4.5 decision tree treats null as a special value.
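To make the layout requirements above concrete, the following SQLScript sketch creates a training table whose last column is the discrete target, a predicted-data table with the same feature columns plus an ID column, and a column table for the tree model. All table names, column names, and column types here are illustrative assumptions, not names prescribed by PAL.

-- Illustrative layouts only; names and columns are assumptions.
-- Training data: the last column (CLASS) is the discrete target and must not contain nulls.
CREATE COLUMN TABLE PAL_C45_TRAINING_TBL (
    "OUTLOOK"     VARCHAR(20),
    "TEMPERATURE" DOUBLE,
    "HUMIDITY"    DOUBLE,
    "CLASS"       VARCHAR(20)    -- dependent (target) variable, discrete type
);

-- Predicted data: an ID column plus the feature columns in the same order as above.
CREATE COLUMN TABLE PAL_C45_PREDICT_TBL (
    "ID"          INTEGER,
    "OUTLOOK"     VARCHAR(20),
    "TEMPERATURE" DOUBLE,
    "HUMIDITY"    DOUBLE
);

-- The tree model must be stored in a column table.
CREATE COLUMN TABLE PAL_C45_MODEL_TBL (
    "ID"        INTEGER,
    "JSONMODEL" VARCHAR(5000)    -- assumed model representation; see the function's output table definition
);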

CREATEDTWITHC45

This function creates a decision tree from the input training data.

Procedure Generation

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE ('AFLPAL', 'CREATEDTWITHC45', '<schema_name>', '<procedure_name>', <signature_table>);
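As a minimal end-to-end sketch, assuming the schema name DM_PAL, the table type and procedure names shown, a single model output table, and the THREAD_NUMBER parameter (all illustrative rather than taken from this section), the wrapper generation and the subsequent call could look as follows. The authoritative signature and parameter tables for CREATEDTWITHC45 are documented with the function.

-- Illustrative sketch; schema, type, procedure, and parameter names are assumptions.
-- Table types matching the tables sketched under Prerequisites.
CREATE TYPE PAL_C45_DATA_T  AS TABLE ("OUTLOOK" VARCHAR(20), "TEMPERATURE" DOUBLE,
                                      "HUMIDITY" DOUBLE, "CLASS" VARCHAR(20));
CREATE TYPE PAL_CONTROL_T   AS TABLE ("NAME" VARCHAR(100), "INTARGS" INTEGER,
                                      "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
CREATE TYPE PAL_C45_MODEL_T AS TABLE ("ID" INTEGER, "JSONMODEL" VARCHAR(5000));

-- Signature table: positional list of the function's IN/OUT table types.
CREATE COLUMN TABLE PAL_C45_PDATA_TBL (
    "POSITION"       INTEGER,
    "SCHEMA_NAME"    NVARCHAR(256),
    "TYPE_NAME"      NVARCHAR(256),
    "PARAMETER_TYPE" VARCHAR(7)
);
INSERT INTO PAL_C45_PDATA_TBL VALUES (1, 'DM_PAL', 'PAL_C45_DATA_T',  'IN');
INSERT INTO PAL_C45_PDATA_TBL VALUES (2, 'DM_PAL', 'PAL_CONTROL_T',   'IN');
INSERT INTO PAL_C45_PDATA_TBL VALUES (3, 'DM_PAL', 'PAL_C45_MODEL_T', 'OUT');

-- Generate the callable procedure.
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE ('AFLPAL', 'CREATEDTWITHC45',
    'DM_PAL', 'PAL_CREATEDTWITHC45_PROC', PAL_C45_PDATA_TBL);

-- Call it with training data, a parameter table, and the model output table.
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL (
    "NAME" VARCHAR(100), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100)
);
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER', 2, NULL, NULL);  -- example parameter only
CALL DM_PAL.PAL_CREATEDTWITHC45_PROC (PAL_C45_TRAINING_TBL, "#PAL_CONTROL_TBL",
    PAL_C45_MODEL_TBL) WITH OVERVIEW;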

