25.10.2016

SAP HANA Predictive Analysis Library (PAL)

PAL_IQR_RESULTS_TBL:

3.6.5 Partition

The algorithm randomly partitions an input dataset into three disjoint subsets called the training, testing, and validation sets. The proportion of each subset is defined as a parameter. Note that the union of these three subsets need not be the complete initial dataset, for example when the three proportions add up to less than the whole.
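The random partition can be sketched in Python as follows. This is a minimal illustration, not the PAL interface: the function name `random_partition`, its parameters, the string labels, and the choice to truncate each subset size with `int()` are all assumptions. The proportions are taken as fractions that sum to at most 1, and rows that fall outside all three slices stay unassigned (`None`).

```python
import random
from typing import Dict, List, Optional, Sequence


def random_partition(
    row_ids: Sequence[int],
    train_frac: float,
    test_frac: float,
    valid_frac: float,
    seed: int = 0,
) -> Dict[int, Optional[str]]:
    """Assign each row ID to 'train', 'test', 'validation', or None.

    Hypothetical helper: the fractions are assumed to sum to at most 1,
    so some rows may belong to none of the three subsets.
    Row IDs are assumed to be unique.
    """
    if min(train_frac, test_frac, valid_frac) < 0:
        raise ValueError("fractions must be non-negative")
    # Small tolerance so sums like 0.1 + 0.2 + 0.7 are not rejected by float error.
    if train_frac + test_frac + valid_frac > 1 + 1e-9:
        raise ValueError("fractions must sum to at most 1")

    rng = random.Random(seed)
    shuffled: List[int] = list(row_ids)
    rng.shuffle(shuffled)  # random order, so each slice is a random sample

    n = len(shuffled)
    n_train = int(n * train_frac)  # assumption: sizes are truncated
    n_test = int(n * test_frac)
    n_valid = int(n * valid_frac)

    labels: Dict[int, Optional[str]] = {rid: None for rid in shuffled}
    # Consecutive slices of one shuffled list are disjoint by construction.
    for rid in shuffled[:n_train]:
        labels[rid] = "train"
    for rid in shuffled[n_train:n_train + n_test]:
        labels[rid] = "test"
    for rid in shuffled[n_train + n_test:n_train + n_test + n_valid]:
        labels[rid] = "validation"
    return labels
```

For example, `random_partition(range(10), 0.6, 0.2, 0.1, seed=42)` places 6 rows in training, 2 in testing, and 1 in validation; the tenth row belongs to no subset, which is one way the union can fall short of the full dataset.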

Two kinds of partition are available:

1. Random Partition, which divides all the data randomly.
2. Stratified Partition, which divides each subpopulation randomly.

In the second case, the dataset needs at least one categorical attribute (for example, of type varchar). The initial dataset is first subdivided according to the distinct values of this attribute, and each resulting mutually exclusive subset (stratum) is then split randomly into training, testing, and validation portions. This ensures that every categorical value, or stratum, is represented in the sampled subsets.
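A minimal Python sketch of the stratified variant, under stated assumptions: each row is given as a hypothetical `(row_id, stratum_value)` pair, the name `stratified_partition` and its labels are illustrative rather than the PAL interface, and each subset size is truncated with `int()`. The two steps mirror the description: group by the categorical value, then split each group on its own.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def stratified_partition(
    rows: Sequence[Tuple[int, Hashable]],
    train_frac: float,
    test_frac: float,
    valid_frac: float,
    seed: int = 0,
) -> Dict[int, Optional[str]]:
    """Split each stratum separately; rows are (row_id, stratum_value) pairs.

    Hypothetical helper. Fractions are assumed to sum to at most 1;
    leftover rows in each stratum stay unassigned (None).
    """
    if train_frac + test_frac + valid_frac > 1 + 1e-9:
        raise ValueError("fractions must sum to at most 1")

    # Step 1: subdivide the dataset by the categorical attribute.
    strata: Dict[Hashable, List[int]] = defaultdict(list)
    for row_id, value in rows:
        if value is None:
            raise ValueError("stratification column must not contain nulls")
        strata[value].append(row_id)

    rng = random.Random(seed)
    labels: Dict[int, Optional[str]] = {}
    # Step 2: randomly split each mutually exclusive stratum on its own.
    for ids in strata.values():
        rng.shuffle(ids)
        n = len(ids)
        cut1 = int(n * train_frac)
        cut2 = cut1 + int(n * test_frac)
        cut3 = cut2 + int(n * valid_frac)
        for i, rid in enumerate(ids):
            if i < cut1:
                labels[rid] = "train"
            elif i < cut2:
                labels[rid] = "test"
            elif i < cut3:
                labels[rid] = "validation"
            else:
                labels[rid] = None
    return labels
```

With ten rows in stratum "A", ten in stratum "B", and fractions 0.5, 0.3, 0.2, each stratum contributes 5 training, 3 testing, and 2 validation rows. In this sketch, a very small stratum can round down to zero rows in some subset, so the guarantee depends on each stratum being large enough for its proportions.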

Prerequisites

● The column used for stratification must not contain missing or null values.
● The column used for stratification must be categorical (integer, varchar, or nvarchar).
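Before running a stratified partition, the two prerequisites can be checked on the client side. The sketch below is hypothetical: `check_stratification_column` is not part of PAL, and it assumes that INTEGER values arrive as Python `int` and VARCHAR/NVARCHAR values as Python `str`.

```python
from typing import Any, List, Sequence


def check_stratification_column(values: Sequence[Any]) -> List[str]:
    """Return a list of problems found; an empty list means the column passes.

    Hypothetical pre-check mirroring the two prerequisites. Assumes
    INTEGER maps to Python int and VARCHAR/NVARCHAR map to Python str.
    """
    problems: List[str] = []
    for i, v in enumerate(values):
        if v is None:
            problems.append(f"row {i}: missing or null value")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(v, bool) or not isinstance(v, (int, str)):
            problems.append(f"row {i}: non-categorical type {type(v).__name__}")
    return problems
```

For instance, `check_stratification_column(["A", "B", 3])` returns an empty list, while `check_stratification_column(["A", None, 2.5])` reports the null in row 1 and the float in row 2.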
