25.10.2016 Views

SAP HANA Predictive Analysis Library (PAL)


Thus, the Euclidean distance between T1 and T2 is:

Where γ is the weight given to the transposed categorical attributes to lessen the impact of the 0/1 attributes on the clustering. You can then use the traditional method to update the mean of every cluster. Assuming one cluster contains only T1 and T2, the mean is:

Table 36:

Customer ID   Age    Income   Gender_1   Gender_2
Center1       29.0   9000.0   0.5        0.5
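
The weighting and the mean update described above can be sketched in Python. This is an illustration under stated assumptions, not the PAL implementation. The rows T1 and T2 are hypothetical values chosen only so that their mean reproduces Center1 in Table 36. How γ enters the distance is also an assumption: here each 0/1 column is scaled by γ before the squared difference is taken.

```python
import math

# Hypothetical rows (not from the source): Age, Income, Gender_1, Gender_2.
# They are chosen only so that their mean reproduces Center1 in Table 36.
T1 = [31.0, 10000.0, 0.0, 1.0]
T2 = [27.0, 8000.0, 1.0, 0.0]

# Indices of the 0/1 columns produced by transposing the categorical attribute.
CATEGORICAL = {2, 3}


def weighted_distance(a, b, gamma):
    """Euclidean distance with the 0/1 columns scaled by gamma (assumed form)."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        w = gamma if i in CATEGORICAL else 1.0
        total += (w * x - w * y) ** 2
    return math.sqrt(total)


def cluster_mean(rows):
    """Plain column-wise mean, as in the traditional k-means update."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


center1 = cluster_mean([T1, T2])
```

With γ below 1, the two gender columns contribute less to the distance than they would unweighted, which is the effect the text attributes to γ. The mean itself does not depend on γ in this sketch.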

The means of categorical attributes are not output. Instead, they are replaced by the modes, as in the K-Modes algorithm. Take the following center as an example:

Table 37:

         Age    Income   Gender_1   Gender_2
Center   29.0   9000.0   0.25       0.75

Because "Gender_2" has the largest value, the output is:

Table 38:

         Age    Income   Gender
Center   29.0   9000.0   Female
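
The replacement of categorical means by the mode can be sketched as follows. This is a minimal illustration, not PAL code. The function name `collapse_center` and its parameters are hypothetical, and mapping Gender_1 to "Male" is an assumption; the tables above only show that Gender_2 corresponds to "Female".

```python
def collapse_center(center, numeric_cols, onehot_cols, labels, attribute):
    """Keep numeric means; replace the 0/1 column means by the label of the largest one."""
    out = {name: center[name] for name in numeric_cols}
    # The 0/1 column with the largest mean marks the most frequent category.
    best = max(onehot_cols, key=lambda col: center[col])
    out[attribute] = labels[best]
    return out


# Values from Table 37.
center = {"Age": 29.0, "Income": 9000.0, "Gender_1": 0.25, "Gender_2": 0.75}

# Gender_2 -> "Female" follows Table 38; Gender_1 -> "Male" is an assumption.
labels = {"Gender_1": "Male", "Gender_2": "Female"}

result = collapse_center(
    center, ["Age", "Income"], ["Gender_1", "Gender_2"], labels, "Gender"
)
```

The source does not say how ties between 0/1 columns are resolved. In this sketch, `max` returns the first column with the largest value.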

Prerequisites

● The input data contains an ID column, and the other columns are of integer or double data type.

● The input data does not contain null values. The algorithm issues an error when it encounters a null value.

KMEANS

This is a clustering function using the k-means algorithm.

Procedure Generation

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE ('AFLPAL', 'KMEANS', '', '', );
