25.10.2016 Views

SAP HANA Predictive Analysis Library (PAL)

sap_hana_predictive_analysis_library_pal_en

sap_hana_predictive_analysis_library_pal_en

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

3.1.4 Cluster Assignment<br />

Cluster assignment is used to assign data to the clusters that were previously generated by some clustering<br />

methods such as K-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and SOM<br />

(Self-Organizing Maps).<br />

This algorithm requires that the corresponding clustering procedures save cluster information, or cluster<br />

model, which also includes the control parameters for consistency. It assumes that new data is from similar<br />

distribution as previous data, and will not update the cluster information.<br />

For clusters generated by K-means, distances between new data and cluster centers are calculated, and then<br />

the new data is assigned to the cluster with the smallest distance.<br />

For clusters generated by DBSCAN, all core objects are stored. For each piece of new data, the algorithm tries<br />

to find a core object in some formed cluster whose distance is less than the value of the RADIUS parameter. If<br />

such a core object is found, the new data is then assigned to the corresponding cluster, otherwise it is<br />

assigned to cluster -1, indicating that it is noise. It is possible that a piece of data can belong to more than one<br />

cluster, which can be further divided into the following two cases:<br />

●<br />

●<br />

If the number of core objects whose distances to the new data is less than the MINPTS parameter value,<br />

meaning that the new data is a border object, the new data is assigned to the cluster where there is a core<br />

object having the smallest distance to the new data.<br />

If the number of core objects whose distances to the new data is not less than MINPTS, which means the<br />

new data is also a core object, it is then assigned to cluster -2, indicating that it belongs to more than one<br />

cluster. In this case, re-running the DBSCAN function is highly suggested.<br />

For clusters generated by SOM, similar to K-means, the distances between new data and weight vector are<br />

calculated, and the new data is then assigned to the cluster with the smallest distance.<br />

Prerequisites<br />

●<br />

●<br />

●<br />

No missing or null data in the inputs.<br />

Data types must be identical to those in the corresponding clustering procedure.<br />

The data types of the ID columns in the data input table and the result output table must be identical.<br />

CLUSTERASSIGNMENT<br />

This function directly assigns data to clusters based on the previous cluster model, without running clustering<br />

procedure thoroughly. It currently supports the K-means, DBSCAN, and SOM clustering methods.<br />

Procedure Generation<br />

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE (‘AFL<strong>PAL</strong>’, ‘CLUSTERASSIGNMENT’,<br />

‘’, '', );<br />

38 P U B L I C<br />

<strong>SAP</strong> <strong>HANA</strong> <strong>Predictive</strong> <strong>Analysis</strong> <strong>Library</strong> (<strong>PAL</strong>)<br />

<strong>PAL</strong> Functions

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!