
Figure 22.1: K-Means Clustering "hard" boundary locations, with feature vector centroids marked as white crosses

The goal of K-Means Clustering is to minimise the Within-Cluster Variation (WCV), also known as the Within-Cluster Sum of Squares (WCSS). This quantity is the sum, across clusters, of the squared distances from each point in a cluster to that cluster's mean. That is, it measures how much the observations within a cluster differ from each other. This translates into an optimisation problem, the goal of which is to minimise the following expression:

$$
\underset{S}{\arg\min} \; \sum_{k=1}^{K} \sum_{x_i \in S_k} \left\| x_i - \mu_k \right\|^2 \qquad (22.1)
$$

Where $\mu_k$ represents the mean feature vector of cluster $k$ and $x_i$ is the $i$th feature vector in cluster $k$.
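As a concrete illustration of the objective in (22.1), the following Python snippet computes the WCSS for a candidate clustering. The function name and the array layout (observations as rows, one integer label per observation, one centroid per cluster) are assumptions made for this sketch rather than code taken from the text.

import numpy as np

def within_cluster_sum_of_squares(X, labels, centroids):
    """Sum of squared distances from each observation to its cluster mean.

    X         -- (n_samples, n_features) array of feature vectors
    labels    -- (n_samples,) integer cluster assignment for each row of X
    centroids -- (K, n_features) array of cluster mean feature vectors
    """
    wcss = 0.0
    for k, mu_k in enumerate(centroids):
        members = X[labels == k]
        # ||x_i - mu_k||^2 summed over all x_i assigned to cluster k
        wcss += np.sum(np.linalg.norm(members - mu_k, axis=1) ** 2)
    return wcss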

Unfortunately this particular minimisation is difficult to solve globally: finding a global minimum to this problem is NP-hard, in the complexity sense. Fortunately, however, there exist useful heuristic algorithms for finding acceptable local optima, one of which is outlined below.

22.1.1 The Algorithm

The heuristic algorithm used to solve K-Means Clustering is, understandably, known as the K-Means Algorithm. It is relatively straightforward to conceptualise. It consists of two steps, the second of which is iterated until completion (a Python sketch of the procedure follows the list):

1. Assign each observation $x_i$ to a random cluster $k$.

2. Iterate until the cluster assignments stop changing: compute the centroid (mean feature vector) of each cluster, then reassign every observation to the cluster whose centroid is nearest.
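The following NumPy implementation is a minimal sketch of this heuristic, assuming random initial assignments and iteration until the labels no longer change. The function name, its parameters and the handling of empty clusters are choices made for this example rather than part of the text.

import numpy as np

def k_means(X, K, max_iter=100, seed=None):
    """A sketch of the K-Means heuristic: random initial assignment,
    then alternate centroid computation and reassignment until the
    cluster labels stop changing."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: assign each observation to a random cluster
    labels = rng.integers(0, K, size=n)

    for _ in range(max_iter):
        # Compute the mean feature vector (centroid) of each cluster;
        # fall back to a random observation if a cluster has emptied out
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(0, n)]
            for k in range(K)
        ])

        # Reassign each observation to the cluster with the nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Step 2 terminates once no assignment changes
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, centroids

In practice a library implementation such as scikit-learn's KMeans would usually be preferred; the sketch above simply makes the two steps of the heuristic explicit.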
