
Chapter 22

Clustering Methods

In this chapter the concept of unsupervised clustering will be considered. In quantitative finance, finding groups of similar assets, or regimes in asset price series, is extremely useful. For instance, it can aid in the development of entry and exit rule filters.

Clustering is an unsupervised learning method that attempts to partition observational data into separate subgroups or clusters. The desired outcome of clustering is to ensure that observations within clusters are similar to each other but different from observations in other clusters.

Clustering is a vast area of academic research and it would be difficult to provide a full taxonomy of clustering algorithms in this chapter. Instead a common, but very useful, algorithm called K-Means Clustering will be considered. Resources will be outlined that allow further study of more advanced algorithms if so desired.

K-Means Clustering will be applied to daily OHLC bar data in order to identify separate price action clusters. These clusters can then be used to determine whether certain market regimes exist, as with Hidden Markov Models.
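As a brief preview of this workflow, the sketch below clusters daily bars with scikit-learn's KMeans. It is a minimal sketch only: the candle-shape features and the column names Open, High, Low and Close are illustrative assumptions, not the exact implementation developed later in the chapter.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def cluster_ohlc_bars(bars: pd.DataFrame, k: int = 3) -> np.ndarray:
    """Assign each daily OHLC bar to one of k price action clusters."""
    # Normalise each bar by its open price so that the absolute price
    # level does not dominate the candle-shape features (an assumed,
    # illustrative feature choice)
    features = pd.DataFrame({
        "high_open": (bars["High"] - bars["Open"]) / bars["Open"],
        "low_open": (bars["Low"] - bars["Open"]) / bars["Open"],
        "close_open": (bars["Close"] - bars["Open"]) / bars["Open"],
    })
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    return km.fit_predict(features.values)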

22.1 K-Means Clustering

K-Means Clustering is a particular technique for identifying subgroups or clusters within a set of observations. It is a hard clustering technique, which means that each observation is forced to have a unique cluster assignment. This is in contrast to a soft, or probabilistic, clustering technique, which assigns probabilities for cluster membership.
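This distinction can be seen directly in scikit-learn, where K-Means produces a single label per observation while a soft method such as a Gaussian mixture produces a probability per cluster. The following is a minimal sketch on assumed toy data, not part of the chapter's trading application:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two well-separated groups of one-dimensional toy observations
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

# Hard clustering: each observation receives exactly one cluster label
hard_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X)
print(hard_labels)  # e.g. [0 0 0 1 1 1]

# Soft clustering: each observation receives a probability per cluster
gmm = GaussianMixture(n_components=2, random_state=42).fit(X)
print(gmm.predict_proba(X).round(3))  # each row sums to one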

To use K-Means Clustering it is necessary to specify a parameter $K$, which is the number of desired clusters to partition the data into. These $K$ clusters do not overlap and have "hard" boundaries between membership (see Figure 22.1). The task is to assign each of the $N$ observations to one of the $K$ clusters, where each observation belongs to the cluster with the closest mean feature vector.

Mathematically we can define sets $S_k$, $k \in \{1, \ldots, K\}$, each of which contains the indices of the subset of the $N$ observations that lie in each cluster. These sets exhaustively cover all indices, that is each observation belongs to at least one of the sets $S_k$, and the clusters are mutually exclusive with "hard" boundaries:

• $S = \bigcup_{k=1}^{K} S_k = \{1, \ldots, N\}$

• $S_k \cap S_{k'} = \emptyset, \quad \forall k \neq k'$
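These two properties can be verified mechanically for any hard assignment. The sketch below assumes a label vector as produced by scikit-learn, with zero-based observation indices rather than the $\{1, \ldots, N\}$ of the notation above:

import numpy as np

def is_hard_partition(labels: np.ndarray, k: int) -> bool:
    """Check that the index sets S_1, ..., S_K partition the N observations."""
    n = len(labels)
    # Build the index set S_k for each cluster from the label vector
    sets = [set(np.where(labels == j)[0]) for j in range(k)]
    # Exhaustive: the union of the S_k covers every observation index
    covers_all = set().union(*sets) == set(range(n))
    # Mutually exclusive: no index appears in two different S_k
    disjoint = all(
        sets[i].isdisjoint(sets[j])
        for i in range(k)
        for j in range(i + 1, k)
    )
    return covers_all and disjoint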
