advanced-algorithmic-trading


2. Iterate the following until the cluster assignments remain fixed:

(a) Compute each cluster's mean feature vector, the centroid µk.

(b) Assign each observation xi to the closest µk, where "closeness" is given by standard Euclidean distance.
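The iteration above can be sketched directly in NumPy. This is a minimal illustration, not the book's implementation; the function name `kmeans`, the iteration cap and the random initial assignment are assumptions made for the sketch:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-Means: alternate (a) centroid update and (b) reassignment."""
    rng = np.random.default_rng(seed)
    # Random initial cluster assignment for each observation
    labels = rng.integers(0, K, size=len(X))
    centroids = None
    for _ in range(n_iter):
        # (a) Each cluster's mean feature vector, the centroid mu_k
        # (re-seed any empty cluster from a random observation)
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(len(X))]
            for k in range(K)
        ])
        # (b) Assign each x_i to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # cluster assignments remain fixed, so stop
        labels = new_labels
    return labels, centroids
```

On two well-separated groups of points this converges in a handful of iterations; the empty-cluster guard is one simple choice among several.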

Why this algorithm is guaranteed to converge to a local optimum will not be discussed here. For a more detailed discussion of the underlying mathematical theory see James et al (2009)[59] and Hastie et al (2009)[51].

Note that because the initial cluster assignments are chosen randomly, the local optimum found depends heavily upon these initial choices. In practice the algorithm is run multiple times (the default in Scikit-Learn is ten) and the best local optimum is chosen, that is, the run with the smallest WCSS.
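In Scikit-Learn the number of random restarts is controlled by the `n_init` parameter of `KMeans`, and the WCSS of the retained fit is exposed as `inertia_`. A brief sketch on synthetic data (the two "regimes" here are purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic "regimes", purely illustrative
X = np.vstack([
    rng.normal(0.0, 0.5, (50, 2)),
    rng.normal(5.0, 0.5, (50, 2)),
])

# n_init random restarts are performed; the fit retained is the one
# with the lowest WCSS, exposed afterwards as inertia_
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.inertia_)  # WCSS of the best of the ten runs
```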

22.1.2 Issues

The K-Means algorithm is not without its flaws. One of the biggest problems in quantitative finance is that the signal-to-noise ratio of financial pricing data is low, which makes it hard to extract the predictive signal used in trading strategies.

The nature of the K-Means algorithm is such that it is forced to generate K clusters, even if the data is highly noisy. The obvious implication is that such "clusters" are not truly separate distributions of data but are really artifacts of a noisy dataset. This is a tricky problem to deal with in quantitative trading.
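This behaviour is easy to demonstrate: fitting K-Means to pure i.i.d. Gaussian noise, which contains no genuine cluster structure at all, still yields K populated "clusters". A small sketch on entirely synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Pure i.i.d. Gaussian noise: no genuine cluster structure exists
X = rng.normal(0.0, 1.0, (300, 2))

# K-Means nonetheless partitions the noise into exactly K "clusters"
km = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X)
sizes = np.bincount(km.labels_, minlength=4)
print(sizes)  # four non-empty clusters, despite there being no signal
```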

Another aspect of specifying K is that certain outlying data points will automatically be assigned to a cluster, whether or not they are truly part of the distribution that generated them. This is due to the necessity of imposing a hard cluster boundary. In finance outlying data points are not uncommon, due not only to errors/bad ticks but also to flash crashes and other rapid changes in an asset's price.
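A short sketch of this effect, using a single synthetic "bad tick" placed far from the bulk of the data (the values are purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
bulk = rng.normal(0.0, 0.5, (100, 2))
# A single extreme outlier, e.g. a bad tick, far from every genuine point
X = np.vstack([bulk, [[50.0, 50.0]]])

km = KMeans(n_clusters=2, n_init=10, random_state=2).fit(X)
# The hard boundary forces the outlier into one of the two clusters,
# typically dragging that cluster's centroid away from the bulk
print(km.labels_[-1], km.cluster_centers_[km.labels_[-1]])
```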

This clustering method is also quite sensitive to variations in the underlying dataset. That is, if a financial asset pricing series is randomly split in two and a separate K-Means algorithm, sharing the same K parameter, is fitted to each half, it is common to see two very different cluster assignments for observations that are "similar". This raises the question of how robust such a mechanism is on small financial data sets. As always, more data can be helpful in this instance.
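The instability can be illustrated by splitting a noisy dataset in half and fitting K-Means separately to each half: the two fitted centroid sets generally differ. The dataset here is a synthetic stand-in, not real pricing data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# A stand-in for a noisy pricing dataset (synthetic, illustrative only)
X = rng.normal(0.0, 1.0, (200, 2))

# Randomly split the data in two and fit K-Means, with the same K, to each half
perm = rng.permutation(len(X))
half_a, half_b = X[perm[:100]], X[perm[100:]]
km_a = KMeans(n_clusters=3, n_init=10, random_state=3).fit(half_a)
km_b = KMeans(n_clusters=3, n_init=10, random_state=3).fit(half_b)

# On noisy data the two fitted centroid sets can differ substantially,
# even though both halves were drawn from the same distribution
print(km_a.cluster_centers_)
print(km_b.cluster_centers_)
```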

Such issues motivate more sophisticated clustering algorithms, which are unfortunately beyond the scope of this book, due both to the range of the methods and to their increased mathematical sophistication. However, those interested in delving deeper into unsupervised clustering can look at the following methods:

• Gaussian Mixture Models and the Expectation-Maximisation Algorithm

• DBSCAN and OPTICS algorithms

• Deep Neural Network Architectures: Autoencoders and Restricted Boltzmann Machines

Attention will now turn towards simulating data and fitting the K-Means algorithm to it.
