
parameters of the model, θ.

Conversely, in the unsupervised case there is no access to the responses y_i. Hence interest lies in probabilistic models of the form p(x_i | θ). That is, the distribution of the feature vectors x_i conditional on the parameters of the model, θ. This is known as unconditional density estimation.
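
As a concrete illustration of unconditional density estimation, the following sketch fits a non-parametric kernel density estimate to a synthetic sample using SciPy's gaussian_kde; the bimodal sample and the default bandwidth are illustrative assumptions, not part of the discussion above.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic draws x_i from a bimodal distribution; no responses y_i exist
x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.0, 1.0, 500)])

# Estimate the unconditional density p(x) non-parametrically
kde = gaussian_kde(x)
print(kde.evaluate([0.0]))  # estimated density at x = 0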

21.3 Unsupervised Learning Algorithms

There are two main areas of unsupervised learning that are of interest to us in quantitative finance: Dimensionality Reduction and Clustering.

21.3.1 Dimensionality Reduction

We have motivated the need for dimensionality reduction above. The most common mechanism in unsupervised learning for achieving this is (linear) Principal Components Analysis (PCA).

In machine learning and quantitative finance problems we often have a large set of correlated variables in a high dimensional space. PCA allows us to summarise these datasets using a reduced number of dimensions. It achieves this by carrying out an orthogonal coordinate transformation of the original space, forming a new set of linearly uncorrelated variables called principal components.

The principal components are found as the eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue. The principal components are mutually orthogonal by construction, and each successive component explains a smaller share of the variability of the dataset. Usually the first few principal components account for a large fraction of the variability of the original set, leading to a much lower dimensional representation in this new space.
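
The following is a minimal sketch of this procedure in Python, assuming NumPy and a simulated dataset: it computes the principal components as the eigenvectors of the sample covariance matrix, orders them by explained variance and projects the data onto the two leading components.

import numpy as np

rng = np.random.default_rng(42)

# Simulate 500 observations of 5 correlated features (illustrative data)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))

# Centre the data and form the sample covariance matrix
X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)

# Eigendecomposition; eigh is suited to symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Order components by decreasing eigenvalue (explained variance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Proportion of total variance explained by each component
print(eigenvalues / eigenvalues.sum())

# Project the data onto the first two principal components
X_reduced = X_centred @ eigenvectors[:, :2]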

Another way to think of PCA is as a change of basis. The transformation produces a set of basis vectors, a subset of which span a lower-dimensional linear subspace of the original space that closely follows the data.

However, not all data is easily summarised by a linear subspace. In classification problems, for instance, there are many data sources that are not linearly separable. In this case it is possible to invoke the "kernel trick", as was discussed in the previous chapter on Support Vector Machines, to implicitly map the data into a much higher dimensional feature space, in which it becomes linearly separable, and then carry out PCA in that space. This allows PCA to be applied to non-linear datasets.
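
A brief sketch of kernel PCA, using scikit-learn's KernelPCA on a synthetic dataset of concentric circles; the RBF kernel and the gamma value are illustrative choices rather than tuned recommendations.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=1)

# An RBF kernel implicitly maps the data into a higher dimensional
# feature space in which the two classes become linearly separable
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)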

In quantitative finance PCA is often used for factor analysis. An example would be taking a large number of correlated stocks and attempting to explain their joint behaviour through a smaller set of unobserved, uncorrelated latent factors.
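
The sketch below illustrates this use of PCA as a statistical factor model on simulated return data; the numbers of stocks, days and factors are arbitrary assumptions made purely for illustration.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulate daily returns of 50 stocks driven by 3 common latent factors
n_days, n_stocks, n_factors = 1000, 50, 3
factors = 0.01 * rng.standard_normal((n_days, n_factors))
loadings = rng.standard_normal((n_factors, n_stocks))
returns = factors @ loadings + 0.002 * rng.standard_normal((n_days, n_stocks))

# Extract three uncorrelated latent factors from the return panel
pca = PCA(n_components=3)
factor_returns = pca.fit_transform(returns)  # latent factor time series
print(pca.explained_variance_ratio_)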

21.3.2 Clustering

Another important unsupervised learning technique is known as cluster analysis. Its goal is to assign a cluster label to elements of a feature space in order to partition them into groupings or clusters. In certain cases this can be accomplished unambiguously, if subgroupings within the feature space are clearly distinct and easily separable. In other cases clusters may "overlap", making it challenging to form a clear boundary between them.
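
As one common example of such a partitioning algorithm, the following sketch applies K-Means from scikit-learn to a synthetic feature space; the number of clusters is assumed known in advance here, which is rarely the case in practice.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic feature space with three well-separated subgroupings
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=7)

# Assign a cluster label to each element of the feature space
kmeans = KMeans(n_clusters=3, n_init=10, random_state=7)
labels = kmeans.fit_predict(X)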
