advanced-algorithmic-trading


disadvantages are the need to have a large quantity of observational training data, often far

greater than that necessary for parametric models, as well as their proneness to overfitting.

Given the seeming abundance of historical financial data, non-parametric models would appear

to be an obvious choice for quantitative finance applications. However,

such models are not always optimal. While the increased flexibility is attractive for modelling

the non-linearities in stock market data, it is very easy to overfit such a model due to the poor

signal-to-noise ratio found in financial time series.

A key example of a non-parametric machine learning model is k-nearest neighbours, which

seeks to classify or predict values via the mode or mean, respectively, of a group of k nearest

neighbour points in feature space.

It is non-parametric in the sense that there are no parameters, such as the β coefficients of

linear regression, with which to "fit" the model. However, it does contain what is known as a

hyperparameter, which in this case is the number of points in feature space, k, over which to

take the mean or mode.
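As a brief illustrative sketch (the function name, toy data and choice of k below are invented for illustration, not taken from the text), this mean/mode prediction can be written in a few lines of Python with NumPy:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="regression"):
    """Predict for x_new from its k nearest training points:
    the mean of their responses for regression, the mode
    (majority vote) for classification."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    neighbours = y_train[nearest]
    if task == "regression":
        return float(neighbours.mean())
    return Counter(neighbours.tolist()).most_common(1)[0][0]

# Toy one-dimensional feature space
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y_reg = np.array([0.0, 1.0, 2.0, 10.0])
y_cls = np.array([0, 0, 1, 1])

print(knn_predict(X, y_reg, np.array([0.5]), k=3))  # 1.0 (mean of 0, 1, 2)
print(knn_predict(X, y_cls, np.array([0.5]), k=3, task="classification"))  # 0
```

Note that the distance metric (Euclidean here) is itself a modelling choice.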

This hyperparameter is optimised in a similar manner to parameter fitting, via methods such as k-fold

cross validation, which will be discussed later in the book.
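Ahead of that fuller treatment, a minimal sketch of choosing k by cross-validation (the helper names and simulated data below are hypothetical) might look as follows:

```python
import numpy as np

def knn_reg(X_tr, y_tr, x, k):
    """k-NN regression: mean response of the k nearest training points."""
    idx = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
    return y_tr[idx].mean()

def cv_mse(X, y, k, n_folds=5):
    """Mean squared error of k-NN regression under k-fold cross-validation."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    errors = []
    for fold in folds:
        train = np.ones(len(X), dtype=bool)
        train[fold] = False                  # hold this fold out for testing
        preds = np.array([knn_reg(X[train], y[train], x, k) for x in X[fold]])
        errors.append(np.mean((preds - y[fold]) ** 2))
    return float(np.mean(errors))

# Hypothetical noisy sine data
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, 60)

# Pick the k with the lowest cross-validated error
best_k = min(range(1, 11), key=lambda k: cv_mse(X, y, k))
```

Each candidate k is scored on data the model was not fit to, which is exactly what guards against the overfitting discussed above.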

15.4.7 Statistical Framework for Machine Learning Domains

In the above section on Machine Learning Domains both supervised learning and unsupervised

learning were outlined.

A supervised learning model requires the predictor-response pairs (x_i, y_i). The "supervision"

or "training" of the model occurs when f is trained or fit to a particular set of data. In a statistical

setting, estimating the parameters (that is, fitting the model) is commonly carried out using

a technique known as Maximum Likelihood Estimation (MLE). For each model, the algorithm

for carrying out MLE can be very different. For instance in the linear regression setting the MLE

to find the coefficient estimates, β̂, is carried out using a procedure known as Ordinary Least

Squares.
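As a concrete, hypothetical illustration (the simulated data below is invented): under the standard assumption of Gaussian errors, the MLE for the linear regression coefficients coincides with the OLS solution β̂ = (XᵀX)⁻¹Xᵀy, which NumPy can compute directly:

```python
import numpy as np

# Simulate y = 2 + 3x + noise, then recover the coefficients via OLS,
# which is the MLE under Gaussian errors.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.05, 200)

X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves the normal equations

print(beta_hat)  # approximately [2.0, 3.0]
```

Solving the normal equations directly keeps the estimator explicit; in practice a library routine with better numerical conditioning would typically be used.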

An unsupervised learning model still utilises the feature vectors x_i but does not have the

associated response values y_i. This is a much more challenging environment in which an

algorithm must produce effective relationship estimates, as there is nothing to "supervise" or "train" the model with.

Moreover, it is harder to assess model accuracy as there is no readily available "fitness function"

against which to compare. This does not detract from the efficacy of unsupervised techniques,

however. They are widely utilised in quantitative finance, particularly for factor models.
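As a sketch of that factor-model use case (the simulated returns and loadings below are invented), principal component analysis is one widely used unsupervised technique: it extracts common "factors" from the features alone, with no response variable required:

```python
import numpy as np

# Hypothetical asset returns driven by one latent common factor plus noise.
rng = np.random.default_rng(1)
factor = rng.normal(0.0, 1.0, 500)          # unobserved market factor
loadings = np.array([0.9, 0.8, 1.1, 1.0])   # per-asset factor sensitivities
X = np.outer(factor, loadings) + rng.normal(0.0, 0.1, (500, 4))

# PCA via the eigendecomposition of the sample covariance matrix;
# note that no response values y are involved at any point.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
explained = float(eigvals[-1] / eigvals.sum())  # variance share of the top factor
```

The leading eigenvector plays the role of the estimated factor loadings; in this single-factor construction the top component should capture most of the variance.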
