
the following day, we are only concerned with how well it can predict the following day's stock prices going forward. This quantification of a model's performance on unseen data is known as its generalisation performance.

Mathematically, if we have a new predictor value x_0 and a true response y_0, then we wish to take the expectation across all such new values to come up with the test MSE:

\[
\text{Test MSE} := E\left[ \left( y_0 - \hat{f}(x_0) \right)^2 \right] \qquad (20.6)
\]

Where the expectation is taken across all new, unseen predictor-response pairs (x_0, y_0).

Our goal is to select the model whose test MSE is lowest among the candidate models. Unfortunately it is difficult to calculate the test MSE directly, because we are often in a situation where we do not have any test data available.

In general machine learning domains this can be quite common. In quantitative trading we are (usually) in a "data rich" environment and thus we can retain some of our data for training and some for testing. In the next section we will discuss cross-validation, which is one means of utilising subsets of the training data in order to estimate the test MSE.
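In a data-rich setting, the simplest way to approximate the expectation in Equation 20.6 is to hold back a portion of the data and compute the average squared error on it. The following is a minimal sketch, assuming scikit-learn is available; the synthetic data and the choice of a linear regression as the fitted model are illustrative assumptions only:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 predictor-response pairs (sizes and seed
# are illustrative assumptions, not values from the text)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 1.0, size=200)

# Retain 70% of the data for training and hold out 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a (placeholder) linear regression on the training set only
model = LinearRegression().fit(X_train, y_train)

# Estimate the test MSE as the mean squared error on the held-out pairs
test_mse = np.mean((y_test - model.predict(X_test)) ** 2)
print(f"Estimated test MSE: {test_mse:.4f}")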

A pertinent question to ask at this stage is "Why can we not simply use the model with the lowest training MSE?". The simple answer is that there is no guarantee that the model with the lowest training MSE will also be the model with the lowest test MSE. Why is this so? The answer lies in a particular property of supervised machine learning methods known as the bias-variance tradeoff.

20.1.3 The Bias-Variance Tradeoff

Consider a slightly contrived example situation where we know the underlying "true" relationship between y and x, which is given by a sinusoidal function, f = sin, such that y = f(x) = sin(x). Note that in reality we will never know the underlying f, which is why we are estimating it in the first place!

For this situation I have created a set of training points, τ, given by y_i = sin(x_i) + ε_i, where the ε_i are draws from a standard normal distribution (mean of zero, standard deviation equal to one). This can be seen in Figure 20.1. The black curve is the "true" function f, restricted to the interval [0, 2π], while the circled points represent the y_i simulated data values.
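As a concrete illustration, such a training set can be simulated along the following lines; the sample size and seed are assumptions, not the exact values behind Figure 20.1:

import numpy as np

rng = np.random.default_rng(42)  # assumed seed, purely for reproducibility

# Predictor values on [0, 2*pi] and noisy responses y_i = sin(x_i) + eps_i,
# with eps_i drawn from a standard normal distribution
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=25))
eps = rng.normal(loc=0.0, scale=1.0, size=x.shape)
y = np.sin(x) + eps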

We can now try to fit a few different models to this training data. The first model, given by the green line, is a linear regression fitted with ordinary least squares estimation. The second model, given by the blue line, is a polynomial model of degree m = 3. The third model, given by the red curve, is a higher-degree polynomial of degree m = 20. Between each of the models I have varied the flexibility, that is, the degrees of freedom (DoF). The linear model is the least flexible, with only two DoF. The most flexible model is the polynomial of order m = 20. It can be seen that the polynomial of order m = 3 is the apparent closest fit to the underlying sinusoidal relationship.
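All three fits are ordinary least squares on polynomial bases (a degree-1 polynomial being exactly the linear regression). A sketch, continuing with the x and y arrays simulated above:

import numpy as np
from numpy.polynomial import Polynomial

# Fit polynomials of increasing flexibility by least squares:
# degree 1 (the linear model, 2 DoF), degree 3 and degree 20
fits = {m: Polynomial.fit(x, y, deg=m) for m in (1, 3, 20)}

# Evaluate each fitted model on a fine grid, e.g. for plotting
x_grid = np.linspace(0.0, 2.0 * np.pi, 200)
curves = {m: p(x_grid) for m, p in fits.items()}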

For each of these models we can calculate the training MSE. It can be seen in Figure 20.2 that the training MSE (given by the green curve) decreases monotonically as the flexibility of the model increases. This makes sense, since polynomials of increasing degree are as flexible as we need them to be in order to minimise the difference between their values and those of the sinusoidal data.
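This monotone decrease is straightforward to verify numerically; a short sketch, again reusing the simulated training data and imports from above:

# Training MSE falls (or stays flat) as the degree grows, since each
# higher-degree polynomial basis contains every lower-degree fit
for m in range(1, 21):
    p = Polynomial.fit(x, y, deg=m)
    train_mse = np.mean((y - p(x)) ** 2)
    print(f"degree {m:2d}: training MSE = {train_mse:.4f}")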
