advanced-algorithmic-trading

However, if we plot the test MSE, which is given by the blue curve, the situation is quite different. The test MSE initially decreases as we increase the flexibility of the model, but eventually starts to increase again once too much flexibility is introduced. Why is this? By increasing the model flexibility we are allowing it to fit itself to idiosyncratic "patterns" in the training data. As soon as new test data are introduced, however, the model cannot generalise well because these "patterns" are merely random artifacts of the training data and not an underlying property of the true sinusoidal form. We are in a situation of overfitting. Colloquially, the model is fitting itself to the noise and not the signal.

In fact, this property of a U-shaped test MSE as a function of model flexibility is an intrinsic property of supervised machine learning models known as the bias-variance tradeoff. It can be shown (see A More Mathematical Explanation below) that the expected test MSE, where the expectation is taken across many test sets, is given by:

E\left(y_0 - \hat{f}(x_0)\right)^2 = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\epsilon) \quad (20.7)
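The decomposition can be checked numerically against the sinusoidal example. The following sketch (the noise level, polynomial degree, test point and sample sizes are all illustrative assumptions, not the book's exact setup) repeatedly refits a polynomial to fresh noisy training sets and compares the Monte Carlo estimates of the three terms with a directly estimated expected test MSE at a single point x_0:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    # The assumed true sinusoidal form
    return np.sin(2.0 * np.pi * x)

sigma = 0.3            # noise std dev, so Var(eps) = sigma ** 2
degree = 3             # model flexibility: a cubic polynomial (illustrative choice)
n_train, n_sims = 25, 5000
x0 = 0.3               # fixed test point
x_train = np.linspace(0.0, 1.0, n_train)

# Refit the model on many independently drawn training sets and
# record its prediction at x0 each time
preds = np.empty(n_sims)
for i in range(n_sims):
    y_train = true_f(x_train) + rng.normal(0.0, sigma, n_train)
    coefs = np.polyfit(x_train, y_train, degree)
    preds[i] = np.polyval(coefs, x0)

variance = preds.var()                         # Var(f_hat(x0))
bias_sq = (preds.mean() - true_f(x0)) ** 2     # [Bias(f_hat(x0))]^2

# Directly estimate the expected test MSE using fresh noisy observations y0
y0 = true_f(x0) + rng.normal(0.0, sigma, n_sims)
mse = np.mean((y0 - preds) ** 2)

print(f"MSE: {mse:.4f}  Var + Bias^2 + Var(eps): {variance + bias_sq + sigma ** 2:.4f}")
```

The two printed quantities should agree up to Monte Carlo sampling error, since the test noise is independent of the fitted model.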

The first term on the right hand side is the variance of the estimate across many testing sets. It determines how much the model estimate deviates, on average, as different testing data are used. In particular, high variance suggests that the model is overfit to the training data.

The middle term is the squared bias, which characterises the difference between the average of the estimate and the true values. A model with high bias is not capturing the underlying behaviour of the true functional form well. One can imagine the situation where a linear regression is used to model a sine curve (as above). No matter how well "fitted" to the data the model is, it will never capture the non-linearity inherent in a sine curve.
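That high-bias situation is easy to demonstrate. In the sketch below (the data generation and the choice of a degree-5 comparison model are illustrative assumptions), a straight-line fit retains a large error against the true sinusoid no matter how closely it fits the data, while a modestly flexible polynomial does not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of a sine curve (illustrative parameters)
x = np.linspace(0.0, 1.0, 50)
truth = np.sin(2.0 * np.pi * x)
y = truth + rng.normal(0.0, 0.1, x.size)

# A straight line (high bias) versus a degree-5 polynomial (low bias)
line_fit = np.polyval(np.polyfit(x, y, 1), x)
poly_fit = np.polyval(np.polyfit(x, y, 5), x)

# Squared error against the TRUE function, i.e. (approximately) the bias term
bias_line = np.mean((truth - line_fit) ** 2)
bias_poly = np.mean((truth - poly_fit) ** 2)

print(f"linear: {bias_line:.3f}  degree-5: {bias_poly:.3f}")
```

The straight line's error against the true sine curve is dominated by bias and cannot be reduced by better fitting, only by a more flexible model class.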

The final term is known as the irreducible error. It is a lower bound on the test MSE. Since we only ever have access to the training data points (including the randomness associated with the ɛ values), we cannot hope to obtain a "more accurate" fit than what the variance of the residuals offers.

Generally, as flexibility increases we see an increase in variance and a decrease in bias. However, it is the relative rate of change between these two factors that determines whether the expected test MSE increases or decreases.

As flexibility is first increased the bias tends to drop quickly (faster than the variance can increase) and so we see a drop in test MSE. However, as flexibility increases further there is less reduction in bias (because the model is already flexible enough to fit the training data easily) and instead the variance rapidly increases, due to the model being overfit.
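This U-shaped behaviour can be reproduced with a short experiment. The sketch below (the seed, sample sizes, noise level and degree range are illustrative assumptions) fits polynomials of increasing degree to noisy sinusoidal data: the training MSE keeps falling with flexibility, while the test MSE bottoms out at an intermediate degree:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, sigma=0.3):
    # Noisy samples from an assumed true sinusoidal form
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2.0 * np.pi * x) + rng.normal(0.0, sigma, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(500)

degrees = range(1, 13)
train_mse, test_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, d)   # flexibility = polynomial degree
    train_mse.append(np.mean((y_train - np.polyval(coefs, x_train)) ** 2))
    test_mse.append(np.mean((y_test - np.polyval(coefs, x_test)) ** 2))

best_degree = test_mse.index(min(test_mse)) + 1
print(f"best degree by test MSE: {best_degree}")
```

Plotting `train_mse` and `test_mse` against degree would reproduce the two curves discussed above: the training curve decreasing monotonically, the test curve tracing out the U shape.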

Our ultimate goal in supervised machine learning is to try and minimise the expected test MSE; that is, we must choose a supervised machine learning model that simultaneously has low variance and low bias.

If you wish to gain a more mathematically precise definition of the bias-variance tradeoff then you can read the following section; otherwise it may be skipped.

A More Mathematical Explanation

We have now qualitatively outlined the issues surrounding model flexibility, bias and variance. In this section we are going to carry out a mathematical decomposition of the expected prediction
