
error for a particular model estimate, $\hat{f}(x)$, with prediction vector $x = x_0$, using the squared-error loss. The definition of the squared-error loss, at the prediction point $x_0$, is given by:

$$\mathrm{Err}(x_0) = E\left[ \left( y_0 - \hat{f}(x_0) \right)^2 \mid x = x_0 \right] \quad (20.8)$$

However, we can expand the expectation on the right-hand side into three terms:

$$\mathrm{Err}(x_0) = \sigma^2_{\epsilon} + \left[ E\hat{f}(x_0) - f(x_0) \right]^2 + E\left[ \hat{f}(x_0) - E\hat{f}(x_0) \right]^2 \quad (20.9)$$

The first term on the RHS is known as the irreducible error. It is the lower bound on the

possible expected prediction error.

The middle term is the squared bias and represents the difference between the average value of
all predictions at $x_0$, taken across all possible training sets, and the true value of the underlying
function at $x_0$.

This can be thought of as the error introduced by the model in not representing the underlying
behaviour of the true function, for example by using a linear model when the phenomenon is
inherently non-linear.

The third term is known as the variance. It characterises the error that is introduced as the
model becomes more flexible, and thus more sensitive to variation across differing training
sets, $\tau$.

$$\mathrm{Err}(x_0) = \sigma^2_{\epsilon} + \mathrm{Bias}^2 + \mathrm{Var}\left(\hat{f}(x_0)\right) \quad (20.10)$$
$$= \text{Irreducible Error} + \mathrm{Bias}^2 + \text{Variance} \quad (20.11)$$

It is important to remember that $\sigma^2_{\epsilon}$ represents an absolute lower bound on the expected

prediction error. While the expected training error can be reduced monotonically to zero (just by

increasing model flexibility), the expected prediction error will always be at least the irreducible

error, even if the squared bias and variance are both zero.
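The decomposition in (20.10) can be checked empirically with a short Monte Carlo sketch: repeatedly draw training sets $\tau$, fit a deliberately restrictive model, and estimate the squared bias and variance of $\hat{f}(x_0)$ at a fixed prediction point. The true function, noise level and polynomial degree below are illustrative assumptions chosen for this sketch, not values taken from the text.

```python
import numpy as np

# Hypothetical setup for illustration only: a non-linear true function,
# Gaussian noise and a fixed prediction point x0.
rng = np.random.default_rng(42)
f = lambda x: np.sin(2.0 * x)   # true underlying function f
sigma_eps = 0.3                 # std dev of irreducible noise epsilon
x0 = 0.5                        # prediction point

def fit_and_predict(degree, n=30):
    """Draw one training set tau, fit a polynomial, predict at x0."""
    x = rng.uniform(-1.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma_eps, n)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x0)

# Monte Carlo estimates of E f_hat(x0) and Var f_hat(x0) across training sets
preds = np.array([fit_and_predict(degree=1) for _ in range(2000)])
bias_sq = (preds.mean() - f(x0)) ** 2   # squared bias, as in (20.9)
variance = preds.var()                  # variance term, as in (20.9)

# Expected prediction error, decomposed as in (20.10)
err = sigma_eps ** 2 + bias_sq + variance
print(bias_sq, variance, err)
```

A linear fit to a sinusoidal function has non-trivial bias at $x_0$, and `err` always exceeds the irreducible error $\sigma^2_{\epsilon} = 0.09$, illustrating the lower bound described above.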

20.2 Cross-Validation

In this section we will attempt to find a partial remedy to the problem of an overfit machine

learning model using a technique known as cross-validation.

Firstly, we will define cross-validation and then describe how it works. Secondly, we will

construct a forecasting model using an equity index and then apply two cross-validation methods

to this example: the validation set approach and k-fold cross-validation. Finally we will

discuss the code for the simulations using Python with Pandas, Matplotlib and Scikit-Learn.

Our goal is to eventually create a set of statistical tools that can be used within a backtesting

framework to help us minimise the problem of overfitting a model and thus constrain future

losses due to a poorly performing strategy based on such a model.
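As a preview of the two methods named above, the following minimal sketch contrasts the validation set approach with k-fold cross-validation using Scikit-Learn. The features and response here are synthetic stand-ins for the equity index data, not the actual dataset used later in the chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

# Hypothetical synthetic data: two "lagged return" features and a
# noisy linear response, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.1, 500)

# Validation set approach: a single random train/test split
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_tr, y_tr)
val_mse = mean_squared_error(y_val, model.predict(X_val))

# k-fold cross-validation: average the held-out MSE over k folds
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_mses = []
for train_idx, test_idx in kf.split(X):
    m = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mses.append(
        mean_squared_error(y[test_idx], m.predict(X[test_idx]))
    )
cv_mse = np.mean(fold_mses)
print(val_mse, cv_mse)
```

The k-fold estimate averages over every observation exactly once as test data, so it is typically a less noisy estimate of out-of-sample error than a single validation split.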
