13.08.2022 Views

advanced-algorithmic-trading


Figure 20.3: Amazon, Inc. - Price History

We will now outline the differing ways of carrying out cross-validation, starting with the

validation set approach and finishing with k-fold cross-validation. In each case we will use

Pandas and Scikit-Learn to implement these methods.

20.2.3 Validation Set Approach

The validation set approach to cross-validation is very simple to carry out. Essentially we take

the set of observations (n days of data) and randomly divide them into two equal halves. One

half is known as the training set while the second half is known as the validation set. The model

is fit using only the data in the training set, while its test error is estimated using only the

validation set.
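
The procedure just described can be sketched with Scikit-Learn as follows. This is a minimal illustration under stated assumptions: the data are synthetic, and the two features, the linear model and the mean squared error metric are hypothetical choices made for the example rather than anything prescribed by the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for n days of data (an assumption, not a real dataset)
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))  # e.g. two lagged-return features (hypothetical)
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Randomly divide the observations into two equal halves
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Fit on the training half only, then estimate the test error
# on the validation half only
model = LinearRegression().fit(X_train, y_train)
valid_mse = mean_squared_error(y_valid, model.predict(X_valid))
print(f"Validation MSE: {valid_mse:.4f}")
```

Fixing random_state makes the split reproducible. Changing it produces a different pair of halves and, in general, a different error estimate.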

This is easily recognisable as a technique often used in quantitative trading as a mechanism

for assessing predictive performance. However, it is more common to find two-thirds of the data

used for the training set, while the remaining third is used for validation. In addition it is more

common to retain the ordering of the time series, so that the training set consists of the

chronologically earliest two-thirds of the historical data.
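
A chronological two-thirds split of this kind might look like the following in Pandas. The price series here is a randomly generated placeholder, and the names prices, train and valid are assumptions introduced purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices standing in for real historical data
dates = pd.bdate_range("2020-01-01", periods=300)
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"close": 100.0 + rng.normal(size=len(dates)).cumsum()}, index=dates
)

# Keep the time ordering: the earliest two-thirds of the rows form the
# training set and the remaining third forms the validation set
split = len(prices) * 2 // 3
train = prices.iloc[:split]
valid = prices.iloc[split:]

print("Training ends:    ", train.index[-1].date())
print("Validation starts:", valid.index[0].date())
```

Because no shuffling takes place, every validation observation occurs after every training observation. Scikit-Learn's train_test_split gives an equivalent ordered split when called with shuffle=False.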

What is less common when applying this method is randomising the observations into each

of the two separate sets. Even less common is a discussion as to the subtle problems that arise

when this is carried out.

Firstly, and especially in situations with limited data, the procedure can lead to a high

variance for the estimate of the test error due to the randomisation of the samples. This is a

typical "gotcha" when carrying out the validation set approach to cross-validation. It is all too

possible to achieve a low test error simply through blind luck on receiving an appropriate random

sample split. Hence the true test error can be significantly underestimated, which overstates

the predictive power of the model.
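
The sensitivity of the error estimate to the random split can be seen by repeating the validation set approach with different seeds on a small sample. The sketch below uses synthetic data; the sample size of 40, the noise level and the linear model are arbitrary assumptions chosen only to make the spread visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Deliberately small synthetic sample (an assumption for illustration)
rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Repeat the 50/50 random split with ten different seeds and record
# the validation error obtained from each split
mses = []
for seed in range(10):
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )
    model = LinearRegression().fit(X_tr, y_tr)
    mses.append(mean_squared_error(y_va, model.predict(X_va)))

print(f"Lowest validation MSE:  {min(mses):.3f}")
print(f"Highest validation MSE: {max(mses):.3f}")
print(f"Standard deviation:     {np.std(mses):.3f}")
```

The same data and the same model yield a range of error estimates, so reporting only the most favourable split would understate the true test error.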
