
Unfortunately this gain in prediction accuracy comes at a price: significantly reduced interpretability of the model. However, in quantitative trading research interpretability is often less important than raw prediction accuracy.

Note that there are specific statistical methods for deducing the important variables in bagging, but they are beyond the scope of this book.

18.5.3 Random Forests

Random Forests[27] are very similar to bagging, except that they make use of a technique called feature bagging. Feature bagging significantly decreases the correlation between the individual DTs and thus increases total predictive accuracy, on average.

Feature bagging works by randomly selecting a subset of the p feature dimensions at each split in the growth of the individual DTs. This may sound counterintuitive; after all, it is often desirable to include as many features as possible initially in order to give the model as much information as possible. However, the purpose is to deliberately avoid (on average) very strong predictive features that lead to similar splits across trees.

That is, if a particular feature is a strong predictor of the response, then it will be selected in many trees and thus the trees produced by a standard bagging procedure can be highly correlated. Random Forests avoid this by deliberately leaving out these strong features in many of the grown trees.

If all p features are considered at each split in the Random Forest setting then this simply corresponds to bagging. A common rule-of-thumb is to use $\sqrt{p}$ features, suitably rounded, at each split.

In the Python section below it will be shown how Random Forests compare to bagging in terms of performance.
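Ahead of that section, the following is a minimal sketch of such a comparison using scikit-learn on synthetic data; the data set, estimator counts and cross-validation settings here are illustrative assumptions rather than those used later in the book.

# A rough comparison of bagged decision trees and a Random Forest on
# synthetic regression data. The only substantive difference between the
# two estimators below is max_features="sqrt", which implements the
# feature bagging described above.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data with p = 20 features, only a few of which are informative
X, y = make_regression(
    n_samples=500, n_features=20, n_informative=5, noise=10.0, random_state=42
)

# Bagging: the default base estimator is a decision tree and every split
# may consider all p features
bagging = BaggingRegressor(n_estimators=200, random_state=42)

# Random Forest: each split considers only approximately sqrt(p) randomly
# chosen features, which decorrelates the individual trees
forest = RandomForestRegressor(
    n_estimators=200, max_features="sqrt", random_state=42
)

for name, model in [("Bagging", bagging), ("Random Forest", forest)]:
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_mean_squared_error"
    )
    print("%s: CV MSE = %0.2f" % (name, -scores.mean()))

On average, it is this decorrelation induced by feature bagging that produces the improvement in predictive accuracy discussed above.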

18.5.4 Boosting

Another general machine learning ensemble method is known as boosting. Boosting differs somewhat from bagging in that it does not involve bootstrap sampling. Instead, models are generated sequentially and iteratively: it is necessary to have information about model $i$ before model $i + 1$ is produced.

Boosting was motivated by Kearns and Valiant (1989)[63]. They asked whether it was possible to combine, in some fashion, a selection of weak machine learning models to produce a single strong machine learning model. Weak, in this instance, means a model that is only slightly better than chance at predicting a response. Correspondingly, a strong learner is one that is well correlated with the true response.

This motivated the concept of boosting. The idea is to iteratively learn weak machine learning models on a continually-updated training data set and then add them together to produce a final, strong learning model. This differs from bagging, which simply averages models fitted to separate bootstrapped samples.

The basic algorithm for boosting, which is discussed at length in James et al (2013)[59] and Hastie et al (2009)[51], is given in the following:

1. Set the initial estimator to zero, that is $\hat{f}(x) = 0$. Also set the residuals to the current responses, $r_i = y_i$, for all elements in the training set.

2. Set the number of boosted trees, $B$. Loop over $b = 1, \ldots, B$:
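As a hedged sketch of how this procedure plays out in practice, the following assumes the standard loop body described in James et al (2013): each iteration fits a small tree to the current residuals, adds a shrunken copy of it to the running model and then updates the residuals. The data set, shrinkage parameter and tree depth below are illustrative assumptions rather than values from the book.

# A minimal sketch of boosted regression trees: sequentially fit shallow
# trees to the residuals and accumulate shrunken copies of them.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(
    n_samples=500, n_features=10, noise=10.0, random_state=42
)

B = 100      # number of boosted trees
lam = 0.01   # shrinkage (learning rate)
depth = 2    # each weak learner is a shallow tree

f_hat = np.zeros_like(y, dtype=float)  # step 1: initial estimator f(x) = 0
residuals = y.astype(float)            # step 1: r_i = y_i
trees = []

for b in range(B):                     # step 2: loop over b = 1, ..., B
    tree = DecisionTreeRegressor(max_depth=depth, random_state=b)
    tree.fit(X, residuals)             # fit a weak learner to the residuals
    update = lam * tree.predict(X)
    f_hat += update                    # add a shrunken copy to the model
    residuals -= update                # update the residuals
    trees.append(tree)

# The final boosted model is the sum of the shrunken trees
print("Training MSE:", np.mean((y - f_hat) ** 2))

The shrinkage parameter slows the rate at which each tree corrects the residuals; this generally requires a larger $B$ but tends to improve out-of-sample performance.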
