
18.4 Advantages and Disadvantages of Decision Trees

As with all machine learning methods, there are both advantages and disadvantages to using DT/CARTs over other models:

18.4.1 Advantages

• DT/CART models are easy to interpret since they generate automatic "if-then-else" rules (a sketch follows this list)

• The models can handle categorical and continuous features in the same data set

• The method of construction for DT/CART models means that feature variables are automatically selected, rather than having to use subset selection or similar dimensionality reduction techniques

• The models are able to scale effectively on large datasets
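As a brief illustration of the first point above, the following minimal sketch (not from the text; the Iris dataset and the shallow-tree parameters are purely illustrative) uses Scikit-Learn's export_text to print a fitted decision tree as nested "if-then-else" rules:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a deliberately shallow tree so the printed rules stay compact
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# export_text renders the fitted tree as nested "if-then-else" rules
print(export_text(tree, feature_names=list(iris.feature_names)))

Each branch of the printed output is a human-readable threshold test on a single feature, which is what makes these models straightforward to audit.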

18.4.2 Disadvantages

• Poor predictive performance relative to other ML models

• DT/CART models suffer from instability, which means they are very sensitive to small changes in the feature space. In the language of the bias-variance trade-off, they are high-variance estimators.

While DT/CART models themselves suffer from poor prediction performance, they are extremely competitive when utilised in an ensemble setting, via bootstrap aggregation ("bagging"), Random Forests or boosting, which will now be discussed.

18.5 Ensemble Methods

In this section it will be shown how combining multiple decision trees in a statistical ensemble can vastly improve the predictive performance of the combined model. These statistical ensemble techniques are not limited to DTs and are in fact applicable to many regression and classification machine learning models. However, DTs provide a natural setting in which to discuss ensemble methods and the two are commonly associated with one another.

Once the theory of ensemble methods has been discussed, they will be implemented in Python using the Scikit-Learn library on financial data. Later in the book it will be shown how to apply such ensemble methods to an intraday trading strategy.
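As a brief preview of the variance reduction that ensembling provides, the following minimal sketch (not from the text; the synthetic regression data and default settings are purely illustrative) compares the cross-validated error of a single decision tree against a bagged ensemble, whose default base estimator in Scikit-Learn is itself a decision tree:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

# A single (high-variance) tree versus a bagged ensemble of trees;
# BaggingRegressor uses a decision tree as its base estimator by default
models = [
    ("Single tree", DecisionTreeRegressor(random_state=42)),
    ("Bagged trees", BaggingRegressor(n_estimators=100, random_state=42)),
]

for name, model in models:
    # Five-fold cross-validated mean squared error (lower is better)
    mse = -cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5).mean()
    print(f"{name}: mean CV MSE = {mse:.1f}")

On data like this the bagged ensemble typically posts a markedly lower error, precisely because averaging many unstable trees damps their individual variance.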

However, before discussing the ensemble techniques of bagging, Random Forests and boosting, it is necessary to outline a technique from frequentist statistics known as The Bootstrap.

18.5.1 The Bootstrap

Bootstrapping[39] is a statistical resampling technique that involves random sampling of a dataset with replacement. It is often used as a means of quantifying the uncertainty associated with a machine learning model.

For quantitative finance purposes bootstrapping is extremely useful, as it allows new samples to be generated from an existing dataset without having to collect additional training data.
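The following is a minimal sketch of the idea, assuming nothing more than NumPy; the daily returns series is simulated purely for illustration. Each bootstrap sample is drawn with replacement from the original data, and the spread of the resampled statistic quantifies the uncertainty of the estimate:

import numpy as np

rng = np.random.default_rng(42)
# One simulated "year" of daily returns (illustrative, not real market data)
returns = rng.normal(loc=0.0005, scale=0.01, size=250)

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Resample the original data with replacement, keeping the same size
    sample = rng.choice(returns, size=returns.size, replace=True)
    boot_means[i] = sample.mean()

# 95% bootstrap confidence interval for the mean daily return
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"Mean: {returns.mean():.5f}, 95% CI: [{lo:.5f}, {hi:.5f}]")

The same resampling mechanism underlies bagging, where each tree in the ensemble is trained on a different bootstrap sample of the training set.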
