advanced-algorithmic-trading


29.2 Building a Prediction Model on Historical Data

In this section a prediction model will be created using Scikit-Learn. As this is a supervised machine learning task, the model is trained on a subset of the historical data. In particular, data for the US equity AREX is used from late 2007 until the end of 2012, although the method will work with any intraday bar data (including equities and forex).

In order to train the model it is necessary to specify the feature predictors and the response. The features in this instance are lagged minutely returns: feature p is the closing-price return of the bar p periods behind the current bar. The response is either the "up/down factor" as described above (+1 or -1 for True/False) or the directional change of the returns in the next bar (+1 or -1 for up/down).
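The lagged-return features and the directional response can be sketched as follows. This is a minimal illustration on synthetic prices, assuming percentage returns and three lags; the variable and column names are illustrative rather than taken from the book's code:

```python
import numpy as np
import pandas as pd

# Hypothetical minute-bar closing prices (synthetic random walk)
np.random.seed(42)
close = pd.Series(100.0 * np.exp(np.cumsum(0.001 * np.random.randn(500))))
df = pd.DataFrame({"Close": close})

# Percentage return of each bar's close over the previous close
df["Return"] = df["Close"].pct_change() * 100.0

# Feature p: the closing-price return p bars behind the current bar
n_lags = 3
for p in range(1, n_lags + 1):
    df["Lag%d" % p] = df["Return"].shift(p)

# Directional response: sign of the NEXT bar's return (+1 up, -1 down)
df["Direction"] = np.sign(df["Return"].shift(-1))
df = df.dropna()

X = df[["Lag%d" % p for p in range(1, n_lags + 1)]]
y = df["Direction"]
print(X.shape, y.shape)
```

Note that shifting forwards for the lags and backwards for the response introduces NaN values at both ends of the DataFrame, which are dropped before training.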

The code below generates both the "up/down factor" and the directional change, but in the trading strategy below the directional change is used as the response. It is straightforward to uncomment a line in the following snippets to switch to the "up/down factor", but this comes at the price of increased class imbalance: there are fewer instances of the stock rising by a certain factor without falling by another than there are of simple upward moves.
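The imbalance can be seen on synthetic data. The snippet below uses a simplified stand-in for the "up/down factor": +1 only if the cumulative return over the next k bars reaches an upper threshold before falling to a lower one. The window length k and the thresholds are assumed values for illustration, not those used in the book:

```python
import numpy as np

np.random.seed(0)
# Synthetic minute returns in percent; purely illustrative, not AREX data
rets = 0.05 * np.random.randn(20000)

# Response 1: simple direction of the next bar (+1 / -1)
direction = np.where(rets[1:] > 0.0, 1, -1)

# Response 2: simplified "up/down factor" -- +1 only if the cumulative
# return over the next k bars hits `up` before first hitting -`down`
k, up, down = 20, 0.20, 0.10
cum = np.cumsum(rets)
labels = []
for i in range(len(rets) - k - 1):
    window = cum[i + 1:i + 1 + k] - cum[i]
    above = np.argmax(window >= up) if (window >= up).any() else k + 1
    below = np.argmax(window <= -down) if (window <= -down).any() else k + 1
    labels.append(1 if above < below else -1)
updown = np.array(labels)

print("Fraction of +1 labels (direction):      %.3f" % (direction == 1).mean())
print("Fraction of +1 labels (up/down factor): %.3f" % (updown == 1).mean())
```

The directional labels sit near 50/50, while the factor-based labels are markedly skewed towards -1, which is the class imbalance referred to above.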

Once the model is trained it is serialised. This means that the Python object containing the model, along with its associated set of trained parameters, is written to disk in a special format. This allows the model to be deserialised later in the QSTrader environment. Within the Python ecosystem this is known as pickling.

Pickling is a common mechanism for deploying machine learning models into production. Scikit-Learn calls this model persistence and recommends the joblib library for the purpose. (In older versions of Scikit-Learn, joblib was bundled as sklearn.externals.joblib; in recent versions it must be installed and imported as the standalone joblib package.)
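A minimal persistence round trip looks like the following. The dummy data, model hyperparameters and the file name ml_model_rf.pkl are illustrative assumptions, not values from the book:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Fit a small model on dummy data (illustrative only)
np.random.seed(42)
X = np.random.randn(200, 5)
y = np.where(np.random.randn(200) > 0.0, 1, -1)

model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)

# Serialise ("pickle") the fitted model and its trained parameters to disk
joblib.dump(model, "ml_model_rf.pkl")

# ...and deserialise it later, e.g. inside the QSTrader environment
restored = joblib.load("ml_model_rf.pkl")
print((restored.predict(X) == model.predict(X)).all())
```

The restored object behaves identically to the original, so the trading system never needs to retrain the model at startup.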

The first task in the code is to import the necessary libraries, including NumPy, Pandas and Scikit-Learn. For this example the LinearDiscriminantAnalysis and RandomForestClassifier are trained, but some other ensemble methods have also been imported and commented out. These can be uncommented and trained if other predictive models are desired. Finally, joblib is imported for persistence and confusion_matrix is imported to analyse class imbalance:

# intraday_ml_model_fit.py

import datetime

import joblib  # formerly sklearn.externals.joblib in older Scikit-Learn
import numpy as np
import pandas as pd
import sklearn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

The following function create_up_down_dataframe is a little complex. It is designed to create the necessary feature matrix X and the response vector y, although it initially concatenates them into a single Pandas DataFrame.
