
At this stage we have the necessary data to begin creating a set of statistical machine learning models.

Validation Set Approach

Now that we have the financial data needed to create a set of predictive regression models, we can use the above cross-validation methods to obtain estimates for the test error.

The first task is to import the models from Scikit-Learn. We will choose a linear regression model with polynomial features. This provides us with the ability to choose varying degrees of flexibility simply by increasing the degree of the polynomial order of the features. Initially we are going to consider the validation set approach to cross-validation.

Scikit-Learn provides the validation set approach via the train_test_split method, found in the model_selection module (older releases placed it in cross_validation). We will also need to import the KFold class for k-fold cross-validation later, as well as the linear regression model itself. We need to import the MSE calculation as well as Pipeline and PolynomialFeatures. The latter two allow us to easily create a set of polynomial feature linear regression models with minimal additional coding:

from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
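To make the validation set approach concrete, a single call to train_test_split randomly partitions the predictors and responses into a training half and a validation half. The snippet below is an illustrative sketch, assuming X and y are the predictor matrix and response vector constructed below; the random_state value is arbitrary:

# Randomly partition the data into a 50% training set
# and a 50% validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=42
)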

Once the modules are imported we can create a Pandas DataFrame that uses the prior ten days of lagged Amazon returns as predictors. We can then create ten separate random splittings of the data into a training set and a validation set.
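A minimal sketch of how such a lagged-returns frame could be built with Pandas is given below; the create_lagged_returns name and the amzn price series are illustrative assumptions rather than helpers defined in the text:

import pandas as pd


def create_lagged_returns(prices, lags=10):
    """
    Create a DataFrame of daily percentage returns with the
    prior "lags" returns as predictor columns ("Lag1".."Lag10")
    and the current day's return as the response ("Today").
    """
    rets = prices.pct_change() * 100.0
    lagged = pd.DataFrame({"Today": rets})
    for lag in range(1, lags + 1):
        lagged["Lag%s" % lag] = rets.shift(lag)
    return lagged.dropna()


# Hypothetical usage with a Series of Amazon adjusted closing prices:
# lagged = create_lagged_returns(amzn["Adj Close"], lags=10)
# X = lagged[["Lag%s" % i for i in range(1, 11)]]
# y = lagged["Today"]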

Finally, for multiple degrees of the polynomial features of the linear regression, we can calculate the test error. This provides us with ten separate test error curves, each value of which shows the test MSE for a differing polynomial degree:

def validation_set_poly(random_seeds, degrees, X, y):
    """
    Use the train_test_split method to create a
    training set and a validation set (50% in each)
    using "random_seeds" separate random samplings over
    linear regression models of varying flexibility
    """
    sample_dict = dict(
        [("seed_%s" % i, []) for i in range(1, random_seeds+1)]
    )
    # Loop over each random splitting into a train-test split
    for i in range(1, random_seeds+1):
        print("Random: %s" % i)
