
Once the full dataset is created it is necessary to partition it into a training and test set,

using Python array-slicing syntax:

# Split up the features, X, and responses, y, into
# training and test arrays
X_train = X[:split]
X_test = X[split:]
y_train = y[:split]
y_test = y[split:]
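The split above assumes that X, y and the integer index split already exist. As a minimal self-contained sketch of how such a dataset might be constructed and partitioned (the sample size, true parameters, noise level and 75/25 split fraction here are illustrative assumptions, not necessarily those used in the chapter):

```python
import numpy as np

# Hypothetical synthetic univariate dataset:
# y = beta_0 + beta_1 * x + Gaussian noise
# (parameter values chosen for illustration only)
np.random.seed(42)
n = 500
beta_0, beta_1 = 2.0, 3.0

X = np.random.uniform(0.0, 1.0, size=(n, 1))  # feature matrix, shape (n, 1)
y = beta_0 + beta_1 * X[:, 0] + np.random.normal(0.0, 0.5, size=n)

# Use the first 75% of rows for training, the remainder for testing
split = int(0.75 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

Note that a plain positional slice like this is only appropriate when the row ordering carries no information to be preserved separately; for time series data it has the advantage of keeping the test set strictly "in the future" relative to the training set.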

The next step is to create and fit a linear regression model to the training data. The following
code demonstrates the simplicity of the Scikit-Learn API for carrying this out. All of the
above-mentioned details regarding OLS and MLE are abstracted away by the fit(...) method:

# Create a scikit-learn linear regression model
# and fit it to the training data
lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)

The lr_model linear regression model instance can be queried for the intercept and slope

parameters. The coef_ member is actually an array of slope parameters, since it generalises

to the multivariate case where there are multiple slopes, one for each feature dimension. Hence

the first element is selected for this univariate case. The intercept can be obtained from the

intercept_ member:

# Output the estimated parameters for the linear model
print(
    "Estimated intercept, slope: %0.6f, %0.6f" % (
        lr_model.intercept_,
        lr_model.coef_[0]
    )
)

Sample output from executing this code is given by the following. Note that the values will
likely be different on other machines due to the stochastic nature of the random number
generation and fitting procedure:

Estimated intercept, slope: 2.006315, 2.908600
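Beyond inspecting intercept_ and coef_, the fitted model can be queried for predictions via predict(...) and for a goodness-of-fit measure via score(...), which returns the coefficient of determination R². A minimal self-contained sketch (the synthetic data and true parameters here are illustrative assumptions):

```python
import numpy as np
from sklearn import linear_model

# Illustrative synthetic data: y = 2 + 3x + small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0.0, 0.1, size=100)

lr_model = linear_model.LinearRegression()
lr_model.fit(X, y)

# predict(...) evaluates the fitted line at each feature value
y_pred = lr_model.predict(X)

# score(...) returns the coefficient of determination, R^2
r2 = lr_model.score(X, y)
print("R^2: %0.4f" % r2)
```

With low noise relative to the slope, the recovered parameters land close to the true values and R² is close to one.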

Now that the model has been fitted to the training data it can be utilised to predict
responses on the test data. A scatterplot of the test data is visualised and overlaid
with the (point) estimated line of best fit, given in Figure 17.2. Compare this to the Bayesian
linear regression example given in the previous chapter that provides a posterior distribution of
such best fit lines:

# Create a scatterplot of the test data for features
# against responses, plotting the estimated line
# of best fit from the ordinary least squares procedure
plt.scatter(X_test, y_test)
plt.plot(
    X_test,
    lr_model.predict(X_test),
    color="black",
    linewidth=2.0
)
plt.show()