ε is assumed to be normally distributed with zero mean and variance σ². It represents the difference between the predictions made by the linear regression and the true value of the response variable.

Note that βᵀ, the transpose of the vector β, and x are both (p+1)-dimensional rather than p-dimensional, because of the need to include an intercept term: βᵀ = (β₀, β₁, . . . , βₚ), while x = (1, x₁, . . . , xₚ). The single "1" is included in x as a notational "trick" that allows linear regression to be written in matrix notation.
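As a brief illustration of this trick (this snippet is not from the original text; the filename, feature values and coefficients are arbitrary assumptions), prepending a column of ones to the feature matrix lets the intercept be absorbed into a single matrix product:

# intercept_trick.py (hypothetical filename)

import numpy as np

# Three observations of p=2 features (arbitrary illustrative values)
X = np.array([
    [0.5, 1.2],
    [1.1, 0.3],
    [0.9, 2.4]
])

# Prepend a column of ones so that the intercept beta_0 is handled
# by the same matrix product as the other coefficients
X_aug = np.column_stack([np.ones(X.shape[0]), X])

beta = np.array([0.1, 2.0, -0.5])  # (beta_0, beta_1, beta_2)
y_hat = X_aug @ beta  # prediction beta^T x for each observation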

17.2 Probabilistic Interpretation

An alternative way to look at linear regression is to consider it as a joint probability model, as discussed in Hastie et al (2009)[51] and Murphy (2012)[71]. Such a model describes how the probability of the response y is conditioned on the values of the feature vector x, along with any parameters of the model, themselves given by the vector θ. This leads to a mathematical model of the form p(y | x, θ), known as a conditional probability density (CPD) model since it involves y conditional on the features x and the parameters θ.

Linear regression can be written as a CPD in the following manner:

p(y | x, θ) = N(y | µ(x), σ²(x))   (17.2)

For linear regression it is assumed that µ(x) is linear, so that µ(x) = βᵀx. It must also be assumed that the variance in the model is fixed; in particular, this means that the variance does not depend on x, so that σ²(x) = σ² is a constant. Thus the full parameter vector consists of both the feature coefficients β and the variance σ², given by θ = (β, σ²).
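To make this concrete, here is a minimal sketch of evaluating p(y | x, θ) at a single point with SciPy (the filename, parameter and feature values are illustrative assumptions, not taken from the text):

# cpd_point_evaluation.py (hypothetical filename)

import numpy as np
from scipy.stats import norm

beta = np.array([1.0, 2.0])  # (beta_0, beta_1) -- assumed values
sigma = 0.5                  # fixed standard deviation, so sigma^2 is constant

x = np.array([1.0, 3.0])     # augmented feature vector (1, x_1)
mu = beta @ x                # mu(x) = beta^T x = 7.0

# p(y | x, theta) = N(y | mu(x), sigma^2), evaluated at y = 6.5
density = norm.pdf(6.5, loc=mu, scale=sigma)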

Recall that such a probabilistic interpretation was considered in the chapter on Bayesian Linear Regression.

What is the rationale for generalising a simple technique such as linear regression in this manner? The primary benefit is that it becomes more straightforward to see how other models, especially those which handle non-linearities, fit into the same probabilistic framework. This allows derivation of results across models using similar techniques.

If only a single-dimensional feature x is considered, that is x = (1, x), it is possible to plot p(y | x, θ) against y and x to see this joint distribution graphically. In order to do so the parameters β = (β₀, β₁) and σ² must be fixed. The following code snippet comprises a Python script that uses Matplotlib to display the distribution:

# lin_reg_distribution_plot.py

from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from scipy.stats import norm
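The listing is truncated at this point in the extracted text. The following continuation is a minimal sketch of how such a surface plot might be produced, assuming illustrative values for β = (β₀, β₁) and σ (these values, the grid ranges and the plot styling are assumptions, not the book's):

# Fixed parameters (assumed for illustration)
beta0, beta1 = 1.0, 2.0
sigma = 0.5

# Grid of feature values x and response values y
x_vals = np.linspace(0.0, 2.0, 100)
y_vals = np.linspace(0.0, 7.0, 100)
X, Y = np.meshgrid(x_vals, y_vals)

# Conditional density p(y | x, theta) = N(y | beta0 + beta1*x, sigma^2)
Z = norm.pdf(Y, loc=beta0 + beta1 * X, scale=sigma)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(
    X, Y, Z, rstride=2, cstride=2,
    cmap=cm.coolwarm, linewidth=0
)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('p(y|x)')
ax.zaxis.set_major_locator(LinearLocator(6))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

The resulting surface is a "ridge" of Gaussians: for each fixed x, the slice along y is a normal density centred at β₀ + β₁x with constant width σ, which is precisely the fixed-variance assumption made above.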
