
Chapter 5

Bayesian Linear Regression

At this stage in our journey through Bayesian statistics we have inferred a binomial proportion analytically with conjugate priors and have described the basics of Markov Chain Monte Carlo via the Metropolis algorithm. In this chapter we are going to introduce linear regression modelling in the Bayesian framework and carry out inference using the PyMC3 MCMC library.

We will begin by recapping the classical, or frequentist, approach to multiple linear regression (this is discussed at length in the Machine Learning section in later chapters). Then we will discuss how a Bayesian thinks of linear regression. We will briefly describe the concept of a Generalised Linear Model (GLM), as this is necessary to understand the clean syntax of model descriptions in PyMC3.

Subsequent to the description of these models we will simulate some linear data with noise and then use PyMC3 to produce posterior distributions for the parameters of the model. This is the same procedure that we will carry out when discussing time series models such as ARMA and GARCH later on in the book. This "simulate and fit" process not only helps us understand the model, but also checks that we are fitting it correctly when we know the "true" parameter values.
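As a preview, a minimal sketch of this simulate-and-fit workflow might look as follows. The prior choices, sample sizes and parameter values here are illustrative assumptions, not the settings used later in the chapter:

```python
import numpy as np
import pymc3 as pm

# Simulate linear data with known "true" parameters
np.random.seed(42)
true_intercept, true_slope, true_sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 1.0, 100)
y = true_intercept + true_slope * x + np.random.normal(0.0, true_sigma, size=100)

# Fit a Bayesian linear regression and draw posterior samples via MCMC
with pm.Model() as model:
    intercept = pm.Normal("intercept", mu=0.0, sigma=10.0)  # weakly informative prior
    slope = pm.Normal("slope", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)  # noise scale must be positive
    mu = intercept + slope * x
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    trace = pm.sample(2000, tune=1000)

# The posterior means should lie close to the known "true" values
print(pm.summary(trace))
```

Because we generated the data ourselves, we can check that the posterior distributions concentrate around the known parameter values, which is the point of the "simulate and fit" exercise.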

Let us now turn our attention to the frequentist approach to linear regression. More on this approach can be found in the later Machine Learning chapter on Linear Regression.

5.1 Frequentist Linear Regression

The frequentist (classical) approach to multiple linear regression assumes a model of the form[51]:

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \epsilon = \beta^T X + \epsilon \quad (5.1)$$

where $\beta^T$ is the transpose of the coefficient vector $\beta$ and $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is the measurement error, normally distributed with mean zero and standard deviation $\sigma$. Note that for the compact form $\beta^T X$ the vector $X$ is taken to include a constant first component equal to one, so that the intercept $\beta_0$ is absorbed into $\beta$.

That is, our model $f(X)$ is linear in the predictors, $X$, with some associated measurement error.

If we have a set of training data $(x_1, y_1), \ldots, (x_N, y_N)$ then the goal is to estimate the $\beta$ coefficients, which provide the best linear fit to the data. Geometrically, this means we need to find the hyperplane through the data that minimises the residual sum of squares between the observed responses and the fitted values.
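To make this estimation concrete, here is a brief sketch using ordinary least squares via NumPy. The single-predictor setup and simulated values are illustrative assumptions:

```python
import numpy as np

# Simulate data from the model f(X) = beta_0 + beta_1 * x + eps
np.random.seed(42)
N = 100
x = np.random.uniform(0.0, 1.0, N)
eps = np.random.normal(0.0, 0.5, N)
y = 1.0 + 2.0 * x + eps

# Augment the predictors with a column of ones so that beta_0
# is absorbed into the coefficient vector, giving the beta^T X form
X = np.column_stack([np.ones(N), x])

# Ordinary least squares: minimise the residual sum of squares
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [1.0, 2.0]
```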

