advanced-algorithmic-trading


non-informative priors.

• Posterior Distributions: I mentioned above that the frequentist MLE value for our regression coefficients, β̂, was only a single point estimate. In the Bayesian formulation we receive an entire probability distribution that characterises our uncertainty in the β coefficients. The immediate benefit of this is that, after taking into account any data, we can quantify our uncertainty in the β parameters via the variance of this posterior distribution. A larger variance indicates greater uncertainty.
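To make this concrete, here is a minimal NumPy sketch of the idea. The posterior draws below are simulated directly from a normal distribution purely for illustration; they stand in for the output of an MCMC sampler, and the chosen location and scale are arbitrary assumptions, not values from any model in this chapter.

```python
import numpy as np

# Hypothetical stand-in for MCMC output: 5,000 draws from the posterior of a
# single regression coefficient beta (here simulated as N(2.0, 0.1^2)).
rng = np.random.default_rng(42)
posterior_beta = rng.normal(loc=2.0, scale=0.1, size=5000)

# A point estimate comparable to the frequentist MLE is the posterior mean...
beta_hat = posterior_beta.mean()

# ...but the full distribution also quantifies our uncertainty, e.g. via the
# posterior standard deviation (a wider spread means greater uncertainty).
beta_sd = posterior_beta.std()
print(beta_hat, beta_sd)
```

With real MCMC output one would simply replace `posterior_beta` with the sampled trace for the coefficient of interest.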

While the above formula for the Bayesian approach may appear succinct, it does not provide much insight into a specification for a model that can be sampled using MCMC. Thus in the next few sections it will be demonstrated how PyMC3 can be used to formulate and utilise a Bayesian linear regression model.

5.3 Bayesian Linear Regression with PyMC3

In this section we are going to carry out a time-honoured approach to statistical examples, namely to simulate some data with properties that we know, and then fit a model to recover these original properties. I have used this technique many times in the past on QuantStart.com and it will feature heavily in later chapters on Time Series Analysis.

While it may seem contrived to go through such a procedure, there are in fact two major benefits. The first is that it helps us understand exactly how to fit the model: in order to do so, we have to understand it first, which provides intuition into how the model works. The second is that it allows us to see how the model performs in a situation where we actually know the true values we are trying to estimate.

Our approach will make use of NumPy and Pandas to simulate the data and Seaborn to plot it. The Generalised Linear Models (GLM) module of PyMC3 will be used to formulate a Bayesian linear regression and sample from it on the simulated data set.
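As a sketch of the simulation step, the following generates data from a known linear model y = β₀ + β₁x + ε and places it in a DataFrame. The parameter values (β₀ = 1, β₁ = 2, σ = 0.5), the sample size and the uniform range for x are arbitrary choices for this illustration, not necessarily the values used later in the chapter.

```python
import numpy as np
import pandas as pd

# "True" parameters we will later try to recover (illustrative choices).
beta_0, beta_1, sigma = 1.0, 2.0, 0.5

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)      # predictor values
eps = rng.normal(0.0, sigma, size=100)   # normally distributed noise

# Simulated responses from the known linear model.
y = beta_0 + beta_1 * x + eps

df = pd.DataFrame({"x": x, "y": y})
print(df.head())
```

A DataFrame in this shape can then be passed directly to a plotting library such as Seaborn, or to a PyMC3 model specification.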

5.3.1 What are Generalised Linear Models?

Before we begin discussing Bayesian linear regression, I want to briefly outline the concept of a Generalised Linear Model (GLM), as they will be used to formulate our model in PyMC3.

A Generalised Linear Model is a flexible mechanism for extending ordinary linear regression to more general forms of regression, including logistic regression (classification) and Poisson regression (used for count data), as well as linear regression itself.

GLMs allow for response variables that have error distributions other than the normal distribution (see ε above, in the frequentist section). The linear model is related to the response y via a "link function" and is assumed to be generated from a statistical distribution in the exponential family. This family of probability distributions encompasses many common distributions, including the normal, gamma, beta, chi-squared, Bernoulli and Poisson distributions, among others.

The mean of this distribution, µ, depends on X via the following relation:

E(y) = µ = g⁻¹(Xβ)   (5.6)

where g is the link function. The variance is often some function, V, of the mean:

Var(y) = V(µ)   (5.7)
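The mean relation in Equation 5.6 can be sketched numerically for two common link functions: the identity link, which recovers ordinary linear regression, and the logit link, which yields logistic regression. The design matrix X and coefficients β below are arbitrary illustrative values.

```python
import numpy as np

# Illustrative design matrix (first column is an intercept) and coefficients.
X = np.array([[1.0, 0.5],
              [1.0, -1.0],
              [1.0, 2.0]])
beta = np.array([0.2, 1.5])

# The linear predictor X.beta that feeds into the inverse link function.
eta = X @ beta

# Identity link (ordinary linear regression): g(mu) = mu, so mu = X.beta.
mu_identity = eta

# Logit link (logistic regression): g(mu) = log(mu / (1 - mu)),
# so mu = g^{-1}(X.beta) = 1 / (1 + exp(-X.beta)), constrained to (0, 1).
mu_logit = 1.0 / (1.0 + np.exp(-eta))
print(mu_identity, mu_logit)
```

The inverse link thus maps the unbounded linear predictor onto the natural range of the response, which is what allows one linear specification to serve such different regression problems.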
