
• P(θ|D) is the posterior. This is the (refined) strength of our belief in θ once the evidence D has been taken into account. After seeing 4 heads out of 8 flips, say, this is our updated view on the fairness of the coin.

• P(D|θ) is the likelihood. This is the probability of seeing the data D as generated by a model with parameter θ. If we knew the coin was fair, this would tell us the probability of seeing a given number of heads in a particular number of flips.

• P(D) is the evidence. This is the probability of the data, determined by summing (or integrating) across all possible values of θ, weighted by how strongly we believe in those particular values of θ. If we held multiple views of the fairness of the coin (but didn't know for sure), this would tell us the probability of seeing a certain sequence of flips across all possibilities of our belief in the coin's fairness. (See the numerical sketch after this list.)
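To make these three quantities concrete, here is a minimal numerical sketch in Python. It evaluates Bayes' rule on a discrete grid of candidate θ values for the hypothetical case of 4 heads in 8 flips; the grid resolution and the flat prior are illustrative assumptions, not part of the text above.

import numpy as np
from scipy.stats import binom

thetas = np.linspace(0.0, 1.0, 101)          # candidate fairness values
prior = np.ones_like(thetas) / len(thetas)   # flat prior belief over theta
likelihood = binom.pmf(4, 8, thetas)         # P(D|theta): 4 heads in 8 flips
evidence = np.sum(likelihood * prior)        # P(D): weighted sum over theta
posterior = likelihood * prior / evidence    # P(theta|D) via Bayes' rule

# The posterior peaks at theta = 0.5, matching the observed frequency.
print(thetas[np.argmax(posterior)])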

The entire goal of Bayesian inference is to provide us with a rational and mathematically sound procedure for combining our prior beliefs with any evidence at hand, in order to produce an updated posterior belief. What makes it such a valuable technique is that posterior beliefs can themselves be used as prior beliefs when new data are generated. Hence Bayesian inference allows us to continually adjust our beliefs as new data arrive, by repeatedly applying Bayes' rule.
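As a sketch of this sequential use of Bayes' rule, the posterior after each batch of flips simply becomes the prior for the next; the batch sizes below are hypothetical, chosen purely for illustration.

import numpy as np
from scipy.stats import binom

thetas = np.linspace(0.0, 1.0, 101)
belief = np.ones_like(thetas) / len(thetas)   # begin with a flat prior

# Each (heads, flips) pair is a hypothetical batch of new data.
for heads, flips in [(2, 4), (3, 5), (4, 8)]:
    likelihood = binom.pmf(heads, flips, thetas)
    belief = likelihood * belief              # the old posterior acts as the new prior
    belief /= belief.sum()                    # normalise by the evidence P(D)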

There was a lot of theory to take in within the previous two sections, so I'm now going to provide a concrete example using the age-old tool of statisticians: the coin-flip.

2.3 Coin-Flipping Example

In this example we are going to consider multiple flips of a coin with unknown fairness. We will use Bayesian inference to update our beliefs about the fairness of the coin as more data (i.e. more coin flips) become available. The coin will actually be fair, but we won't learn this until the trials are carried out. At the start we have no strong prior view on the fairness of the coin; that is, we can say that any level of fairness is equally likely.

In statistical language we are going to perform N repeated Bernoulli trials with θ = 0.5. We will use a uniform probability distribution to characterise our prior uncertainty about the fairness. This states that we consider each level of fairness (or each value of θ) to be equally likely.
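A minimal sketch of how the data for such an experiment might be generated, assuming NumPy and an arbitrary choice of N = 50 trials:

import numpy as np

rng = np.random.default_rng(42)         # seeded so the trials are reproducible
theta = 0.5                             # the coin really is fair
N = 50                                  # number of Bernoulli trials (arbitrary)
flips = rng.binomial(1, theta, size=N)  # 1 = heads, 0 = tails
print(flips.sum(), "heads out of", N, "flips")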

We are going to use a Bayesian updating procedure to move from our prior beliefs to posterior beliefs as we observe new coin flips. This is carried out using a mathematically succinct device known as conjugate priors. We won't go into much detail on conjugate priors within this chapter, as they will form the basis of the next chapter on Bayesian inference. They will, however, provide us with the means of explaining how the coin-flip example is carried out in practice.

The uniform distribution is actually a special case of another probability distribution, known as the Beta distribution. Conveniently, under the binomial model, if we use a Beta distribution for our prior beliefs it leads to a Beta distribution for our posterior beliefs. This is an extremely useful mathematical result, as Beta distributions are quite flexible in modelling beliefs. However, I don't want to dwell on the details of this too much here, since we will discuss it in the next chapter.
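To preview the result, here is a minimal sketch of the conjugate update, assuming SciPy and illustrative data of 4 heads in 8 flips: a Beta(α, β) prior combined with z heads in N flips yields a Beta(α + z, β + N − z) posterior, and Beta(1, 1) is exactly the uniform prior used above.

from scipy.stats import beta

alpha_prior, beta_prior = 1.0, 1.0    # Beta(1, 1): the uniform prior
z, N = 4, 8                           # hypothetical data: 4 heads in 8 flips

alpha_post = alpha_prior + z          # conjugate update of alpha
beta_post = beta_prior + (N - z)      # conjugate update of beta

posterior = beta(alpha_post, beta_post)
print(posterior.mean())               # posterior mean of theta is 0.5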
