
17.3 Maximum Likelihood Estimation

With the model specification in hand it is now appropriate to discuss how the optimal linear regression coefficients β are chosen to best fit the data. In the univariate case this is often known colloquially as "finding the line of best fit". However, in the multivariate case considered here the feature vector is $(p+1)$-dimensional, that is $x \in \mathbb{R}^{p+1}$. Hence the task is generalised to finding a $p$-dimensional hyperplane of best fit.

The main mechanism for finding parameters of statistical models is known as maximum likelihood estimation (MLE). While the MLE for linear regression will be derived here for completeness in this simple case, it is not generally relevant to quant trading models as most software libraries will abstract away the process.
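As an illustration of that abstraction, the following minimal sketch fits a linear regression with the statsmodels library; the simulated data, true coefficients and noise level are hypothetical choices for this example. Under the iid Gaussian noise assumption used below, the ordinary least squares fit performed by the library coincides with the maximum likelihood estimate:

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from a known linear model: y = 2 + 3x + Gaussian noise
# (the coefficients and noise scale are illustrative assumptions)
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, size=200)

# Prepend an intercept column and fit via OLS; under iid Gaussian
# errors the OLS coefficients are the maximum likelihood estimates
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
print(results.params)  # approximately [2, 3]
```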

17.3.1 Likelihood and Negative Log Likelihood

MLE is an optimisation process carried out for a specific model on a particular batch of data. It is a directed algorithmic search through a multidimensional space of possible parameter choices that attempts to answer the following question: if the data were to have been generated by the model, what parameters were most likely to have been used? That is, what is the probability of seeing the data D, given a specific set of parameters θ?

Once again this reduces to a conditional probability density problem. The value sought is the θ that maximises p(D | θ). This CPD is known as the likelihood and was briefly discussed in the introductory chapter on Bayesian statistics.

This problem can be formulated as searching for the mode of p(D | θ), which is denoted by $\hat{\theta}$. For reasons of computational ease an equivalent task of maximising the natural logarithm of the CPD, rather than the CPD itself, is often carried out:

$$\hat{\theta} = \underset{\theta}{\mathrm{argmax}} \, \log p(D \mid \theta) \quad (17.5)$$

In linear regression problems an assumption is made that the feature vectors are all independent and identically distributed (iid). This simplifies the solution of the log-likelihood problem by making use of the properties of natural logarithms, which allow conversion of a product of probabilities into a sum of log-probabilities, vastly simplifying the subsequent differentiation necessary for the optimisation algorithm:

$$\begin{aligned}
\mathcal{L}(\theta) &:= \log p(D \mid \theta) && (17.6) \\
&= \log \prod_{i=1}^{N} p(y_i \mid x_i, \theta) && (17.7) \\
&= \sum_{i=1}^{N} \log p(y_i \mid x_i, \theta) && (17.8)
\end{aligned}$$
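To make the product-to-sum step concrete, the following sketch evaluates the sum in Equation 17.8 directly. The simulated data are hypothetical, and an iid Gaussian noise model is assumed so that each p(y_i | x_i, θ) is a normal density evaluated at the fitted mean:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: y_i ~ N(beta0 + beta1 * x_i, sigma^2)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=50)

def log_likelihood(theta, x, y):
    beta0, beta1, sigma = theta
    mu = beta0 + beta1 * x
    # The iid assumption turns the product of densities in (17.7)
    # into the sum of log densities in (17.8)
    return np.sum(norm.logpdf(y, loc=mu, scale=sigma))

print(log_likelihood((1.0, 2.0, 1.0), x, y))
```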

An additional computational reason makes it more straightforward to minimise the negative of the log-likelihood rather than maximise the log-likelihood itself. It is simple enough to prepend a minus sign to the log-likelihood expression to give the negative log-likelihood (NLL):

$$\text{NLL}(\theta) := -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta)$$
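The following sketch, again under an assumed iid Gaussian noise model with hypothetical simulated data, minimises the NLL numerically with scipy.optimize.minimize; the recovered parameters approximate the true intercept, slope and noise standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: y = 1 + 2x + unit Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=200)

def nll(theta, x, y):
    """Negative log-likelihood under iid Gaussian errors."""
    beta0, beta1, sigma = theta
    if sigma <= 0.0:  # guard against an invalid noise scale
        return np.inf
    mu = beta0 + beta1 * x
    return -np.sum(norm.logpdf(y, loc=mu, scale=sigma))

# Minimising the NLL recovers the maximum likelihood estimates
result = minimize(nll, x0=[0.0, 0.0, 1.0], args=(x, y),
                  method="Nelder-Mead")
print(result.x)  # approximately [1, 2, 1]
```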
