
\[
\text{NLL}(\theta) = -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta) \tag{17.9}
\]

This is the function that will be minimised. Doing so yields the optimal maximum likelihood estimate for the β coefficients. The procedure for carrying this out is known as ordinary least squares (OLS).
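As a concrete illustration, the NLL of equation (17.9) can be evaluated directly under the Gaussian noise model used in this chapter. This is a minimal sketch; the function name, data and parameter values below are hypothetical, chosen only to make the formula explicit:

```python
import numpy as np


def neg_log_likelihood(beta, sigma2, X, y):
    """NLL of eq. (17.9), assuming y_i ~ N(beta^T x_i, sigma2)."""
    resid = y - X @ beta  # residuals y_i - beta^T x_i
    # Sum of -log p(y_i | x_i, theta) for a Gaussian density
    return 0.5 * len(y) * np.log(2.0 * np.pi * sigma2) + (resid @ resid) / (2.0 * sigma2)


# Hypothetical toy data, noiseless for illustration: with zero residuals
# only the constant term 0.5 * N * log(2 * pi * sigma2) remains
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = X @ np.array([1.0, 2.0])
print(neg_log_likelihood(np.array([1.0, 2.0]), 1.0, X, y))
```

Evaluating the NLL at any other β inflates the residual term, which is exactly why minimising this function recovers the coefficients most likely to have generated the data.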

17.3.2 Ordinary Least Squares

Restated once again, the current goal is to derive the optimal set of β coefficients that are "most likely" to have generated a specific set of training data. These coefficients will form a hyperplane of "best fit" through this data set. The process is carried out as follows:

1. Use the definition of the normal distribution to expand the negative log likelihood function

2. Utilise the properties of logarithms to reformulate this in terms of the Residual Sum of Squares (RSS), which is equal to the sum of the squared residuals across all observations

3. Rewrite the residuals in matrix form, creating the data matrix X, which is N × (p + 1) dimensional, and formulate the RSS as a matrix equation

4. Differentiate this matrix equation with respect to (w.r.t.) the parameter vector β and set the equation to zero (with some assumptions on X)

5. Solve the subsequent equation for β to obtain β̂_OLS, the ordinary least squares (OLS) estimate
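The end result of the steps above can be sketched numerically. The toy data set here is hypothetical; the closed-form estimate obtained in step 5, β̂_OLS = (X^T X)^{-1} X^T y, is cross-checked against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data set: N observations, p features plus an intercept,
# giving the N x (p + 1) dimensional data matrix X from step 3
N, p = 500, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([0.5, 2.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Step 5: solve (X^T X) beta = X^T y, assuming X^T X is invertible
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

Solving the normal equations via `np.linalg.solve` rather than explicitly inverting X^T X is the standard numerically stable choice; both routes produce the same estimate here.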

The next section will closely follow the treatments of Hastie et al. (2009)[51] and Murphy (2012)[71]. The first step is to expand the NLL using the formula for a normal distribution:

\begin{align}
\text{NLL}(\theta) &= -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta) \tag{17.10} \\
&= -\sum_{i=1}^{N} \log \left[ \left( \frac{1}{2\pi\sigma^2} \right)^{\frac{1}{2}} \exp \left( -\frac{1}{2\sigma^2} \left( y_i - \beta^T x_i \right)^2 \right) \right] \tag{17.11} \\
&= -\sum_{i=1}^{N} \left[ \frac{1}{2} \log \left( \frac{1}{2\pi\sigma^2} \right) - \frac{1}{2\sigma^2} \left( y_i - \beta^T x_i \right)^2 \right] \tag{17.12} \\
&= -\frac{N}{2} \log \left( \frac{1}{2\pi\sigma^2} \right) + \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left( y_i - \beta^T x_i \right)^2 \tag{17.13} \\
&= -\frac{N}{2} \log \left( \frac{1}{2\pi\sigma^2} \right) + \frac{1}{2\sigma^2} \text{RSS}(\beta) \tag{17.14}
\end{align}

Where \(\text{RSS}(\beta) := \sum_{i=1}^{N} (y_i - \beta^T x_i)^2\) is the Residual Sum of Squares, also known as the Sum of Squared Errors (SSE).

Since the first term in the equation is constant with respect to β, minimising the RSS is sufficient for producing the optimal parameter estimate.
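This claim can be checked numerically: on a hypothetical data set, the NLL and the RSS are minimised by the same β, because the constant term in equation (17.14) shifts the NLL but does not move its minimiser. A minimal sketch (the data, σ² value and perturbation are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set with an intercept and one feature
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 3.0]) + rng.normal(size=N)


def rss(beta):
    """Residual Sum of Squares RSS(beta)."""
    r = y - X @ beta
    return r @ r


def nll(beta, sigma2=1.0):
    """Full NLL of eq. (17.14): constant term plus RSS(beta) / (2 sigma^2)."""
    return 0.5 * N * np.log(2.0 * np.pi * sigma2) + rss(beta) / (2.0 * sigma2)


beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # the OLS estimate
beta_perturbed = beta_hat + np.array([0.1, -0.1])

# Both objectives prefer beta_hat; the constant term cancels in the comparison
print(rss(beta_hat) < rss(beta_perturbed), nll(beta_hat) < nll(beta_perturbed))
# → True True
```

The difference in NLL between any two values of β equals their difference in RSS divided by 2σ², so the two objectives always rank candidate coefficients identically.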
