13.08.2022 Views

advanced-algorithmic-trading


To simplify the notation the latter term can be written in matrix form. By defining the $N \times (p + 1)$ matrix $X$ it is possible to write the RSS term as:

$$\text{RSS}(\beta) = (y - X\beta)^T (y - X\beta) \tag{17.15}$$

This term is now differentiated w.r.t. the parameter variable $\beta$:

$$\frac{\partial \text{RSS}}{\partial \beta} = -2 X^T (y - X\beta) \tag{17.16}$$
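The differentiation step can be made explicit by expanding the quadratic form first. The sketch below uses two standard matrix-calculus identities, stated here as a reminder rather than taken from the text: $\partial(\beta^T a)/\partial\beta = a$ and $\partial(\beta^T A \beta)/\partial\beta = 2A\beta$ for a symmetric matrix $A$ (here $A = X^T X$).

```latex
\begin{align*}
\text{RSS}(\beta) &= (y - X\beta)^T (y - X\beta) \\
                  &= y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \\
\frac{\partial \text{RSS}}{\partial \beta}
                  &= -2 X^T y + 2 X^T X \beta \\
                  &= -2 X^T (y - X\beta)
\end{align*}
```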

A key assumption about the data is made here. It is necessary for the matrix $X^T X$ to be positive-definite, which requires $X$ to have full column rank and therefore at least as many observations as there are columns ($N \geq p + 1$). If this is not the case (which is extremely common in high-dimensional data settings) then $X^T X$ cannot be inverted, it is not possible to find a unique set of $\beta$ coefficients and thus the following matrix equation will not hold.
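The rank condition can be checked numerically. The following is a minimal sketch using NumPy with randomly generated design matrices; the helper name `gram_is_positive_definite`, the matrix sizes and the seed are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility


def gram_is_positive_definite(X):
    """Return True if X^T X is positive-definite, i.e. X has full column rank."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])


# More observations than columns: N = 50, p + 1 = 4 (intercept column plus 3 features)
X_tall = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])

# Fewer observations than columns: N = 3, p + 1 = 4
X_wide = np.column_stack([np.ones(3), rng.normal(size=(3, 3))])

tall_ok = gram_is_positive_definite(X_tall)
wide_ok = gram_is_positive_definite(X_wide)

print("N > p + 1:", tall_ok)
print("N < p + 1:", wide_ok)
```

In the second case the rank of $X$ cannot exceed the number of rows, so $X^T X$ is singular and has no inverse.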

Under the assumption of a positive-definite $X^T X$ the differentiated equation is set to zero and solved for $\beta$:

$$X^T (y - X\beta) = 0 \tag{17.17}$$

The solution to this matrix equation provides $\hat{\beta}_{\text{OLS}}$:

$$\hat{\beta}_{\text{OLS}} = (X^T X)^{-1} X^T y \tag{17.18}$$
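As an illustration, the closed-form estimate can be computed directly with NumPy. This is a sketch on synthetic data: the sample size, the "true" coefficients, the noise scale and the seed are all assumptions made for the example. It solves the linear system $(X^T X)\beta = X^T y$ rather than forming the inverse explicitly, which is the usual numerically safer route to the same estimate.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# Synthetic data: N observations, p features, plus a column of ones for the intercept
N, p = 200, 2
features = rng.normal(size=(N, p))
X = np.column_stack([np.ones(N), features])  # N x (p + 1) design matrix
beta_true = np.array([2.0, 3.0, -1.0])       # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# OLS estimate: solve (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient of the RSS should vanish (up to rounding) at the OLS estimate
gradient = -2.0 * X.T @ (y - X @ beta_ols)

print("beta_ols:  ", beta_ols)
print("beta_lstsq:", beta_lstsq)
print("max |gradient|:", np.abs(gradient).max())
```

The two estimates should agree to floating-point precision, and the gradient check confirms that the estimate satisfies the stationarity condition.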

17.4 Simulated Data Example with Scikit-Learn

Having outlined the theoretical OLS procedure for MLE the focus will now turn to implementation of linear regression as a machine learning technique within Python. The regression problem will make use of Scikit-Learn, which is a mature machine learning library for Python.

The goal of this simple example is primarily to introduce the API used by Scikit-Learn in a simpler setting since it will be utilised heavily in the remaining chapters.

In this example a set of "feature" values $x_i$ will be randomly generated from a normal distribution with mean $\mu_x = 0$ and variance $\sigma_x = 10$. These values will be used to create responses $y_i$ of the form:

$$y_i = \alpha + \beta x_i + \epsilon \tag{17.19}$$

Where $\alpha = 2$ is the intercept, $\beta = 3$ is the slope and $\epsilon$ is a normally-distributed error with mean $\mu_\epsilon = 0$ and variance $\sigma_\epsilon = 30$.

500 such $(x_i, y_i)$ pairs will be generated. 400 of these will be used to form a "training" set, while the remaining 100 will be held out to form a "testing" or evaluation set.
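The setup described above might be sketched as follows. This is not necessarily the code used later in the chapter. It rests on several assumptions: the values 10 and 30 are read as variances (so the standard deviations passed to NumPy are their square roots), the random seed is arbitrary, and the 400/100 split is done with Scikit-Learn's `train_test_split` rather than by slicing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)  # arbitrary seed (assumption)

N = 500
alpha, beta = 2.0, 3.0         # intercept and slope from the text
var_x, var_eps = 10.0, 30.0    # interpreted as variances (assumption)

# Simulate features, noise and responses: y_i = alpha + beta * x_i + eps
x = rng.normal(loc=0.0, scale=np.sqrt(var_x), size=N)
eps = rng.normal(loc=0.0, scale=np.sqrt(var_eps), size=N)
y = alpha + beta * x + eps

# Scikit-Learn expects a 2D feature matrix of shape (n_samples, n_features)
X = x.reshape(-1, 1)

# Hold out 100 of the 500 pairs for testing, leaving 400 for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=1
)

# Fit ordinary least squares on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out set
test_mse = mean_squared_error(y_test, model.predict(X_test))

print("Estimated intercept:", model.intercept_)
print("Estimated slope:    ", model.coef_[0])
print("Test MSE:           ", test_mse)
```

The estimated intercept and slope should land near 2 and 3 respectively, and the test mean squared error should be of the same order as the noise variance, although the exact figures depend on the seed.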
