
Chapter 17

Linear Regression

In this chapter a familiar statistical technique, linear regression, will be introduced in a more rigorous mathematical setting under a probabilistic, supervised learning interpretation. Studying a well-known technique with slightly more mathematical rigour than is often utilised simplifies the extension to the more complex machine learning models discussed in subsequent chapters.

The chapter will begin by defining multiple linear regression and placing it in a probabilistic, supervised learning framework. From there the optimal estimate for its parameters will be derived via a statistical technique known as maximum likelihood estimation (MLE).
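
In anticipation of that derivation, the MLE objective can be stated compactly. Under the standard assumption that the $N$ training pairs are independent and identically distributed (an assumption the derivation will make precise), the estimate is the parameter vector that maximises the likelihood of the observed responses:

\[
\hat{\beta} = \underset{\beta}{\operatorname{argmax}} \, p(\mathbf{y} \mid \mathbf{X}, \beta) = \underset{\beta}{\operatorname{argmax}} \, \prod_{i=1}^{N} p(y_i \mid \mathbf{x}_i, \beta)
\]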

To demonstrate how to fit linear regression on simulated data, the Python Scikit-Learn library will be used. This has the added benefit of introducing the Scikit-Learn machine learning API, which remains similar across many differing machine learning models.
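
As a preview of that API, the following is a minimal sketch of the instantiate, fit and predict workflow on simulated data. The sample size, coefficients and noise level here are illustrative assumptions rather than values used elsewhere in the chapter:

# A minimal sketch of the Scikit-Learn fit/predict workflow on simulated
# data; the "true" coefficients, sample size and noise level are
# illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Simulate N samples with p = 2 features from y = intercept + X beta + eps
N, p = 100, 2
beta = np.array([0.5, -1.5])   # hypothetical coefficients
intercept = 2.0                # hypothetical intercept
X = rng.normal(size=(N, p))
eps = rng.normal(scale=0.3, size=N)   # Gaussian noise term
y = intercept + X @ beta + eps

# The canonical Scikit-Learn pattern: instantiate, fit, predict
model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)  # estimated intercept and coefficients
y_pred = model.predict(X)             # in-sample predictions

The instantiate, fit and predict pattern shown here is the same interface exposed by most Scikit-Learn estimators, which is why introducing it alongside a simple model such as linear regression pays off in later chapters.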

This chapter and a selection of those that follow are more mathematically rigorous than the chapters so far. The rationale for this is to introduce the probabilistic interpretation that pervades machine learning research. Once a few examples of simpler models have been demonstrated in such a framework, the task of studying more advanced machine learning research papers for useful trading ideas becomes simpler.

17.1 Linear Regression

Linear regression is a familiar statistical technique. It is often taught at high school, albeit in a simplified manner. It is generally the first technique considered when studying supervised learning, since it allows many of the more advanced machine learning concepts to be discussed in a vastly simplified setting.

Formally, multiple linear regression states that a scalar response value $y$ is a linear function of its feature inputs $\mathbf{x}$. That is:

\[
y(\mathbf{x}) = \beta^{T} \mathbf{x} + \epsilon = \sum_{j=0}^{p} \beta_j x_j + \epsilon \tag{17.1}
\]

where $\beta, \mathbf{x} \in \mathbb{R}^{p+1}$ and $\epsilon \sim \mathcal{N}(\mu, \sigma^2)$. That is, $\beta$ and $\mathbf{x}$ are both real-valued vectors of dimension $p+1$ and $\epsilon$, the error or residual term, is normally distributed with mean $\mu$ and variance $\sigma^2$.
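
To make the generative reading of equation (17.1) concrete, the short sketch below simulates a single response directly from the model, using the usual convention that the first feature component is fixed at $x_0 = 1$ so that $\beta_0$ acts as an intercept. The parameter values are illustrative assumptions only:

# A small sketch of the generative process in equation (17.1), with
# x_0 = 1 prepended so that beta_0 plays the role of the intercept.
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

p = 2
beta = np.array([2.0, 0.5, -1.5])   # (p+1)-dimensional: beta_0 is the intercept
mu, sigma = 0.0, 0.3                # mean and standard deviation of the noise

x = np.concatenate(([1.0], rng.normal(size=p)))   # x_0 = 1, then p features
eps = rng.normal(loc=mu, scale=sigma)             # eps ~ N(mu, sigma^2)
y = beta @ x + eps                                # y(x) = beta^T x + eps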
