
y = f(x) + ɛ (20.1)

This states that the response vector y is given as a function f of the predictor vector x, plus a set of normally distributed error terms, which are often assumed to have mean zero and a standard deviation of one.
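Equation 20.1 can be made concrete with a short simulation. The following sketch uses a hypothetical linear choice of f, purely for illustration, draws predictor values, adds standard normal errors and forms the responses:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    # An assumed "true" underlying relationship, for illustration only.
    return 2.0 * x + 1.0

x = rng.uniform(0.0, 10.0, size=500)       # predictor values
eps = rng.normal(loc=0.0, scale=1.0, size=500)  # errors: mean zero, std one
y = f(x) + eps                             # Equation 20.1

# The observed responses scatter around f(x) with unit noise,
# so the residual standard deviation is close to one.
print(round(float(np.std(y - f(x))), 2))
```

In a real application f is of course unknown; the simulation simply shows how the noise term separates the observed responses from the underlying relationship.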

What does this mean in practice?

As an example, the vector x could represent a set of lagged financial prices. This is similar to the time series autoregressive models we considered earlier in the book. It could also represent interest rates, derivatives prices, real-estate prices, word frequencies in a document or any other factor that we consider useful in making a prediction.

The vector y could be single or multi-valued. In the former case it might simply represent tomorrow's stock price; in the latter case it might represent the next week's daily predicted prices.

f represents our view on the underlying relationship between y and x. This could be linear, in which case we may estimate f via a linear regression model. It may be non-linear, in which case we may estimate f with an SVM or a spline-based method, for instance.
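As a rough illustration, assume a hypothetical non-linear underlying relationship and compare a linear estimate of f against a more flexible one. A low-degree polynomial fit is used here as a simple stand-in for a spline or SVM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with an assumed non-linear "truth": y = sin(x) + noise.
x = np.sort(rng.uniform(-3.0, 3.0, size=200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)

linear_coeffs = np.polyfit(x, y, deg=1)  # linear estimate of f
cubic_coeffs = np.polyfit(x, y, deg=3)   # more flexible non-linear estimate

linear_mse = np.mean((np.polyval(linear_coeffs, x) - y) ** 2)
cubic_mse = np.mean((np.polyval(cubic_coeffs, x) - y) ** 2)

# The more flexible fit tracks the sinusoidal relationship more closely.
print(cubic_mse < linear_mse)  # → True
```

When the true relationship is non-linear, a linear estimate of f is systematically biased, which is why the flexible fit achieves the lower error here.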

The error terms ɛ represent all of the factors affecting y that we have not taken into account with our function f. They are essentially the "unknown" components of our prediction model. It is common to assume that these are normally distributed with mean zero and a standard deviation of one, although other distributions can be used.

In this section we are going to describe how to measure the performance of an estimate for the unknown function f. Such an estimate is written using "hat" notation. Hence, ˆf can be read as "the estimate of f".

In addition we will describe the effect on the performance of the model as we increase its flexibility. Flexibility describes the degrees of freedom available to the model to "fit" to the training data. We will see that the relationship between flexibility and performance error is non-linear. Thus we need to be extremely careful when choosing the "best" model.
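This non-linear relationship can be sketched by fitting polynomials of increasing degree, a hypothetical proxy for flexibility, and comparing the error on the training data with the error on held-out test data drawn from the same process:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small training set and larger test set from the same assumed process.
x_train = np.sort(rng.uniform(-3.0, 3.0, size=30))
y_train = np.sin(x_train) + rng.normal(scale=0.5, size=30)
x_test = np.sort(rng.uniform(-3.0, 3.0, size=200))
y_test = np.sin(x_test) + rng.normal(scale=0.5, size=200)

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):  # increasing flexibility
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse[degree], 3), round(test_mse[degree], 3))
```

Training error falls monotonically as flexibility grows, since a more flexible model can always fit the training data at least as well, but test error eventually rises again as the model begins to fit the noise. The "best" degree sits somewhere in between.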

Note that there is never a "best" model across the entirety of statistics, time series or machine learning. Different models have varying strengths and weaknesses. One model may work very well on one dataset, but may perform badly on another. The challenge in statistical machine learning is to pick the "best" model for the problem at hand with the data available.

In fact this notion of there being no "best" model in supervised learning situations is formally encapsulated in what is known as the "No Free Lunch" Theorem.

20.1.2 Model Selection

When trying to ascertain which statistical machine learning method is best for a particular situation we need a means of characterising the relative performance between models. In the time series section we considered the Akaike Information Criterion and the Bayesian Information Criterion. In this section we will consider other methods.

To determine model suitability we need to compare the known values of the underlying relationship with those that are predicted by an estimated model.

For instance, if we are attempting to predict tomorrow's stock prices, then we wish to evaluate how close our model's predictions are to the true values on that particular day.
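For example, with a small set of hypothetical next-day price predictions and the corresponding realised prices (the figures below are made up for illustration), the mean squared error gives one such measure of closeness:

```python
import numpy as np

# Hypothetical realised prices and a model's predictions for the same days.
realised = np.array([101.2, 100.8, 102.5, 103.1, 102.9])
predicted = np.array([101.0, 101.1, 102.0, 103.4, 102.5])

# Mean squared error: average of the squared prediction errors.
mse = np.mean((predicted - realised) ** 2)
print(round(float(mse), 4))  # → 0.126
```

Squaring penalises large misses more heavily than small ones, which is often desirable when a single badly mispriced prediction is costlier than several slightly inaccurate ones.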
