13.08.2022 Views

advanced-algorithmic-trading


While it may not seem a traditional technique to apply to quantitative finance, it can be used indirectly on alternative data sources, such as satellite imagery, to produce information on supply and demand. For instance, analysing oil storage tank height and maritime oil freight traffic can lead to a better understanding of current supply and demand for crude oil.

15.4.5 Model Accuracy

One of the trickiest aspects of machine learning is determining which model is "best" for any

particular problem or dataset at hand. This is known as model selection.

For supervised learning models a major issue arises when the model flexibility is adjusted. While greater flexibility helps the model adapt to more complex datasets, it also increases the likelihood of overfitting.

Overfitting occurs when the model is more closely aligned to the "noise" in the training data than to the underlying "signal". It reduces the generalisation performance of the model on unseen data. This issue leads to a balancing act between flexibility and performance known as the bias-variance tradeoff.
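The gap between training and unseen-data performance can be illustrated with a minimal sketch. The choice of a k-nearest-neighbour regressor, the synthetic data (a linear "signal" plus Gaussian "noise") and all function names are assumptions made purely for illustration; they are not taken from the text. A smaller k gives a more flexible model, and with k = 1 the model reproduces the training data exactly, noise included.

```python
import random


def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions (fitted on `train`) over `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


def make_data(n, rng):
    """Synthetic sample: signal y = 2x plus Gaussian noise (illustrative only)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]


rng = random.Random(42)
train = make_data(50, rng)
test = make_data(50, rng)

# Smaller k means a more flexible model; compare training and unseen-data error.
for k in (1, 5, 20):
    print(k, round(mse(train, train, k), 4), round(mse(train, test, k), 4))
```

With k = 1 each training point is its own nearest neighbour, so the training error is zero while the error on the unseen sample is not. A perfect training fit therefore says little about generalisation performance.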

A technique to mitigate the effects of the bias-variance tradeoff is known as cross-validation. It involves partitioning the data into random subsets, repeatedly fitting the model on all but one subset and assessing it on the subset that was held out. The assessments are then averaged over all of the subsets, with the goal of selecting a more robust model that is less prone to overfitting the training data.
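As a rough sketch of the procedure, the following splits a dataset into k random folds, fits a model on all folds but one, scores it on the held-out fold and averages the scores. The "model" here is a hypothetical stand-in that simply predicts the training mean, and the function names and sample values are illustrative assumptions rather than anything defined in the text.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(ys):
    """Hypothetical stand-in model: always predict the training mean."""
    return sum(ys) / len(ys)


def cross_val_mse(ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_scores = []
    for held_out in k_fold_indices(len(ys), k, seed):
        held_out_set = set(held_out)
        # Fit on everything except the held-out fold.
        train = [y for i, y in enumerate(ys) if i not in held_out_set]
        prediction = fit_mean(train)
        # Assess on the held-out fold only.
        errors = [(ys[i] - prediction) ** 2 for i in held_out]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


# Made-up response values, for illustration only.
responses = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 0.15, -0.05]
print(cross_val_mse(responses, k=5))
```

Replacing `fit_mean` with any other fitting routine leaves the fold logic unchanged, which is why the same scheme can be used to compare models of differing flexibility.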

The bias-variance tradeoff and cross-validation techniques will be discussed at length in subsequent

chapters.

15.4.6 Parametric and Non-Parametric Models

Statistical machine learning models can be categorised into parametric and non-parametric methods.

They each have their advantages and disadvantages when applied to quantitative finance

data.

Parametric Models

A parametric statistical learning model is one that involves a specified model form for $f$ along with a set of parameters that define its behaviour. The canonical example of a parametric model is linear regression. It involves estimating a set of $p + 1$ coefficients, given by the vector $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$, whereby the response $Y$ is linearly proportional, via each proportionality constant $\beta_j$, to each feature $x_j$, plus an "intercept" term $\beta_0$. The parameters of the model are the $\beta_j$ coefficients.
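For the simplest case of a single feature (p = 1) the two coefficients can be estimated in closed form. The sketch below assumes ordinary least squares as the estimation method, which the text above does not specify, and uses made-up data lying exactly on the line y = 1 + 2x. The function name is hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sample covariance of x and y divided by sample variance of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x, no noise
beta_0, beta_1 = fit_simple_linear(xs, ys)
print(beta_0, beta_1)  # 1.0 2.0
```

The whole fitted model is described by just these two numbers, which is what makes it parametric: once the coefficients are estimated, the training data is no longer needed to make predictions.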

Most of the models considered in this book will be parametric. While linear parametric models may not seem flexible enough to handle the non-linearities of asset price data, they can often form effective trading algorithms. As always, increasing the number of parameters in order to increase flexibility brings the danger of overfitting, as the model follows the "noise" too closely rather than the "signal".

Non-Parametric Models

The alternative is to consider a form for $f$ that does not involve any parameters: it is non-parametric.

The benefit of using a non-parametric model is greater flexibility. Their main
