There is an alternative approach to machine learning, which is the probabilistic formulation. This approach reframes the problem into one of estimating the form of a probability distribution, known as conditional density estimation [51, 71]. In the supervised learning approach this probability distribution is given as p(y | x; θ). This is a conditional probability distribution, which represents the probability of y taking on any value (or category) given the feature values x, with a model parametrised by θ. It is implicitly assumed that a model form exists and that this model is applied to a finite set of feature data, often denoted by D.
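To make the notation concrete, the following minimal sketch in Python evaluates a parametric conditional probability p(y | x; θ) under an assumed logistic form for a binary y. The logistic choice, the parameter vector θ and the toy dataset D are purely illustrative assumptions, not a model prescribed by the text.

import numpy as np

# A parametric conditional probability model p(y | x; theta). The
# logistic form below is an assumption made purely for illustration.
def cond_prob(y, x, theta):
    """Return p(y | x; theta) for a binary logistic model."""
    p_one = 1.0 / (1.0 + np.exp(-np.dot(theta, x)))  # p(y = 1 | x; theta)
    return p_one if y == 1 else 1.0 - p_one

# A finite set of feature data D and a hypothetical parameter vector
D = np.array([[0.5, 1.2], [-0.3, 0.8], [1.1, -0.4]])
theta = np.array([0.7, -0.2])

for x in D:
    print(cond_prob(1, x, theta))  # probability that y = 1 given x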

Notice that although the probabilistic formulation is very different from the functional approximation approach, it is attempting to carry out the same task. That is, if a feature vector x is provided then the goal is to probabilistically estimate the "best" value of y.

The benefit of utilising this probabilistic approach is that probabilities can be assigned to different values of y, thus leading to a more general mechanism for choosing between these values. In practice, the value of y with the highest probability is usually chosen as the best guess. However, in quantitative finance the consequences of an incorrect choice can be severe, as large losses can be generated. Hence threshold values are often used to ensure that the probability assigned to a particular value is sufficiently high and much larger than that of the alternatives, reflecting strong confidence in the choice.
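As an illustration, the following Python sketch implements such a thresholded decision rule: the most probable value is acted upon only when its probability exceeds a minimum threshold, and otherwise no choice is made. The 0.75 threshold is a hypothetical value chosen purely for demonstration.

import numpy as np

def thresholded_choice(probs, threshold=0.75):
    """Return the index of the most probable value, or None if its
    probability does not exceed the (hypothetical) threshold."""
    k = int(np.argmax(probs))  # value with the highest probability
    return k if probs[k] >= threshold else None

print(thresholded_choice(np.array([0.9, 0.1])))    # 0: confident choice
print(thresholded_choice(np.array([0.55, 0.45])))  # None: abstain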

16.2 Classification

In the classification setting y is a categorical value and is allowed to take values from a finite set of possible choices, K. These values need not be numerical, as in the case of object detection in a photo, where the (binary) class values might be "face detected" or "no face detected".

This problem is formalised as attempting to estimate p(y = k | x), for a particular k ∈ K. To estimate the best guess of y the mode of this distribution is used:

\hat{y} = \hat{f}(x) = \operatorname{argmax}_{k \in K} \, p(y = k \mid x) \qquad (16.2)

That is, the best estimate for y, namely ŷ, is given by the argument k that maximises the value of p(y = k | x). In the Bayesian interpretation this value is known as the Maximum A Posteriori (MAP) estimate.
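As a minimal sketch of Equation 16.2, the snippet below fits a scikit-learn classifier and takes the argmax over its estimated class probabilities p(y = k | x) to obtain the MAP estimate. The toy data, the single test point and the choice of Logistic Regression are illustrative assumptions only.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: feature vectors x and class labels k
X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba([[0.7]])[0]   # p(y = k | x) for each k in K
y_hat = clf.classes_[np.argmax(probs)]  # the MAP estimate, as in Equation 16.2
print(probs, y_hat)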

Common classification mechanisms include Logistic Regression (despite the name!), Naive Bayes Classifiers, Support Vector Machines and Deep Convolutional Neural Networks.

Classification will be utilised in this book primarily to estimate class membership of documents for natural language processing.

16.3 Regression

In the regression framework the goal is similar to classification except that y ∈ R. That is, y is real-valued rather than categorical. This does not alter the mathematical goal of attempting to estimate a conditional probability distribution. The goal is still to estimate ŷ. However, the argument that maximises the probability distribution (the mode) is now real-valued:

\hat{y} = \hat{f}(x) = \operatorname{argmax}_{y \in \mathbb{R}} \, p(y \mid x) \qquad (16.3)
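To illustrate, the sketch below treats regression as conditional density estimation under an assumed Gaussian model, p(y | x) = N(f(x), σ²). Since the mode of a Gaussian is its mean, the real-valued argmax reduces to evaluating the fitted mean function; the particular f(x) and noise level here are hypothetical.

import numpy as np
from scipy.stats import norm

def f(x):
    """Hypothetical fitted mean function of the conditional density."""
    return 2.0 * x + 1.0

sigma = 0.5  # assumed constant noise standard deviation

x = 0.3
y_hat = f(x)  # mode of the Gaussian p(y | x): the best estimate of y
density_at_mode = norm.pdf(y_hat, loc=f(x), scale=sigma)  # p(y_hat | x)
print(y_hat, density_at_mode)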
