13.08.2022 Views

advanced-algorithmic-trading


• Short-Term Trend - Predict multiple periods ahead for a set of equity returns, across a large range of assets

• Up/Down Factor - Predict which equities will rise by a certain factor and not drop by another factor, over a set of future bars (e.g. 5 mins)

Many other examples of such predictions are easy to devise. In this chapter the first and third will be used to demonstrate intraday functionality.

29.1.1 Class Imbalance

A substantial difficulty with making predictions in a quantitative finance setting is that they often lead to class imbalance.

Consider the third example above. Compared to more common patterns, there are very few instances of equities that rise by at least a certain factor without also decreasing by another factor. Consider, for example, a rise of at least 2% with a decrease of no more than 1% over the next five bars.

If this "up/down factor" is formulated as a supervised binary classification problem, where a set of lagged bars are used as features and subsequent bars are designated as "up/down factor" bars, then the number of bars which are not "up/down factor" bars will significantly outnumber those which are.
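The formulation above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the function name build_dataset, the use of closing prices, the choice of three lagged returns as features and the exact reading of the rule (the close reaches at least +2% at some point in the next five bars and never falls more than 1% below the current close) are all hypothetical and not taken from the text.

```python
def build_dataset(closes, lags=3, lookahead=5, up=0.02, down=0.01):
    """Build lagged-return features and up/down factor labels.

    Hypothetical helper: bar i is labelled 1 if, within the next
    `lookahead` bars, the close rises at least `up` above closes[i]
    and never falls more than `down` below it; otherwise 0.
    """
    # rets[k - 1] is the return from bar k - 1 to bar k
    rets = [closes[k] / closes[k - 1] - 1.0 for k in range(1, len(closes))]
    X, y = [], []
    # Only bars with a full lag history and a full look-ahead window are used
    for i in range(lags, len(closes) - lookahead):
        X.append(rets[i - lags:i])  # the `lags` returns ending at bar i
        future = [closes[j] / closes[i] - 1.0
                  for j in range(i + 1, i + 1 + lookahead)]
        y.append(int(max(future) >= up and min(future) >= -down))
    return X, y


# Made-up closing prices containing a single sharp jump
closes = [100.0, 100.2, 100.1, 100.3, 100.2, 100.4, 102.5, 102.6,
          102.4, 102.7, 102.5, 102.6, 102.8, 102.7, 102.9, 103.0]
X, y = build_dataset(closes)
print(sum(y), "up/down factor bars out of", len(y))
```

On real intraday data such jumps are rare, so the positive labels would typically form a far smaller share of the samples than in this toy series.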

This often manifests itself in models that simply predict the label of the largest class. The classification accuracy, or Hit Rate, then reflects the ratio of the class imbalance itself rather than genuine predictive accuracy on each class.

Thankfully, identifying this problem is straightforward. A confusion matrix can be calculated, which counts the true positives and true negatives along with the false positives (Type I errors) and false negatives (Type II errors). This will clearly highlight class imbalance because the cells belonging to a single predicted label, both correct and incorrect (for example the true negatives and false negatives when the model almost always predicts the negative class), will contain a very high proportion of the samples.
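As a minimal sketch of this check, the snippet below builds a 2x2 confusion matrix for a made-up classifier that always predicts the larger class. The helper names, the 95/5 class split and the orientation (rows index the predicted label, columns the true label) are assumptions for illustration; note that scikit-learn's confusion_matrix(y_true, y_pred) places true labels on the rows instead.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes with rows = predicted label, columns = true label.

    The orientation is an assumption made for this sketch.
    """
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[p][t] += 1
    return m


def hit_rate(m):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in m)
    return (m[0][0] + m[1][1]) / total


# Hypothetical data: 95 ordinary bars (0) and 5 up/down factor bars (1)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a model that always predicts the larger class

m = confusion_matrix_2x2(y_true, y_pred)
positives = m[0][1] + m[1][1]         # all truly positive samples
recall_positive = m[1][1] / positives  # share of positives that were found

print(m)
print("Hit rate:", hit_rate(m))
print("Recall on class 1:", recall_positive)
```

The hit rate equals the share of the larger class, while every sample sits in the row of the single predicted label and no positive bar is ever identified.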

For instance, in the following binary classification output a respectable (for quantitative finance!) hit rate of nearly 59% has been achieved. However, it is clear that this is simply an artefact of the classifier choosing one value almost exclusively over the other. This can be seen from the fact that the majority of samples are clustered on the top row:

Hit-Rate: 0.5885
[[48067 33618]
 [  203   293]]
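The arithmetic behind this output can be checked directly. The sketch below recomputes the hit rate from the four counts and then sums the rows and columns. Which axis holds the predicted labels is not stated alongside the output; the comments assume rows are predictions, as the "top row" reading in the text suggests, and the interpretation of the two sets of totals would swap under the opposite convention.

```python
matrix = [[48067, 33618],
          [203, 293]]

total = sum(sum(row) for row in matrix)
hit_rate = (matrix[0][0] + matrix[1][1]) / total  # diagonal over total

row_totals = [sum(row) for row in matrix]        # assumed: counts per predicted label
col_totals = [sum(col) for col in zip(*matrix)]  # assumed: counts per true label

row_share = max(row_totals) / total  # share of samples in the fullest row
col_share = max(col_totals) / total  # share of samples in the fullest column

print("Total samples:", total)
print("Hit rate:", round(hit_rate, 4))
print("Row totals:", row_totals, "largest share:", round(row_share, 4))
print("Column totals:", col_totals, "largest share:", round(col_share, 4))
```

Under the stated assumption, over 99% of predictions fall on one label, and the hit rate sits within a fraction of a percentage point of the share of the larger true class.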

Many solutions exist to mitigate this problem, and it is beyond the scope of this chapter to discuss them all at length. Common tactics include sampling more data (often tricky in quant finance settings with only one "history") and reducing the samples in the larger class to more evenly balance the class sizes.
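The second tactic, reducing the larger class, might look like the following in its simplest form. The function name undersample and the fixed seed are hypothetical, and the sketch treats the samples as interchangeable, which is an assumption that does not hold for serially correlated bars.

```python
import random


def undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class.

    Hypothetical helper for illustration only: it treats samples as
    interchangeable and ignores any dependence between neighbouring bars.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for i, label in enumerate(y):
        indices_by_class.setdefault(label, []).append(i)
    n_keep = min(len(idxs) for idxs in indices_by_class.values())
    keep = []
    for idxs in indices_by_class.values():
        keep.extend(rng.sample(idxs, n_keep))
    keep.sort()  # retain the original chronological ordering
    return [X[i] for i in keep], [y[i] for i in keep]


# Made-up data: 95 samples of class 0 and 5 samples of class 1
X_demo = [[float(i)] for i in range(100)]
y_demo = [0] * 95 + [1] * 5
X_bal, y_bal = undersample(X_demo, y_demo)
print(len(y_bal), "samples kept,", sum(y_bal), "of class 1")
```

Balancing the classes this way discards most of the available data, which is costly when only one history exists.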

Further compounding the issue for time series work is the serial correlation of the series. This means that the bar "samples" are not independent and identically distributed (iid), and hence bars cannot simply be removed from the larger class by random sampling.

It is always necessary to use a confusion matrix to check that the class imbalance problem is not too severe; otherwise any deployed machine learning model attached to a trading engine will likely generate significant losses.
