13.08.2022 Views

advanced-algorithmic-trading

Chapter 23

Natural Language Processing

In this chapter we apply support vector machines (SVMs) to natural language processing (NLP) for the purpose of sentiment analysis. Our approach is to use SVMs to classify many text documents automatically into mutually exclusive groups. Note that this is a supervised learning technique: the classifier is trained on documents whose labels are already known.

23.1 Overview

A significant number of steps lie between viewing a text document on a website and using its content as input to an automated trading strategy. In particular, the following steps must be carried out:

• Automate the download of multiple, continually generated articles from external sources, potentially at high throughput.

• Parse these documents for the relevant sections of text or information that require analysis, even if the format differs between documents.

• Convert arbitrarily long passages of text (potentially in many languages) into a consistent data structure that can be understood by a classification system.

• Determine the set of groups (or labels) to which each document can be assigned. Examples include "positive" and "negative" or "bullish" and "bearish".

• Create a "training corpus" of documents that have known labels associated with them. For instance, a thousand financial articles may need tagging with the "bullish" or "bearish" labels.

• Train the classifier(s) on this corpus by means of a software library such as Scikit-Learn (which we will be using below).

• Use the classifier to label new documents in an automated, ongoing manner.

• Assess the "classification rate" and other associated performance metrics of the classifier.

• Integrate the classifier into an automated trading system, either by filtering other trade signals or by generating new ones.
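Two of the steps above, converting text of arbitrary length into a consistent data structure and assessing the classification rate, can be sketched in plain Python. This is a minimal illustration rather than the chapter's implementation: the bag-of-words representation, the function names (`tokenize`, `build_vocabulary`, `vectorize`, `classification_rate`) and the two-article corpus are all assumptions made for the example. A library such as Scikit-Learn provides more complete equivalents.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn text of any length into a count vector of len(vocabulary)."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens unseen in the corpus are ignored
            vector[vocabulary[tok]] = n
    return vector


def classification_rate(predicted, actual):
    """Fraction of documents whose predicted label matches the known label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical hand-labelled training corpus (illustration only).
corpus = [
    ("Shares rally as profits beat forecasts", "bullish"),
    ("Profits slump and shares fall sharply", "bearish"),
]
texts = [text for text, _ in corpus]
labels = [label for _, label in corpus]

vocabulary = build_vocabulary(texts)
X = [vectorize(text, vocabulary) for text in texts]
```

Every document, whatever its length, maps to a vector with one entry per vocabulary term, which is the fixed-width input a classifier needs. Tokens that do not appear in the training corpus are simply dropped in this sketch.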
