Advanced Algorithmic Trading


As with the above definitions of covariance and correlation, we can define the sample autocovariance and sample autocorrelation. In particular, we denote the sample autocovariance with a lower-case $c$ to distinguish it from the population value, which is denoted by an upper-case $C$.

The sample autocovariance function $c_k$ is given by:

$$c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{8.5}$$
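As a quick check on equation (8.5), the sketch below computes $c_k$ directly and compares it with R's built-in acf function using type = "covariance", which likewise divides by $n$ rather than $n-k$. The helper name sample_acvf is our own illustrative choice, and the data are simply simulated standard normal draws.

```r
set.seed(1)
x <- rnorm(100)

# Hypothetical helper: sample autocovariance c_k from equation (8.5).
# Sums over t = 1, ..., n - k and divides by n (not n - k).
sample_acvf <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 0, k < n)
  xbar <- mean(x)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

lags <- 0:5
c_manual <- sapply(lags, function(k) sample_acvf(x, k))

# R's built-in estimate of the same quantities, without plotting
c_builtin <- as.vector(acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf)

# Side-by-side comparison of the two estimates
print(round(cbind(lag = lags, manual = c_manual, acf = c_builtin), 4))
all.equal(c_manual, c_builtin)
```

Note that $c_0$ is the sample variance computed with divisor $n$, so it is slightly smaller than the value returned by var, which divides by $n-1$.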

The sample autocorrelation function $r_k$ is given by:

$$r_k = \frac{c_k}{c_0} \tag{8.6}$$
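Equation (8.6) then follows by dividing each $c_k$ by $c_0$. The sketch below (again using hypothetical helper names, sample_acvf and sample_acf, on simulated data) reproduces the values returned by acf(x, plot = FALSE) up to floating-point tolerance.

```r
set.seed(1)
x <- rnorm(100)

# Hypothetical helper: sample autocovariance c_k, equation (8.5)
sample_acvf <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

# Hypothetical helper: sample autocorrelation r_k = c_k / c_0, equation (8.6)
sample_acf <- function(x, max_lag) {
  c0 <- sample_acvf(x, 0)
  sapply(0:max_lag, function(k) sample_acvf(x, k) / c0)
}

r_manual <- sample_acf(x, 10)
r_builtin <- as.vector(acf(x, lag.max = 10, plot = FALSE)$acf)

r_manual[1]                     # lag 0 is always exactly 1
all.equal(r_manual, r_builtin)  # agreement up to floating-point tolerance
```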

Now that we have defined the sample autocorrelation function we are in a position to define and plot the correlogram, an essential tool in time series analysis.

8.5 The Correlogram

A correlogram is simply a plot of the sample autocorrelation function $r_k$ against successive values of the lag $k = 0, 1, \ldots, n-1$. It allows us to see the correlation structure at each lag.
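Since a correlogram is nothing more than $r_k$ plotted against $k$, it can be drawn by hand from the values that acf returns when plot = FALSE. This is only a sketch on simulated white noise. In practice acf computes lags only up to lag.max, whose default for a single series is $10\log_{10} n$ (capped at $n-1$), rather than all the way to $n-1$; here it is set explicitly to 20.

```r
set.seed(1)
w <- rnorm(100)

# Sample autocorrelations r_0, ..., r_20, without drawing the default plot
max_lag <- 20
r <- acf(w, lag.max = max_lag, plot = FALSE)
lags <- as.vector(r$lag)
vals <- as.vector(r$acf)

# A correlogram: one vertical bar of height r_k at each lag k
plot(lags, vals, type = "h", xlab = "Lag k", ylab = "ACF",
     main = "Correlogram drawn by hand")
abline(h = 0)
```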

The main use of correlograms is to detect any autocorrelation that remains after deterministic trends or seasonal effects have been removed.
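To illustrate, the sketch below assumes a hypothetical series made of a linear trend plus white noise. The raw correlogram is dominated by the trend, while the residuals from a least-squares trend fit (via lm) show little remaining autocorrelation. The slope of 0.05 and the length of 200 are arbitrary illustrative choices, not values from the text.

```r
set.seed(1)
n <- 200
time_index <- 1:n

# Hypothetical series: deterministic linear trend plus white noise
x <- 0.05 * time_index + rnorm(n)

# Correlogram values of the raw series (dominated by the trend)
r_raw <- acf(x, plot = FALSE)

# Remove the deterministic trend by ordinary least squares
trend_fit <- lm(x ~ time_index)
res <- residuals(trend_fit)
r_res <- acf(res, plot = FALSE)

# Lag-1 autocorrelation before and after detrending (element 1 is lag 0)
c(raw_lag1 = r_raw$acf[2], residual_lag1 = r_res$acf[2])

# Correlogram of the detrended residuals
acf(res, main = "Residuals after removing a linear trend")
```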

If we have fitted a time series model, then the correlogram of its residuals helps us decide whether the model fits well or whether it needs further refinement to remove any remaining autocorrelation.
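A minimal sketch of this residual check follows. The data are simulated from an AR(1) process purely for illustration (the coefficient 0.7 is an arbitrary assumption), and two models are fitted with R's arima function. The mean-only model leaves strong lag-1 autocorrelation in its residuals, suggesting that it needs refinement, whereas the AR(1) fit leaves residuals that look much closer to white noise.

```r
set.seed(1)

# Hypothetical data: an AR(1) series with coefficient 0.7
x <- arima.sim(model = list(ar = 0.7), n = 500)

# Model 1: mean only, which ignores the serial dependence
fit_mean <- arima(x, order = c(0, 0, 0))
# Model 2: AR(1), which matches how the data were generated
fit_ar1 <- arima(x, order = c(1, 0, 0))

r_mean <- acf(residuals(fit_mean), plot = FALSE)
r_ar1 <- acf(residuals(fit_ar1), plot = FALSE)

# Lag-1 autocorrelation of each residual series (element 1 is lag 0)
c(mean_only = r_mean$acf[2], ar1 = r_ar1$acf[2])
```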

Here is an example correlogram, plotted in R using the acf function, for a sequence of 100 independent draws from a standard normal distribution. The full R code is as follows, and the resulting plot is shown in Figure 8.2.

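> set.seed(1)       # fix the random seed so the results are reproducible
> w <- rnorm(100)   # 100 independent N(0, 1) draws
> acf(w)            # compute and plot the sample autocorrelations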

There are a few notable features of the correlogram plot in R:

• Firstly, since the sample autocorrelation at lag $k = 0$ is given by $r_0 = c_0 / c_0 = 1$, the plot will always have a line of height equal to unity at lag $k = 0$. This provides a reference point against which to judge the remaining autocorrelations at subsequent lags. Note also that the ACF on the y-axis is dimensionless, since correlation is itself dimensionless.

• The dotted blue lines mark boundaries such that, if a value $r_k$ falls outside them, we have evidence at the 5% level against the null hypothesis that the true autocorrelation at lag $k$ is zero. However, we must take care: even for a series with no autocorrelation we should expect roughly 5% of the lags to exceed these boundaries by chance alone. Furthermore, the sample autocorrelations at different lags are themselves correlated, so if one lag falls outside the boundaries then nearby lags are more likely to do so as well. In practice we are looking for lags that may have some underlying reason for exceeding the 5% level. For instance, in a commodity time series we may be seeing unanticipated seasonality effects at certain lags (possibly at monthly, quarterly or yearly intervals).
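The dotted bounds can be reproduced numerically. For its default white-noise bounds, plot.acf draws lines at $\pm z_{0.975}/\sqrt{n}$, where $z_{0.975} \approx 1.96$ is the 97.5% quantile of the standard normal distribution. The sketch below uses simulated white noise, so any exceedances are due purely to chance, and counts how many of the first 100 lags fall outside the bounds. Roughly 5% is expected, although the exact count depends on the seed.

```r
set.seed(1)
w <- rnorm(1000)
max_lag <- 100

r <- acf(w, lag.max = max_lag, plot = FALSE)
r_k <- as.vector(r$acf)[-1]   # drop lag 0, which is always 1

# Approximate 95% bounds used by plot.acf for white noise
bound <- qnorm(0.975) / sqrt(r$n.used)

# Lags whose sample autocorrelation falls outside the bounds
exceed <- which(abs(r_k) > bound)
length(exceed)             # number of lags outside the bounds
length(exceed) / max_lag   # observed proportion, roughly 0.05 by chance
```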

Here are a couple of examples of correlograms for sequences of data.
