
advanced-algorithmic-trading


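    # Use a linear model (y ~ beta_0 + beta_1*x + epsilon) to
    # generate a column 'y' of responses based on 'x'
    eps_mean = 0.0
    # normal() expects a standard deviation as its second argument,
    # so take the square root of the variance eps_sigma_sq
    df["y"] = beta_0 + beta_1*df["x"] + np.random.RandomState(42).normal(
        eps_mean, np.sqrt(eps_sigma_sq), N
    )

    return df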

if __name__ == "__main__":
    # These are our "true" parameters
    beta_0 = 1.0  # Intercept
    beta_1 = 2.0  # Slope

    # Simulate 100 data points, with a variance of 0.5
    N = 100
    eps_sigma_sq = 0.5

    # Simulate the "linear" data using the above parameters
    df = simulate_linear_data(N, beta_0, beta_1, eps_sigma_sq)

    # Plot the data, and a frequentist linear regression fit
    # using the seaborn package
    # (seaborn 0.9+ names this argument "height"; older releases used "size")
    sns.lmplot(x="x", y="y", data=df, height=10)
    plt.xlim(0.0, 1.0)

The output is given in Figure 5.1:

We have simulated 100 data points, with an intercept $\beta_0 = 1$ and a slope of $\beta_1 = 2$. The epsilon values are normally distributed with a mean of zero and variance $\sigma^2 = \frac{1}{2}$. The data have been plotted using the sns.lmplot function, which also fits a linear regression line to the data by ordinary least squares, the frequentist maximum likelihood estimate (MLE) under normally distributed noise.
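The frequentist fit can also be checked numerically without seaborn. The following is a minimal sketch under stated assumptions, not the book's listing: the helper name `simulate_and_fit` is hypothetical, `x` is assumed to be drawn uniformly on [0, 1), and `np.polyfit` stands in for the least-squares fit that the plot draws.

```python
import numpy as np


def simulate_and_fit(n=100, beta_0=1.0, beta_1=2.0, eps_sigma_sq=0.5, seed=42):
    """Simulate y = beta_0 + beta_1*x + eps and fit a line by least squares."""
    rng = np.random.RandomState(seed)
    # Assumption: x is uniform on [0, 1)
    x = rng.uniform(0.0, 1.0, n)
    # normal() takes a standard deviation, so convert the variance
    eps = rng.normal(0.0, np.sqrt(eps_sigma_sq), n)
    y = beta_0 + beta_1 * x + eps
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


if __name__ == "__main__":
    intercept, slope = simulate_and_fit()
    print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With only 100 noisy points the estimates will typically sit near, but not exactly on, the true values of 1 and 2; increasing `n` tightens them.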

Now that we have carried out the simulation we want to fit a Bayesian linear regression to the data. This is where the glm module comes in. It uses a model specification syntax that is similar to how the R statistical language specifies models. To achieve this we make implicit use of the Patsy library.
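To make the formula syntax concrete: an R-style formula such as `y ~ x` requests a model with an implicit intercept plus one coefficient for each term on the right-hand side. The sketch below builds the corresponding design matrix by hand with NumPy. It does not call Patsy itself, and the helper name `design_matrix` is hypothetical.

```python
import numpy as np


def design_matrix(x):
    """Design matrix for the formula 'y ~ x': an intercept column, then x."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])


if __name__ == "__main__":
    x = np.array([0.0, 0.5, 1.0])
    y = 1.0 + 2.0 * x  # noise-free responses with intercept 1 and slope 2
    X = design_matrix(x)
    # Least squares on the design matrix recovers (intercept, slope)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(X)
    print(coeffs)
```

A formula library performs this expansion automatically, which is why the model can be stated in a single string rather than by assembling columns manually.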

In the following snippet we import PyMC3, use the with context manager (as described in the previous chapter on MCMC) and then specify the model using the glm module. We then find the maximum a posteriori (MAP) estimate, which gives the MCMC sampler a starting point. Finally, we use the No-U-Turn Sampler[53] to carry out the actual inference and then plot the trace of the model, discarding the first 500 samples as "burn in":

def glm_mcmc_inference(df, iterations=5000):
    """
