
tslag["Lag%s" % str(i+1)] = ts["Adj Close"].shift(i+1)

# Create the returns DataFrame

tsret = pd.DataFrame(index=tslag.index)

tsret["Volume"] = tslag["Volume"]

tsret["Today"] = tslag["Today"].pct_change()*100.0

# Create the lagged percentage returns columns

for i in range(0,lags):

tsret["Lag%s" % str(i+1)] = tslag[

"Lag%s" % str(i+1)

].pct_change()*100.0

tsret = tsret[tsret.index >= start_date]

return tsret

In the __main__ function the parameters are set. Firstly, a random seed is defined to make the code replicable across other working environments. n_jobs controls the number of processor cores used in bagging and random forests. Boosting is not parallelisable and so does not make use of this parameter.
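As a brief illustration (a sketch using the Scikit-Learn estimator interfaces with placeholder values, not the full listing used later), BaggingRegressor and RandomForestRegressor expose an n_jobs keyword argument, whereas AdaBoostRegressor builds its estimators sequentially and provides no such parameter:

# Sketch with illustrative values: bagging and random forests are
# parallelisable via n_jobs, while boosting is inherently sequential
from sklearn.ensemble import (
    AdaBoostRegressor, BaggingRegressor, RandomForestRegressor
)

bagging = BaggingRegressor(n_estimators=100, n_jobs=1, random_state=42)
forest = RandomForestRegressor(n_estimators=100, n_jobs=1, random_state=42)
boosting = AdaBoostRegressor(n_estimators=100, random_state=42)  # no n_jobs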

n_estimators defines the total number of estimators used in the graph of the MSE, while step_factor controls how granular the calculation is by stepping through the number of estimators. In this instance axis_step is equal to 1000/10 = 100. That is, 100 separate calculations will be performed for each of the three ensemble methods:

# Set the random seed, number of estimators
# and the "step factor" used to plot the graph of MSE
# for each method
random_state = 42
n_jobs = 1  # Parallelisation factor for bagging, random forests
n_estimators = 1000
step_factor = 10
axis_step = int(n_estimators/step_factor)
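To see how these parameters combine, the following sketch (illustrative rather than the full listing; it assumes training and test arrays X_train, X_test, y_train and y_test already exist from a prior split) steps through the estimator counts for one of the three methods, producing the 100 MSE values for the random forest:

# Sketch: compute the test MSE at 10, 20, ..., 1000 estimators,
# i.e. axis_step separate fits of the random forest
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf_mse = np.zeros(axis_step)
for i in range(axis_step):
    rf = RandomForestRegressor(
        n_estimators=step_factor*(i+1),
        n_jobs=n_jobs, random_state=random_state
    )
    rf.fit(X_train, y_train)
    rf_mse[i] = mean_squared_error(y_test, rf.predict(X_test))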

The following code downloads ten years' worth of AMZN prices and converts them into a series of lagged returns using the aforementioned create_lagged_series function. Missing values, a consequence of the lagging procedure, are dropped and the data is scaled to lie between -1 and +1 for ease of comparison. This latter procedure is common in machine learning and ensures that features with large differences in absolute size remain comparable across the models:

# Download ten years worth of Amazon
# adjusted closing prices
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2015, 12, 31)
amzn = create_lagged_series("AMZN", start, end, lags=3)
amzn.dropna(inplace=True)

# Use the first three daily lags of AMZN closing prices
# and scale the data to lie within -1 and +1 for comparison
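# NOTE: the scaling call itself does not appear in this excerpt; the
# lines below are an assumed completion (a sketch using Scikit-Learn's
# MinMaxScaler) that selects the lag features and the response, then
# rescales each to lie within [-1, +1]
from sklearn.preprocessing import MinMaxScaler

X = amzn[["Lag1", "Lag2", "Lag3"]]
y = amzn["Today"]
scaler = MinMaxScaler(feature_range=(-1, 1))
X = scaler.fit_transform(X)
y = scaler.fit_transform(y.values.reshape(-1, 1)).ravel()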
