

Similarly, the value loss functions of Algorithms 10.3.1 and 10.4.1 have the same structure. The value loss functions are implemented in Keras as value_loss(), shown in Listing 10.6.3. The common gradient factor $\nabla_{\theta_v} V(s_t, \theta_v)$ is represented by the tensor y_pred. The remaining factor is represented by y_true. The y_true values are also shown in Table 10.6.1. REINFORCE does not use a value function. A2C uses the MSE loss function to learn the value function. In A2C, y_true represents the target value or ground truth.
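Under this factorization, a minimal sketch of what value_loss() in Listing 10.6.3 could look like is given below. The import of the Keras backend as K is an assumption here; for A2C, the value network would instead be compiled with the built-in mse loss, as noted above:

from tensorflow.keras import backend as K

def value_loss(self, y_true, y_pred):
    # y_pred carries the common gradient factor ∇_θv V(s_t, θ_v)
    # y_true carries the remaining factor listed in Table 10.6.1
    return -K.mean(y_pred * y_true, axis=-1)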

Listing 10.6.4, policygradient-car-10.1.1.py, shows how REINFORCE, REINFORCE with baseline, and A2C are trained by episode. The appropriate return is computed first before calling the main train routine in Listing 10.6.5:

# train by episode (REINFORCE, REINFORCE with baseline
# and A2C use this routine to prepare the dataset before
# the step by step training)
def train_by_episode(self, last_value=0):
    if self.args.actor_critic:
        print("Actor-Critic must be trained per step")
        return
    elif self.args.a2c:
        # implements A2C training from the last state
        # to the first state
        # discount factor
        gamma = 0.95
        r = last_value
        # the memory is visited in reverse as shown
        # in Algorithm 10.5.1
        for item in self.memory[::-1]:
            [step, state, next_state, reward, done] = item
            # compute the return
            r = reward + gamma*r
            item = [step, state, next_state, r, done]
            # train per step
            # a2c reward has been discounted
            self.train(item)
        return

    # only REINFORCE and REINFORCE with baseline
    # use the following code
    # convert the rewards to returns
    rewards = []
    gamma = 0.99
    for item in self.memory:
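        # (sketch, not the verbatim listing) one way the routine could
        # continue from here: collect each stored reward, assuming numpy
        # is imported as np and self.train() accepts a gamma keyword
        [_, _, _, reward, _] = item
        rewards.append(reward)

    # replace each stored reward with its return: the discounted sum
    # of rewards from step i until the end of the episode
    for i in range(len(rewards)):
        horizon = len(rewards) - i
        discount = [gamma**t for t in range(horizon)]
        self.memory[i][3] = np.dot(rewards[i:], discount)

    # train step by step using the computed returns
    for item in self.memory:
        self.train(item, gamma=gamma)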
