

Figure 10.6.5: Policy model (actor model)

Given the MountainCarContinuous-v0 environment, the policy (or actor) model predicts the action that must be applied to the car. As discussed in the first section of this chapter on policy gradient methods, for continuous action spaces the policy model samples an action from a Gaussian distribution, π(a_t | s_t, θ) = a_t ~ N(µ(s_t), σ(s_t)). In Keras, this is implemented as:

# requires: import tensorflow as tf; from keras import backend as K
# (tf.distributions.Normal is the TF 1.x API; in TF 2.x, use
# tensorflow_probability's tfp.distributions.Normal instead)
# given mean and stddev, sample an action, clip and return
# we assume a Gaussian distribution over the probability of
# selecting an action given a state
def action(self, args):
    mean, stddev = args
    dist = tf.distributions.Normal(loc=mean, scale=stddev)
    action = dist.sample(1)
    action = K.clip(action,
                    self.env.action_space.low[0],
                    self.env.action_space.high[0])
    return action
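
Since sampling is not a built-in layer, a function like this is typically attached to the model graph through a Keras Lambda layer. A minimal sketch of that wiring, assuming mean and stddev are the output tensors of the policy network's two heads (the layer name here is illustrative):

from tensorflow.keras.layers import Lambda

# wrap the sampling function so it becomes a node in the model graph
action = Lambda(self.action,
                output_shape=(1,),
                name='action')([mean, stddev])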

The action is clipped between its minimum and maximum possible values.

The role of the policy network is to predict the mean and standard deviation of the Gaussian distribution. Figure 10.6.5 shows the policy network that models π(a_t | s_t, θ). It's worth noting that the encoder model has pretrained weights that are frozen. Only the mean and standard deviation weights receive the performance gradient updates.
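
To make this structure concrete, here is a minimal sketch of such a policy network. The single Dense layer standing in for the pretrained encoder, its width, and the sample_action function (the action method above, bound as a plain function) are assumptions for illustration, not the book's exact code; the essential points are the frozen encoder and the two trainable heads:

from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

# MountainCarContinuous-v0 observations are 2-D (position, velocity)
state = Input(shape=(2,), name='state')

# stand-in for the pretrained encoder; trainable=False freezes its weights
feature = Dense(32, activation='relu', trainable=False,
                name='encoder')(state)

# only these two heads receive the performance gradient updates;
# softplus is one common way to keep the predicted stddev positive
mean = Dense(1, activation='linear', name='mean')(feature)
stddev = Dense(1, activation='softplus', name='stddev')(feature)

# sample_action is the Gaussian sampling function shown earlier
action = Lambda(sample_action, output_shape=(1,),
                name='action')([mean, stddev])
actor = Model(inputs=state, outputs=action, name='actor')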

