Advanced Deep Learning with Keras

Policy Gradient Methods

The policy network is basically the implementation of Equations 10.1.4 and 10.1.5 that are repeated here for convenience:

$$\mu\left(s_t\right) = \phi\left(s_t\right)^T \theta_\mu \quad \text{(Equation 10.1.4)}$$

$$\sigma\left(s_t\right) = \varsigma\left(\phi\left(s_t\right)^T \theta_\sigma\right) \quad \text{(Equation 10.1.5)}$$

where $\phi\left(s_t\right)$ is the encoder, $\theta_\mu$ are the weights of the mean's Dense(1) layer, and $\theta_\sigma$ are the weights of the standard deviation's Dense(1) layer. We used a modified softplus function, $\varsigma\left(\cdot\right)$, to avoid zero standard deviation:

from tensorflow.keras import backend as K

# some implementations use a modified softplus to ensure that
# the stddev is never zero
def softplusk(x):
    return K.softplus(x) + 1e-10
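For the string name 'softplusk' used in the Activation layer of the listing that follows to resolve, the function has to be registered as a Keras custom object. A minimal sketch of one way to do this, assuming tf.keras (this registration call is not part of the excerpt above):

from tensorflow.keras.utils import get_custom_objects

# make the custom activation resolvable by name,
# so that Activation('softplusk') can look it up
get_custom_objects().update({'softplusk': softplusk})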

The policy model builder is shown in the following listing. Also included in this listing are the log probability, entropy, and value models, which we will discuss next.

Listing 10.6.2, policygradient-car-10.1.1.py, shows the method for building the policy (actor), logp, entropy, and value models from the encoded state features:

def build_actor_critic(self):
    inputs = Input(shape=(self.state_dim, ), name='state')
    self.encoder.trainable = False
    x = self.encoder(inputs)
    mean = Dense(1,
                 activation='linear',
                 kernel_initializer='zero',
                 name='mean')(x)
    stddev = Dense(1,
                   kernel_initializer='zero',
                   name='stddev')(x)
    # use of softplusk avoids stddev = 0
    stddev = Activation('softplusk', name='softplus')(stddev)
    action = Lambda(self.action,
                    output_shape=(1,),
                    name='action')([mean, stddev])
    self.actor_model = Model(inputs, action, name='action')
    self.actor_model.summary()
    plot_model(self.actor_model, to_file='actor_model.png',
               show_shapes=True)
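The Lambda layer wraps self.action, a Gaussian sampling function that is not shown in this excerpt. A rough sketch of what such a function could look like, assuming TensorFlow Probability is available and the agent keeps a gym-style environment reference in self.env (both are assumptions made here for illustration, not part of the original listing):

import tensorflow_probability as tfp
from tensorflow.keras import backend as K

def action(self, args):
    # sample one action from the Gaussian policy parameterized
    # by the mean and stddev tensors from the Dense(1) layers
    mean, stddev = args
    dist = tfp.distributions.Normal(loc=mean, scale=stddev)
    action = dist.sample(1)
    # keep the sampled action within the environment's valid range
    # (self.env.action_space is an assumed attribute)
    action = K.clip(action,
                    self.env.action_space.low[0],
                    self.env.action_space.high[0])
    return action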
