Advanced Deep Learning with Keras

Chapter 10

Figure 10.6.1 MountainCarContinuous-v0 OpenAI gym environment

Unlike Q-Learning, policy gradient methods are applicable to both discrete and continuous action spaces. In our example, we'll demonstrate the four policy gradient methods on a continuous action space example, MountainCarContinuous-v0 from OpenAI gym, https://gym.openai.com. If you are not familiar with OpenAI gym, please see Chapter 9, Deep Reinforcement Learning.

A snapshot of the MountainCarContinuous-v0 2D environment is shown in Figure 10.6.1. In this 2D environment, a car with an underpowered engine sits between two mountains. To reach the yellow flag on top of the mountain on the right, the car must drive back and forth to build up enough momentum. The more energy (that is, the greater the absolute value of the action) applied to the car, the smaller (or more negative) the reward. The reward is negative at every step and only becomes positive when the car reaches the flag, at which point it receives a reward of +100. In addition, every action is penalized by the following code:

reward -= math.pow(action[0], 2) * 0.1

The continuous range of valid action values is [-1.0, 1.0]. Beyond that range, the action is clipped to its minimum or maximum value, so it makes no sense to apply an action value greater than 1.0 or less than -1.0.
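To see how the clipping and the energy penalty interact, here is a minimal sketch in plain Python. The helper names `clip_action` and `action_penalty` are ours for illustration, not part of the gym API; the constants and formula follow the environment's description above.

```python
import math

# Valid action range for MountainCarContinuous-v0
ACTION_MIN, ACTION_MAX = -1.0, 1.0

def clip_action(a):
    """Clip a raw action to the environment's valid range,
    mirroring what the environment does internally."""
    return max(ACTION_MIN, min(ACTION_MAX, a))

def action_penalty(a):
    """Per-step energy penalty subtracted from the reward."""
    a = clip_action(a)
    return -math.pow(a, 2) * 0.1

print(clip_action(1.5))     # 1.0: clipped to the maximum
print(action_penalty(1.5))  # -0.1: the same penalty as action 1.0
print(action_penalty(0.5))  # about -0.025
```

Because out-of-range actions are clipped before the penalty is computed, an action of 1.5 costs exactly as much as 1.0 while producing the same force, which is why exceeding the range is pointless.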

The MountainCarContinuous-v0 environment state has two elements:

• Car position

• Car velocity

