Advanced Deep Learning with Keras

fourpersent2020
16.03.2021

Chapter 10

Figure 10.6.1 MountainCarContinuous-v0 OpenAI gym environment

Unlike Q-Learning, policy gradient methods are applicable to both discrete and continuous action spaces. In our example, we'll demonstrate the four policy gradient methods on a continuous action space example, MountainCarContinuous-v0 of OpenAI gym, https://gym.openai.com. In case you are not familiar with OpenAI gym, please see Chapter 9, Deep Reinforcement Learning.

A snapshot of the MountainCarContinuous-v0 2D environment is shown in Figure 10.6.1. In this 2D environment, a car with a not too powerful engine is between two mountains. In order to reach the yellow flag on top of the mountain on the right, it must drive back and forth to gain enough momentum. The more energy (that is, the greater the absolute value of the action) that is applied to the car, the smaller (or, the more negative) the reward. The reward is never positive until the car reaches the flag, at which point it receives a reward of +100. However, every action is penalized by the following code:

    reward -= math.pow(action[0], 2) * 0.1

The continuous range of valid action values is [-1.0, 1.0]. Beyond the range, the action is clipped to its minimum or maximum value. Therefore, it makes no sense to apply an action value that is greater than 1.0 or less than -1.0.

The MountainCarContinuous-v0 environment state has two elements:

• Car position
• Car velocity
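The reward and clipping rules described above can be sketched in a few lines of Python. This is only an illustration, not the environment's source code: the names clip_action and step_reward are hypothetical, and the sketch assumes that the penalty is computed on the raw action value exactly as the quoted line reads, while clipping affects only the force applied to the car.

```python
import math

ACTION_MIN, ACTION_MAX = -1.0, 1.0  # valid action range stated in the text
GOAL_REWARD = 100.0                 # reward received upon reaching the flag


def clip_action(value):
    """Clip an action value to the valid range [-1.0, 1.0]."""
    return min(max(value, ACTION_MIN), ACTION_MAX)


def step_reward(action, reached_flag):
    """Hypothetical helper: reward for one step.

    Assumption: the penalty uses the raw action[0], as in the quoted
    line, so exceeding the range costs more without adding force.
    """
    reward = GOAL_REWARD if reached_flag else 0.0
    reward -= math.pow(action[0], 2) * 0.1
    return reward


# An out-of-range action gives no more force than 1.0 but a larger penalty.
applied_force = clip_action(2.0)
penalty_in_range = step_reward([1.0], reached_flag=False)
penalty_out_of_range = step_reward([2.0], reached_flag=False)
```

Under these assumptions, an action of 2.0 applies the same force as 1.0 yet is penalized four times as heavily, which is one more reason to keep the policy output inside the valid range.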

Policy Gradient Methods

The state is converted to state features by an encoder. The predicted action is the output of the policy model given the state. The output of the value function is the predicted value of the state:

Figure 10.6.2 Autoencoder model

Figure 10.6.3 Encoder model
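To make the data flow concrete, the following is a minimal sketch in plain Python of a state passing through an encoder and then through separate policy and value heads. It is not the network used in the chapter: the layer sizes, the random weights, the helper names (dense, forward, predict), and the deterministic tanh output are all assumptions chosen only to show how the three pieces connect.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

FEATURE_DIM = 8  # assumed size of the state-feature vector


def dense(n_in, n_out):
    """Hypothetical helper: a randomly initialised (weights, biases) pair."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, inputs, activation=None):
    """Apply one dense layer, optionally followed by an activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(v) for v in outputs] if activation else outputs


def relu(v):
    return max(0.0, v)


encoder = dense(2, FEATURE_DIM)      # state (position, velocity) -> features
policy_head = dense(FEATURE_DIM, 1)  # features -> action
value_head = dense(FEATURE_DIM, 1)   # features -> state value


def predict(state):
    """Return (action, value) for a [position, velocity] state."""
    features = forward(encoder, state, relu)
    # Assumption: tanh keeps the action inside the valid range [-1.0, 1.0].
    action = forward(policy_head, features, math.tanh)[0]
    value = forward(value_head, features)[0]
    return action, value


action, value = predict([-0.5, 0.0])
```

Sharing the encoder between the two heads means both the policy and the value function work from the same state features, which matches the description above.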

