
Chapter 10

Conclusion

In this chapter, we covered policy gradient methods. Starting from the policy gradient theorem, we formulated four methods to train the policy network: REINFORCE, REINFORCE with baseline, Actor-Critic, and A2C. We discussed each algorithm in detail and explored how it can be implemented in Keras. We then validated the algorithms by examining the number of times the agent successfully reached its goal and the total reward received per episode.
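As a reminder of the core update the four methods share, here is a minimal NumPy sketch of the REINFORCE gradient for a linear-softmax policy. This is an illustrative toy, not one of the book's Keras listings; the helper names and the linear policy are assumptions made for brevity.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for every time step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_grad(theta, states, actions, returns):
    """Policy gradient sum_t G_t * grad log pi(a_t|s_t) for
    a linear-softmax policy pi(a|s) = softmax(s @ theta)."""
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        logits = s @ theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        # grad log pi(a|s) w.r.t. theta is outer(s, one_hot(a) - probs)
        grad += g * np.outer(s, one_hot - probs)
    return grad
```

A gradient-ascent step `theta += lr * reinforce_grad(...)` then increases the log-probability of actions in proportion to the return that followed them, which is exactly what the Keras implementations do via a custom loss.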

Similar to the Deep Q-Network [2] that we discussed in the previous chapter, several improvements can be made to the fundamental policy gradient algorithms. The most prominent is A3C [3], a multithreaded version of A2C. It exposes the agent to different experiences simultaneously and optimizes the policy and value networks asynchronously. However, in experiments conducted by OpenAI, https://blog.openai.com/baselines-acktr-a2c/, A3C showed no strong advantage over A2C, since the former could not take advantage of the strong GPUs available nowadays.
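Whether the updates are applied synchronously (A2C) or asynchronously by parallel workers (A3C), both methods scale the policy gradient by the same advantage estimate computed from the critic. A minimal sketch of the one-step advantage, assuming a critic that supplies the value estimates (the helper name is hypothetical):

```python
import numpy as np

def advantages(rewards, values, gamma=0.99):
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one extra entry: the critic's estimate for the
    state reached after the last reward (0.0 if it is terminal).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    return rewards + gamma * values[1:] - values[:-1]
```

Replacing the return G_t with this advantage is what turns REINFORCE with baseline into the Actor-Critic family; A3C merely distributes the resulting gradient computation across threads.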

Given that this is the end of the book, it's worth noting that the field of deep learning is huge, and covering all of its advances in one book is impossible. What we have done is carefully select the advanced topics that I believe will be useful in a wide range of applications, and that you, the reader, will be able to build on easily. The Keras implementations illustrated throughout this book will allow you to carry on and apply these techniques in your own work and research.

References

1. Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction, http://incompleteideas.net/book/bookdraft2017nov5.pdf, 2017.
2. Mnih, Volodymyr, and others. Human-level control through deep reinforcement learning, Nature 518.7540 (2015): 529.
3. Mnih, Volodymyr, and others. Asynchronous methods for deep reinforcement learning, International Conference on Machine Learning, 2016.
4. Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8.3-4 (1992): 229-256.
