Advanced Deep Learning with Keras

Deep Reinforcement Learning

Conclusion

In this chapter, we've been introduced to DRL, a powerful technique that many researchers believe to be the most promising lead towards artificial intelligence. Together, we've gone over the principles of RL. RL is able to solve many toy problems, but the Q-Table is unable to scale to more complex real-world problems. The solution is to approximate the Q-Table with a deep neural network. However, training deep neural networks in RL is highly unstable due to sample correlation and the nonstationarity of the target Q-Network.
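To make this concrete, here is a minimal sketch, assuming a small discrete problem such as CartPole, of replacing the Q-Table with a Keras model that maps a state vector to one Q value per action. The state size, action count, and layer widths below are illustrative assumptions, not the chapter's exact code:

# Minimal Q-Network sketch: state in, one Q value per action out.
# state_size, action_size, and layer widths are illustrative assumptions.
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

state_size = 4    # e.g. a CartPole observation vector (assumption)
action_size = 2   # number of discrete actions (assumption)

q_network = Sequential([
    Dense(64, activation='relu', input_shape=(state_size,)),
    Dense(64, activation='relu'),
    Dense(action_size, activation='linear'),  # Q(s, a) for every action a
])
q_network.compile(optimizer='adam', loss='mse')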

DQN proposed a solution to these problems by using experience replay and by separating the target network from the Q-Network under training. DDQN suggested a further improvement of the algorithm by separating action selection from action evaluation to minimize the overestimation of the Q value. There are other improvements proposed for DQN. Prioritized experience replay [6] argues that the experience buffer should not be sampled uniformly. Instead, experiences that are more important based on their TD errors should be sampled more frequently to accomplish more efficient training. [7] proposes a dueling network architecture to estimate the state value function and the advantage function. Both functions are used to estimate the Q value for faster learning.
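As a rough illustration of the DQN/DDQN difference, and not the chapter's implementation, the sketch below computes the TD target for a single transition in both styles: DQN lets the target network both select and evaluate the best next action, while DDQN selects the action with the online Q-Network and evaluates it with the target network. The function name and shapes are assumptions; q_network and target_network are Keras models like the one sketched earlier.

import numpy as np

# Sketch: TD targets for one transition (reward, next_state, done).
# gamma is the discount factor; shapes and names are illustrative.
def td_targets(reward, next_state, done, q_network, target_network, gamma=0.99):
    if done:
        return reward, reward
    next_q_online = q_network.predict(next_state[np.newaxis], verbose=0)[0]
    next_q_target = target_network.predict(next_state[np.newaxis], verbose=0)[0]
    # DQN: the target network selects and evaluates the best next action
    dqn_target = reward + gamma * np.max(next_q_target)
    # DDQN: the online network selects, the target network evaluates
    best_action = np.argmax(next_q_online)
    ddqn_target = reward + gamma * next_q_target[best_action]
    return dqn_target, ddqn_target

Decoupling selection from evaluation is what reduces the upward bias: a noisy overestimate in the target network only enters the target if the online network independently picks that action.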

The approach presented in this chapter is value iteration/fitting. The policy is learned indirectly by finding an optimal value function. In the next chapter, the approach will be to learn the optimal policy directly by using a family of algorithms called policy gradient methods. Learning the policy directly has many advantages. In particular, policy gradient methods can deal with both discrete and continuous action spaces.
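As a preview, and in notation not used in this chapter, the contrast can be written as follows: value-based methods recover the policy from the learned value function, while policy gradient methods parameterize the policy itself and follow the gradient of the expected return:

\pi(s) = \arg\max_a Q(s, a) \quad \text{(value-based, this chapter)}

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right] \quad \text{(policy gradient, next chapter)}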
