
Reinforcement Learning

By default, it will store the video of episodes 1, 8, 27, 64, and so on (episode
numbers that are perfect cubes), and then of every 1,000th episode; by default, the
videos of each training run are saved in one folder. The code to do this is:

import gym

env = gym.make("Breakout-v0")
env = gym.wrappers.Monitor(env, 'recording', force=True)
observation = env.reset()
for _ in range(1000):
    #env.render()
    # your agent here (this takes random actions)
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()

For Monitor to work, FFmpeg support is required; depending upon your OS, you
may need to install it in case it is missing.

This will save the videos in MP4 format in the recording folder. An important thing
to note here is that you have to set the force=True option if you want to reuse the
same folder for the next training session.
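The default recording schedule described above (perfect-cube episode numbers, then every 1,000th episode) can be sketched as a plain Python predicate. The function name below is illustrative, not part of the gym API; it mirrors the capped cubic schedule that classic gym releases use when no custom schedule is supplied:

```python
def capped_cubic_schedule(episode_id):
    # Record early episodes whose number is a perfect cube (0, 1, 8, 27, ...),
    # then fall back to recording every 1,000th episode.
    if episode_id < 1000:
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    return episode_id % 1000 == 0

# Episodes 0, 1, 8, and 27 are the only ones recorded in the first 30.
recorded = [e for e in range(30) if capped_cubic_schedule(e)]
```

In classic gym releases, Monitor also accepts a video_callable argument, so a predicate like this (for example, one recording every 10th episode instead) can replace the default schedule.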

Deep Q-Networks

Deep Q-networks, DQNs for short, are deep neural networks designed to
approximate the Q-function (the state-action value function); DQN is one of the
most popular value-based reinforcement learning algorithms. The model was
proposed by Google's DeepMind at NIPS 2013, in the paper entitled Playing Atari
with Deep Reinforcement Learning. The most important contribution of this paper was
that they used the raw state space directly as input to the network; the input
features were not hand-crafted as in earlier RL implementations. Also, they could
train agents with exactly the same architecture to play different Atari games and
obtain state-of-the-art results.

This model is an extension of the simple Q-learning algorithm. In Q-learning
algorithms, a Q-table is maintained as a cheat sheet. After each action, the Q-table
is updated using the Bellman equation [5]:

Q(S_t, A_t) = (1 − α) Q(S_t, A_t) + α (R_{t+1} + γ max_a Q(S_{t+1}, a))
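As a minimal sketch, this Bellman update can be written for a tabular Q stored as a NumPy array indexed by state and action. The function name, the toy table sizes, and the default values of alpha and gamma below are illustrative assumptions, not from the paper:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One Bellman update: blend the old estimate Q[s, a] with the
    # bootstrapped target r + gamma * max_a' Q[s_next, a'].
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * np.max(Q[s_next]))
    return Q

Q = np.zeros((4, 2))   # toy table: 4 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Starting from an all-zero table, the update moves Q[0, 1] a fraction alpha of the way toward the reward, since the bootstrapped term max_a Q[1, a] is still zero.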

