
# training of Q Table
if done:
    # update exploration-exploitation ratio
    # reward > 0 only when Goal is reached
    # otherwise, it is a Hole
    if reward > 0:
        wins += 1

# update the Q-Table and epsilon unless running in demo mode
if not args.demo:
    agent.update_q_table(state, action, reward, next_state)
    agent.update_epsilon()

state = next_state
# report the running percentage of episodes that reached the Goal
percent_wins = 100.0 * wins / (episode + 1)
print("-------%0.2f%% Goals in %d Episodes---------"
      % (percent_wins, episode))
# pause longer at the end of an episode so the rendering is readable
if done:
    time.sleep(5 * delay)
else:
    time.sleep(delay)
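The two agent calls above do all of the actual learning. The agent class itself is not shown in this excerpt, but a minimal sketch of what update_q_table() and update_epsilon() could look like follows, assuming a NumPy Q-Table indexed as [state, action], a learning rate, a discount factor gamma, and a multiplicative epsilon decay. The names and default values here are assumptions for illustration, not the chapter's exact implementation:

import numpy as np

class QAgentSketch:
    """Sketch of a tabular Q-Learning agent (illustrative, not the book's code)."""
    def __init__(self, n_states, n_actions,
                 learning_rate=0.9, gamma=0.9,
                 epsilon=1.0, epsilon_decay=0.99, epsilon_min=0.1):
        # Q-Table with one row per state and one column per action
        self.q_table = np.zeros((n_states, n_actions))
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min

    def update_q_table(self, state, action, reward, next_state):
        # Q(s, a) += lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
        target = reward + self.gamma * np.amax(self.q_table[next_state])
        error = target - self.q_table[state, action]
        self.q_table[state, action] += self.learning_rate * error

    def update_epsilon(self):
        # gradually shift from exploration to exploitation
        self.epsilon = max(self.epsilon_min,
                           self.epsilon * self.epsilon_decay)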

Deep Q-Network (DQN)

Using a Q-Table to implement Q-Learning is fine in small, discrete environments. However, when the environment has numerous states, or is continuous as in most cases, a Q-Table is neither feasible nor practical. For example, if we are observing a state made of four continuous variables, the size of the table is infinite. Even if we attempt to discretize the four variables into 1,000 values each, the total number of rows in the table is a staggering 1000^4 = 1e12. Even after training, the table is sparse; most of the cells in this table are zero.
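As a quick back-of-the-envelope check of that count (the 1,000-bin discretization is only illustrative):

bins_per_variable = 1000   # illustrative discretization of each variable
n_variables = 4            # a state made of four continuous variables
n_rows = bins_per_variable ** n_variables
print(n_rows)              # 1000000000000, i.e. 1e12 Q-Table rows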

A solution to this problem is called DQN [2], which uses a deep neural network to approximate the Q-Table, as shown in Figure 9.6.1. There are two approaches to building the Q-Network:

1. The input is the state-action pair, and the prediction is the Q value
2. The input is the state, and the prediction is the Q value for each action

The first option is not optimal since the network is called a number of times equal to the number of actions. The second is the preferred method: the Q-Network is called only once, as sketched below.
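A minimal tf.keras sketch of this second approach follows. The 4-dimensional state, the number of actions, and the layer sizes are assumptions for illustration, not the book's exact DQN architecture:

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_q_network(state_dim=4, n_actions=2):
    # input is the state only; output is one Q-value per action
    inputs = Input(shape=(state_dim,), name='state')
    x = Dense(256, activation='relu')(inputs)
    x = Dense(256, activation='relu')(x)
    # linear output layer: unbounded Q-value estimates
    q_values = Dense(n_actions, activation='linear', name='q')(x)
    model = Model(inputs, q_values)
    model.compile(loss='mse', optimizer='adam')
    return model

# a single forward pass yields the Q-values of all actions;
# the greedy policy simply takes the argmax
q_network = build_q_network()
state = np.zeros((1, 4), dtype=np.float32)
action = np.argmax(q_network.predict(state, verbose=0)[0])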
