
Q_{\max} =
\begin{cases}
r_{j+1} & \text{if episode terminates at } j+1 \\
r_{j+1} + \gamma\, Q_{\text{target}}\!\left(s_{j+1}, \operatorname*{argmax}_{a_{j+1}} Q(s_{j+1}, a_{j+1}; \theta); \theta^{-}\right) & \text{otherwise}
\end{cases}

The term $\operatorname*{argmax}_{a_{j+1}} Q(s_{j+1}, a_{j+1}; \theta)$ lets Q choose the action. This action is then evaluated by $Q_{\text{target}}$.
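To make the distinction concrete, the following is a minimal sketch (with made-up Q-value estimates, not taken from the book's networks) contrasting how DQN and DDQN form the target from the same pair of value estimates:

import numpy as np

# hypothetical Q-value estimates for three actions in state s_{j+1}
q_online = np.array([1.0, 2.5, 2.0])   # Q(s', a'; theta), online network
q_target = np.array([1.2, 1.8, 3.0])   # Q_target(s', a'; theta^-), target network

reward, gamma = 1.0, 0.95

# DQN: the target network both selects and evaluates the action
dqn_target = reward + gamma * np.max(q_target)       # 1.0 + 0.95 * 3.0 = 3.85

# DDQN: the online network selects, the target network evaluates
a_max = np.argmax(q_online)                           # action 1
ddqn_target = reward + gamma * q_target[a_max]        # 1.0 + 0.95 * 1.8 = 2.71

Because the online and target networks rarely agree on the best action, DDQN avoids always taking the maximum of the (possibly overestimated) target values.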

Listing 9.6.1 implements both DQN and DDQN. Specifically for DDQN, the modification to the Q-value computation performed by the get_target_q_value() function is highlighted:

# compute Q_max
# use of target Q Network solves the non-stationarity problem
def get_target_q_value(self, next_state, reward):
    # max Q value among next state's actions
    if self.ddqn:
        # DDQN
        # current Q Network selects the action
        # a'_max = argmax_a' Q(s', a')
        action = np.argmax(self.q_model.predict(next_state)[0])
        # target Q Network evaluates the action
        # Q_max = Q_target(s', a'_max)
        q_value = self.target_q_model.predict(next_state)[0][action]
    else:
        # DQN chooses the max Q value among next actions
        # selection and evaluation of action is on the target Q Network
        # Q_max = max_a' Q_target(s', a')
        q_value = np.amax(self.target_q_model.predict(next_state)[0])

    # Q_max = reward + gamma * Q_max
    q_value *= self.gamma
    q_value += reward
    return q_value
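For context, get_target_q_value() is used when training on an experience sampled from the replay buffer: the returned target replaces the Q-value of the action actually taken, and the online network is fit toward it. The following is a minimal sketch of that step, assuming self.q_model is the Keras online network; the helper name update_on_sample() and the terminal-state handling shown here are illustrative, not the book's exact replay code:

def update_on_sample(self, state, action, reward, next_state, done):
    # terminal transitions have no bootstrapped future value
    if done:
        target = reward
    else:
        target = self.get_target_q_value(next_state, reward)

    # current Q-value predictions for all actions in `state`
    q_values = self.q_model.predict(state)
    # only the taken action's value is moved toward the target
    q_values[0][action] = target

    # one gradient step on the online Q Network
    self.q_model.fit(state, q_values, epochs=1, verbose=0)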

For comparison, averaged over 10 runs, CartPole-v0 is solved by DDQN within 971 episodes. To use DDQN, run:

$ python3 dqn-cartpole-9.6.1.py -d

