Deep Reinforcement Learning

        # store every experience unit in replay buffer
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward

    # call experience replay
    if len(agent.memory) >= batch_size:
        agent.replay(batch_size)

    scores.append(total_reward)
    mean_score = np.mean(scores)
    if mean_score >= win_reward[args.env_id] and episode >= win_trials:
        print("Solved in episode %d: Mean survival = %0.2lf in %d episodes"
              % (episode, mean_score, win_trials))
        print("Epsilon: ", agent.epsilon)
        agent.save_weights()
        break
    if episode % win_trials == 0:
        print("Episode %d: Mean survival = %0.2lf in %d episodes" %
              (episode, mean_score, win_trials))
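Each experience unit stored by agent.remember() is later sampled by agent.replay(), which performs one gradient update of the Q-Network on a random mini-batch drawn from the buffer. Below is a minimal sketch of such a replay step, assuming the agent keeps a Keras q_model, a frozen target_q_model, and a discount factor gamma; these attribute names are illustrative assumptions, not necessarily the book's exact implementation.

import random
import numpy as np

def replay(self, batch_size):
    # sample a random mini-batch of (s, a, r, s', done) units from the buffer
    sars_batch = random.sample(self.memory, batch_size)
    state_batch, q_values_batch = [], []
    for state, action, reward, next_state, done in sars_batch:
        # current Q-value predictions for this state
        q_values = self.q_model.predict(state)
        # bootstrap target: r if terminal, else r + gamma * max_a' Q_target(s', a')
        if done:
            q_max = reward
        else:
            q_max = reward + self.gamma * np.amax(
                self.target_q_model.predict(next_state)[0])
        # correct only the Q value of the action that was actually taken
        q_values[0][action] = q_max
        state_batch.append(state[0])
        q_values_batch.append(q_values[0])
    # one training step of the Q-Network on the sampled mini-batch
    self.q_model.fit(np.array(state_batch),
                     np.array(q_values_batch),
                     batch_size=batch_size,
                     epochs=1,
                     verbose=0)
    # the exploration rate epsilon would typically also be decayed here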

Double Q-Learning (DDQN)

In DQN, the target Q-Network both selects and evaluates every action, which results in an overestimation of the Q value. To resolve this issue, DDQN [3] proposes to use the Q-Network to choose the action and the target Q-Network to evaluate it.

In DQN as summarized by Algorithm 9.6.1, the estimate of the Q value in line 10 is:

$$
Q_{max} =
\begin{cases}
r_{j+1} & \text{if episode terminates at } j+1 \\[4pt]
r_{j+1} + \gamma \max\limits_{a_{j+1}} Q_{target}\left(s_{j+1}, a_{j+1}; \theta^{-}\right) & \text{otherwise}
\end{cases}
$$

$Q_{target}$ chooses and evaluates the action $a_{j+1}$.

DDQN proposes to change line 10 to:
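Following the description above (the Q-Network chooses the action, the target Q-Network evaluates it), the modified target can be written as:

$$
Q_{max} = r_{j+1} + \gamma\, Q_{target}\left(s_{j+1}, \underset{a_{j+1}}{\mathrm{argmax}}\, Q\left(s_{j+1}, a_{j+1}; \theta\right); \theta^{-}\right)
$$

In Keras-style code, the difference between the two targets might look like the sketch below; the get_target_q_value helper and the q_model, target_q_model, ddqn, and gamma attributes are illustrative assumptions rather than the book's exact implementation, and next_state is assumed to already carry a batch dimension.

import numpy as np

def get_target_q_value(self, next_state, reward):
    # compute the bootstrap target Q_max for a non-terminal transition
    if self.ddqn:
        # DDQN: the Q-Network selects the action ...
        action = np.argmax(self.q_model.predict(next_state)[0])
        # ... and the target Q-Network evaluates it
        q_value = self.target_q_model.predict(next_state)[0][action]
    else:
        # DQN: the target Q-Network both selects and evaluates the action
        q_value = np.amax(self.target_q_model.predict(next_state)[0])
    # discount the future value and add the immediate reward
    return reward + self.gamma * q_value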
