Advanced Deep Learning with Keras

Chapter 9

            # correction on the Q value for the action used
            q_values[0][action] = reward if done else q_value

            # collect batch state-q_value mapping
            state_batch.append(state[0])
            q_values_batch.append(q_values[0])

        # train the Q-network
        self.q_model.fit(np.array(state_batch),
                         np.array(q_values_batch),
                         batch_size=batch_size,
                         epochs=1,
                         verbose=0)

        # update exploration-exploitation probability
        self.update_epsilon()

        # copy new params on old target after every 10 training updates
        if self.replay_counter % 10 == 0:
            self.update_weights()

        self.replay_counter += 1

    # decrease the exploration, increase exploitation
    def update_epsilon(self):
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
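
The update_epsilon() method applies a multiplicative decay with a floor, so exploration falls off geometrically with the number of replay() calls. A standalone sketch of that schedule (the start value, floor, and decay factor here are illustrative assumptions, not necessarily the book's hyperparameters):

```python
# Standalone sketch of the epsilon-greedy decay schedule above.
# eps_min and eps_decay are illustrative values, not the book's settings.
def update_epsilon(eps, eps_min=0.1, eps_decay=0.99):
    """Decay eps toward eps_min, mirroring update_epsilon() in the listing."""
    return eps * eps_decay if eps > eps_min else eps

eps = 1.0                # start fully exploratory
for _ in range(300):     # one decay per replay() call
    eps = update_epsilon(eps)
# after enough replays, eps settles at the floor and stops decaying
```

With eps_decay=0.99, it takes roughly 230 replay calls before epsilon reaches the 0.1 floor, since 0.99^229 ≈ 0.1.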

Listing 9.6.2, dqn-cartpole-9.6.1.py. Training loop of the DQN implementation in Keras:

# Q-Learning sampling and fitting
for episode in range(episode_count):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    done = False
    total_reward = 0
    while not done:
        # in CartPole-v0, action=0 is left and action=1 is right
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        # in CartPole-v0:
        # state = [pos, vel, theta, angular speed]
        next_state = np.reshape(next_state, [1, state_size])

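The correction at the top of the replay code, q_values[0][action] = reward if done else q_value, implants a Bellman target into the network's own prediction so that only the taken action's output is pushed toward the target. A minimal sketch of one common way such a target is formed from the target network's prediction for the next state (the gamma value and the toy Q-row below are illustrative assumptions):

```python
import numpy as np

# Bellman target for Q-learning: the raw reward for terminal transitions,
# otherwise reward + gamma * max_a' Q_target(next_state, a').
# gamma and the toy Q-values are illustrative assumptions.
def q_target(reward, done, next_q_row, gamma=0.9):
    """Return the value to write into q_values[0][action]."""
    return reward if done else reward + gamma * np.amax(next_q_row)

next_q = np.array([0.5, 1.0])   # Q_target(next_state, a) for the two CartPole actions
target = q_target(1.0, False, next_q)    # non-terminal: 1.0 + 0.9 * 1.0
terminal = q_target(1.0, True, next_q)   # terminal: just the reward
```

Writing the target only into the chosen action's slot means the fit() call above leaves the other action's Q output untouched, since its target equals the network's current prediction.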
