16.03.2021 Views

Advanced Deep Learning with Keras

Policy Gradient Methods

while not done:
    # [min, max] action = [-1.0, 1.0]
    # as a baseline, a random choice of action will not move
    # the car past the flag pole
    if args.random:
        action = env.action_space.sample()
    else:
        action = agent.act(state)
    env.render()
    # after executing the action, get s', r, done
    next_state, reward, done, _ = env.step(action)
    next_state = np.reshape(next_state, [1, state_dim])
    # save the experience unit in memory for training
    # Actor-Critic does not need this but we keep it anyway.
    item = [step, state, next_state, reward, done]
    agent.remember(item)

    if args.actor_critic and train:
        # only Actor-Critic performs online training:
        # train at every step as it happens
        agent.train(item, gamma=0.99)
    elif not args.random and done and train:
        # for REINFORCE, REINFORCE with baseline, and A2C,
        # we wait for the completion of the episode before
        # training the network(s)
        # last value as used by A2C
        v = 0 if reward > 0 else agent.value(next_state)[0]
        agent.train_by_episode(last_value=v)

    # accumulate reward
    total_reward += reward
    # next state is the new state
    state = next_state
    step += 1
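
The loop above passes a last value v to train_by_episode: 0 when the final reward is positive, otherwise the critic's estimate agent.value(next_state). The body of train_by_episode is not shown on this page, so the following is only a minimal sketch of how such a last value could seed the discounted returns of an episode. The helper name discounted_returns and the three-step reward list are hypothetical and chosen for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99, last_value=0.0):
    """Return R_t = r_t + gamma * R_{t+1}, with R_T seeded by last_value.

    Hypothetical helper: last_value plays the role of `v` in the loop,
    0 for a terminal state or a critic estimate for a truncated episode.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = float(last_value)
    # walk the episode backwards so each return reuses the next one
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# made-up 3-step episode, gamma chosen so the arithmetic is easy to check
rewards = [1.0, 0.0, 2.0]
terminal = discounted_returns(rewards, gamma=0.5, last_value=0.0)
bootstrapped = discounted_returns(rewards, gamma=0.5, last_value=4.0)
print(terminal)
print(bootstrapped)
```

With last_value=0.0 the returns depend only on the rewards actually collected. With a nonzero last_value, every earlier step also receives a discounted share of the estimated value of the state where the episode was cut off.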
