
Reinforcement Learning

Rainbow

Rainbow is the current state-of-the-art DQN variant. Technically, calling it a DQN variant would be wrong; in essence, it is an ensemble of many DQN variants combined into a single algorithm. It modifies the distributional RL [6] loss to a multi-step loss and combines it with Double DQN greedy action selection. Quoting from the paper:

"The network architecture is a dueling network architecture adapted for use with

return distributions. The network has a shared representation fξ(s), which is then

fed into a value stream v η

with N atoms

outputs, and into an advantage stream aξ

with N atoms

×N actions

outputs, where aa ξξ ii (ff ξξ (ss), aa) will denote the output corresponding

to atom i and action a. For each atom z i

, the value and advantage streams are

aggregated, as in Dueling DQN, and then passed through a softmax layer to

obtain the normalised parametric distributions used to estimate the returns'

distributions."

Rainbow combines six different extensions of DQN:

• N-step returns
• Distributional state-action value learning
• Dueling networks
• Noisy networks
• Double DQN
• Prioritized Experience Replay
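
As noted above, Rainbow replaces the one-step distributional loss with a multi-step one and selects bootstrap actions with the online network, as in Double DQN. The following is a minimal sketch of that combination, written in the plain expected-value form rather than the full distributional loss; online_q and target_q are hypothetical functions that return a vector of Q-values for a state:

import numpy as np

def n_step_double_dqn_target(rewards, next_state, done,
                             online_q, target_q, gamma=0.99):
    # rewards: the n intermediate rewards r_t, ..., r_{t+n-1}
    # next_state: the state reached after those n steps
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        # Double DQN: the online network picks the greedy action,
        # the target network evaluates it
        best_action = np.argmax(online_q(next_state))
        n_step_return += (gamma ** len(rewards)) * target_q(next_state)[best_action]
    return n_step_return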

Deep deterministic policy gradient

DQN and its variants have been very successful in solving problems where the state space is continuous and the action space is discrete. For example, in Atari games the input space consists of raw pixels, but the actions are discrete: [up, down, left, right, no-op]. How do we solve a problem with a continuous action space? For instance, say an RL agent driving a car needs to turn its wheels: this action has a continuous action space. One way to handle this situation is by discretizing the action space and continuing with DQN or its variants. However, a better solution is to use a policy gradient algorithm. In policy gradient methods the policy π(A|S) is approximated directly.
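
To make this concrete, the following sketch shows a small deterministic policy (actor) network for a continuous action space, in the spirit of the algorithm covered next. The state and action dimensions and the action bound are illustrative assumptions; the tanh output keeps each action component in a bounded range:

import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM, ACTION_BOUND = 24, 2, 1.0   # illustrative assumptions

state_in = layers.Input(shape=(STATE_DIM,))
x = layers.Dense(256, activation="relu")(state_in)
x = layers.Dense(256, activation="relu")(x)
# tanh keeps each action component in [-1, 1]; scale to the valid range
raw_action = layers.Dense(ACTION_DIM, activation="tanh")(x)
action_out = layers.Lambda(lambda a: a * ACTION_BOUND)(raw_action)

actor = tf.keras.Model(state_in, action_out)
# actor(state) now returns a continuous action, e.g. a steering angle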

