In the following diagram, you can see the architecture of Dueling DQN:

Figure 5: Visualizing the architecture of a Dueling DQN

You may be wondering, what is the advantage of doing all of this? Why decompose Q if we will just be putting it back together? Well, decoupling the value and advantage functions allows us to know which states are valuable without having to take into account the effect of each action in each state. There are many states that, irrespective of the action taken, are good or bad states: for example, having breakfast with your loved ones at a good resort is always a good state, and being admitted to a hospital emergency ward is always a bad state. Thus, separating value and advantage allows us to get a more robust approximation of the value function.
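To make the decomposition concrete, here is a minimal sketch of a dueling head in TensorFlow/Keras (the layer sizes and the state/action dimensions are illustrative placeholders, not taken from the text): a shared feature extractor feeds two streams, a scalar value V(s) and a per-action advantage A(s, a), which are recombined as Q(s, a) = V(s) + A(s, a) minus the mean advantage so that the two streams stay identifiable.

import tensorflow as tf

class DuelingDQN(tf.keras.Model):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, n_actions):
        super().__init__()
        # Shared feature extractor (sizes are illustrative)
        self.features = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
        ])
        # Value stream: a single scalar V(s) per state
        self.value = tf.keras.layers.Dense(1)
        # Advantage stream: one output per action, A(s, a)
        self.advantage = tf.keras.layers.Dense(n_actions)

    def call(self, states):
        x = self.features(states)
        v = self.value(x)                      # shape (batch, 1)
        a = self.advantage(x)                  # shape (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable
        return v + (a - tf.reduce_mean(a, axis=1, keepdims=True))

# Example usage with placeholder sizes: 8-dimensional states, 4 discrete actions
model = DuelingDQN(n_actions=4)
q_values = model(tf.random.normal((32, 8)))   # Q-values, shape (32, 4)

The rest of the DQN training loop (replay buffer, target network, epsilon-greedy exploration) is unchanged; only the network's output head differs from a standard DQN.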

Next, you can see a figure from the paper highlighting how, in the Atari game Enduro, the value network learns to pay attention to the road, while the advantage network learns to pay attention only when there are cars immediately in front, so as to avoid collisions:
