16.03.2021 Views

Advanced Deep Learning with Keras


Chapter 9

Figure 9.3.6: Assuming the actions chosen by the agent are two successive moves to the right, the update on the Q value of state (0, 1) is shown

Suppose the agent is still in exploration mode, as shown in Figure 9.3.6. Its first step in the second episode is a move to the right. As expected, the update is 0. However, the second random action it chooses is also a move to the right. The agent reaches the G state and receives a large +100 reward. The Q value for moving right from state (0, 1) becomes 100. The second episode is done, and the agent goes back to the Start state.
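The two updates described above can be traced in a few lines of code. The following is a minimal sketch, not the book's implementation. It assumes the deterministic update Q(s, a) = r + gamma * max Q(s', a'), a discount factor of 0.9, and a goal cell at (0, 2); none of these is stated in this excerpt, and the names `q_table`, `step_right`, and `update` are hypothetical.

```python
from collections import defaultdict

# Assumptions (not given in this excerpt): discount factor, goal position,
# and the deterministic update Q(s, a) = r + gamma * max_a' Q(s', a').
GAMMA = 0.9
GOAL = (0, 2)
ACTIONS = ("left", "down", "right", "up")

# (state, action) -> Q value; every entry starts at 0.
q_table = defaultdict(float)


def step_right(state):
    """Move one cell to the right and return (next_state, reward, done)."""
    next_state = (state[0], state[1] + 1)
    done = next_state == GOAL
    reward = 100.0 if done else 0.0
    return next_state, reward, done


def update(state, action, reward, next_state, done):
    """Apply the assumed deterministic Q update and return the new value."""
    # A terminal state has no future value, so its contribution is 0.
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] = reward + GAMMA * best_next
    return q_table[(state, action)]


# Second episode: two successive moves to the right from the Start state.
state = (0, 0)
updates = []
done = False
while not done:
    next_state, reward, done = step_right(state)
    updates.append(update(state, "right", reward, next_state, done))
    state = next_state

print(updates)
```

Under these assumptions the first update is 0 and the second sets the Q value of moving right from (0, 1) to 100, matching the text. With the same assumed rule and discount, a later move to the right from (0, 0) would evaluate to 0 + 0.9 × 100 = 90.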

Figure 9.3.7: Assuming the action chosen by the agent is a move to the right, the update on the Q value of state (0, 0) is shown
