Reinforcement Learning

• Consider, for example, our robot in a maze: assume that the value of each state is the negative of the number of steps needed to reach the goal from that box. At each time step, the agent then chooses the action that takes it to the neighboring state with the highest value, as in the following diagram. So, starting from a value of -6, it'll move to -5, -4, -3, -2, -1, and eventually reach the goal with the value 0.

• Policy-based methods: In these methods, the algorithm predicts the optimal policy (the one that maximizes the expected return) directly, without maintaining value function estimates. The aim is to find the optimal policy rather than the optimal action. An example of a policy-based method is policy gradients. Here, we approximate the policy function, which maps each state to the best corresponding action. One advantage of policy-based methods over value-based ones is that we can use them even for continuous action spaces.
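To make the value-based maze example concrete, here is a minimal Python sketch. The 3x3 maze layout and the helper names (find, neighbors, compute_values, greedy_walk) are assumptions made up for illustration, not code from this chapter. The values are filled in by a breadth-first search from the goal, so each state's value is the negative of its step count to the goal; the agent then always moves to the neighboring state with the highest value.

```python
from collections import deque

# Hypothetical maze: 'S' is the start, 'G' the goal, '#' a wall, '.' a free box.
MAZE = [
    "S#G",
    ".#.",
    "...",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(symbol):
    """Return the (row, col) position of a symbol in the maze."""
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(f"{symbol!r} not found in maze")


def neighbors(cell):
    """Yield the free boxes reachable from cell in one step."""
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])
        if inside and MAZE[nr][nc] != "#":
            yield (nr, nc)


def compute_values(goal):
    """Value of a state = -(number of steps to the goal), found by BFS."""
    values = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):
            if nxt not in values:
                values[nxt] = values[cell] - 1
                queue.append(nxt)
    return values


def greedy_walk(start, goal, values):
    """Repeatedly move to the neighbor with the highest value."""
    cell = start
    trace = [values[cell]]
    while cell != goal:
        cell = max(neighbors(cell), key=values.__getitem__)
        trace.append(values[cell])
    return trace


values = compute_values(find("G"))
trace = greedy_walk(find("S"), find("G"), values)
print(trace)  # [-6, -5, -4, -3, -2, -1, 0]
```

In a real problem the values are not known in advance; the agent has to estimate them from experience, which is what value-based algorithms do.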

Besides deciding whether the algorithm approximates the policy or the value function, there are a few questions we need to answer to make reinforcement learning work:

• How does the agent choose its actions, especially when untrained?

When the agent starts learning, it has no idea how best to determine an action, or which action will yield the best Q-value. So how do we go about it? We take a leaf out of nature's book. Like bees and ants, the agent strikes a balance between exploring new actions and exploiting the learned ones. Initially, the agent does not know which of the possible actions is better, so it makes random choices; as it learns, it starts making use of the learned policy. This is called the exploration vs. exploitation [2] tradeoff. Through exploration, the agent gathers more information, which it later exploits to make the best decision.
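One common way to strike this balance is an epsilon-greedy rule with a decaying epsilon. The following is a minimal sketch under that assumption; the function names, the linear decay schedule, and the default values are hypothetical choices for illustration. With probability epsilon the agent explores by picking a random action; otherwise it exploits by picking the action with the highest estimated Q-value.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated Q-values.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (action with the highest Q-value).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps.

    The schedule and its defaults are illustrative assumptions.
    """
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


# Hypothetical Q-value estimates for three actions in some state.
q_estimates = [0.1, 0.7, 0.3]

early_action = epsilon_greedy(q_estimates, decayed_epsilon(0))     # mostly random
late_action = epsilon_greedy(q_estimates, decayed_epsilon(5000))   # mostly greedy
print(early_action, late_action)
```

Early in training epsilon is close to 1, so nearly every choice is random; as the step count grows, epsilon shrinks and the agent relies more and more on what it has learned.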
