
MACHINE LEARNING TECHNIQUES - LASA


7.3.1 Principle

In reinforcement learning, the purpose or goal of the agent is formalized in terms of a special reward signal passing from the environment to the agent. At each time step, the reward is a simple number, $r_t \in \mathbb{R}$. Informally, the agent's goal is to maximize the total amount of reward it receives. This means maximizing not immediate reward, but cumulative reward in the long run.
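The difference between immediate and cumulative reward can be made concrete with a small sketch. The two reward sequences below are hypothetical values chosen only for illustration, and `total_reward` assumes the simplest reading of "total amount of reward", namely a plain undiscounted sum over a finite number of steps.

```python
# Two hypothetical reward sequences r_0, r_1, ..., r_4 (illustrative values only).
greedy_rewards = [5, 0, 0, 0, 0]    # large immediate reward, nothing afterwards
patient_rewards = [1, 2, 2, 2, 2]   # small immediate reward, steady gains later


def total_reward(rewards):
    """Cumulative reward, assumed here to be the plain sum of per-step rewards."""
    return sum(rewards)


# First number: immediate reward at t = 0. Second number: cumulative reward.
print(greedy_rewards[0], total_reward(greedy_rewards))    # 5 5
print(patient_rewards[0], total_reward(patient_rewards))  # 1 9
```

An agent that only compared the rewards at t = 0 would prefer the first sequence, whereas an agent maximizing cumulative reward would prefer the second.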

The reinforcement-learning problem is meant to be a straightforward framing of the problem of learning from interaction to achieve a goal. The learner and decision-maker is called the agent. The thing it interacts with, comprising everything outside the agent, is called the environment. These interact continually, the agent selecting actions and the environment responding to those actions and presenting new situations to the agent. The environment also gives rise to rewards, special numerical values that the agent tries to maximize over time. A complete specification of an environment defines a task, one instance of the reinforcement-learning problem.

More specifically, the agent and environment interact at each of a sequence of discrete time steps, $t = 0, 1, 2, 3, \ldots$. At each time step $t$, the agent receives some representation of the environment's state, $s_t \in S$, where $S$ is the set of possible states, and on that basis selects an action, $a_t \in A(s_t)$, where $A(s_t)$ is the set of actions available in state $s_t$. One time step later, in part as a consequence of its action, the agent receives a numerical reward, $r_t \in \mathbb{R}$, and finds itself in a new state $s_{t+1}$. Figure 7-6 diagrams the agent-environment interaction.

Figure 7-6: The agent-environment interaction in reinforcement learning.<br />
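The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the course: the two-state task in `TRANSITIONS`, the helper names (`available_actions`, `step`, `run_episode`, `make_random_policy`) and all reward values are hypothetical, and the environment is assumed to be deterministic.

```python
import random

# Hypothetical toy task: every state, action, reward and transition below is
# an illustrative assumption, not taken from the course notes.
TRANSITIONS = {
    # (state, action): (reward, next_state)
    ("A", "left"): (0.0, "A"),
    ("A", "right"): (1.0, "B"),
    ("B", "left"): (-1.0, "A"),
    ("B", "right"): (2.0, "B"),
}


def available_actions(state):
    """Return A(s), the actions available in the given state."""
    return sorted(a for (s, a) in TRANSITIONS if s == state)


def step(state, action):
    """Environment response: the reward and the next state."""
    return TRANSITIONS[(state, action)]


def run_episode(policy, start_state, num_steps):
    """Run the agent-environment loop for a fixed number of time steps."""
    state = start_state
    history = []
    for t in range(num_steps):
        # The agent observes the state and selects an action from A(s_t).
        action = policy(state, available_actions(state))
        # The environment answers with a reward and a new state.
        reward, next_state = step(state, action)
        history.append((t, state, action, reward, next_state))
        state = next_state
    return history


def make_random_policy(seed=0):
    """Build a policy that picks uniformly among the available actions."""
    rng = random.Random(seed)

    def policy(state, actions):
        return rng.choice(actions)

    return policy


for t, s, a, r, s_next in run_episode(make_random_policy(seed=0), "A", 5):
    print(f"t={t}: state={s}, action={a}, reward={r}, next state={s_next}")
```

The policy is passed in as a plain function so that the loop itself stays independent of how the agent chooses its actions; a learning agent would replace `make_random_policy` with something that uses the observed rewards.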

An intuitive way to understand the relation between the agent and its environment is with the following example dialogue.

Environment: You are in state 65. You have 4 possible actions.

Agent: I'll take action 2.

Environment: You received a reinforcement of 7 units. You are now in state 15. You have 2 possible actions.

Agent: I'll take action 1.

Environment: You received a reinforcement of -4 units. You are now in state 65. You have 4

© A.G.Billard 2004 – Last Update March 2011
