Chapter 11

And unlike unsupervised learning, the agent's goal is not to find some inherent structure in the input (the learning may find some structure, but that isn't the goal); instead, its only goal is to maximize the rewards (in the long run) and reduce the punishments.

RL lingo

Before learning various RL algorithms, we need to understand a few important terms. We will illustrate them with two examples: first, a robot in a maze, and second, an agent controlling the steering wheel of a self-driving car. The two RL agents are shown as follows:

• State, S: The state is the set of tokens (or representations) that defines all of the possible states the environment can be in. It can be continuous or discrete. In the case of the robot finding its path through a maze, the state can be represented by a 4 × 4 array whose elements tell whether each block is empty, occupied, or blocked. A value of 1 means the block is occupied by the robot, 0 means it is empty, and X means it is impassable. Each element of this array, S, can take one of three discrete values, so the state is discrete in nature. Next, consider the agent controlling the steering wheel of a self-driving car. The agent takes the front-view image as input. The image contains continuous-valued pixels, so here the state is continuous.

• Action, A(S): Actions are the set of all possible things that the agent can do in a particular state. The set of possible actions, A, depends on the present state, S. Actions may or may not result in a change of state. Like states, they can be discrete or continuous. The robot finding a path in the maze can perform five discrete actions: [up, down, left, right, no change]. The self-driving car (SDC) agent, on the other hand, can rotate the steering wheel through a continuous range of angles.
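
To make the two terms concrete, the maze example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the maze layout, the helper names (`available_actions`, `step`, `clip_steering`), the choice of -1 to stand in for X, the rule that moves into walls or blocked cells are left out of A(S), and the ±30 degree steering limit are all invented here and do not come from the text.

```python
import numpy as np

# Cell codes. The text uses 1 (robot), 0 (empty), and X (impassable);
# encoding X as -1 is an assumption so the grid fits in an integer array.
EMPTY, ROBOT, BLOCKED = 0, 1, -1

# Hypothetical 4 x 4 maze; the layout is invented for illustration.
maze_state = np.array([
    [ROBOT,   EMPTY,   EMPTY, EMPTY],
    [EMPTY,   BLOCKED, EMPTY, BLOCKED],
    [EMPTY,   EMPTY,   EMPTY, EMPTY],
    [BLOCKED, EMPTY,   EMPTY, EMPTY],
])

# The five discrete actions, expressed as (row change, column change).
MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
    "no change": (0, 0),
}


def robot_position(state):
    """Return the (row, column) of the cell holding the robot."""
    r, c = np.argwhere(state == ROBOT)[0]
    return int(r), int(c)


def available_actions(state):
    """Return A(S): actions that keep the robot on the grid and off blocked cells.

    Excluding such moves is a design assumption; another valid choice is to
    allow them and leave the state unchanged.
    """
    rows, cols = state.shape
    r, c = robot_position(state)
    actions = []
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and state[nr, nc] != BLOCKED:
            actions.append(name)
    return actions


def step(state, action):
    """Apply an action from A(S) and return the next state as a new array."""
    if action not in available_actions(state):
        raise ValueError(f"action {action!r} is not available in this state")
    r, c = robot_position(state)
    dr, dc = MOVES[action]
    next_state = state.copy()
    next_state[r, c] = EMPTY
    next_state[r + dr, c + dc] = ROBOT  # "no change" puts the robot back in place
    return next_state


# Continuous action for the SDC agent: a steering angle within a range.
# The limit of 30 degrees either way is an assumed value.
STEERING_RANGE = (-30.0, 30.0)


def clip_steering(angle):
    """Clamp a requested steering angle to the assumed continuous range."""
    return float(np.clip(angle, *STEERING_RANGE))


print(available_actions(maze_state))
print(step(maze_state, "right"))
```

With the robot in the top-left corner, `up` and `left` would leave the grid, so A(S) shrinks to three actions, which shows how the action set depends on the present state. The `no change` action returns a state identical to the current one, an instance of an action that does not change the state.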
