
In Figure 4 each node in the first hidden layer receives an input and "fires" (0 or 1) according to the value of the associated linear function. Then, the output of the first hidden layer is passed to the second layer, where another linear function is applied, the results of which are passed to the final output layer consisting of one single neuron. It is interesting to note that this layered organization vaguely resembles the organization of the human vision system, as we discussed earlier.

Problems in training the perceptron and their solutions

Let's consider a single neuron; what are the best choices for the weight w and the bias b? Ideally, we would like to provide a set of training examples and let the computer adjust the weight and the bias in such a way that the errors produced in the output are minimized.

In order to make this a bit more concrete, let's suppose that we have a set of images of cats and another separate set of images not containing cats. Suppose that each neuron receives input from the value of a single pixel in the images. While the computer processes those images, we would like our neuron to adjust its weights and its bias so that we have fewer and fewer images wrongly recognized.

This approach seems very intuitive, but it requires that a small change in the weights (or the bias) causes only a small change in the outputs. Think about it: if we have a big output jump, we cannot learn progressively. After all, kids learn little by little. Unfortunately, the perceptron does not show this "little-by-little" behavior. A perceptron is either a 0 or 1, and that's a big jump that will not help in learning (see Figure 5):

Figure 5: Example of perceptron - either a 0 or 1
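To make this all-or-nothing behavior concrete, here is a minimal sketch (not taken from the book) of a single perceptron in TensorFlow; the weights, bias, and input values are made up purely for illustration:

import tensorflow as tf

def perceptron(x, w, b):
    # A perceptron "fires" 1 if the weighted sum w.x + b is positive, else 0.
    z = tf.reduce_sum(w * x) + b
    return tf.where(z > 0, 1, 0)

# Made-up weights, bias, and inputs, chosen only for illustration.
w = tf.constant([0.5, -0.3])
b = tf.constant(0.1)
print(perceptron(tf.constant([1.0, 1.0]), w, b).numpy())   # 1
print(perceptron(tf.constant([0.1, 1.0]), w, b).numpy())   # 0

No matter how little the input changes, the output can only snap between 0 and 1; there is no in-between value to nudge gradually.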



We need something different; something smoother. We need a function that progressively changes from 0 to 1 with no discontinuity. Mathematically, this means that we need a continuous function that allows us to compute the derivative. You might remember that in mathematics the derivative is the rate at which a function changes at a given point. For functions with real-number inputs, the derivative is the slope of the tangent line at a point on the graph. Later in this chapter we will see why derivatives are important for learning, when we talk about gradient descent.
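As a quick illustration of this idea (a sketch assuming TensorFlow 2.x, not code from the book), tf.GradientTape can compute the slope of a simple function at a point automatically:

import tensorflow as tf

# Slope of f(x) = x^2 at x = 3.0, computed by automatic differentiation.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())   # 6.0: the slope of the tangent line at x = 3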

Activation function – sigmoid

The sigmoid function, defined as σ(x) = 1 / (1 + e^(−x)) and represented in the following figure, has small output changes in the range (0, 1) when the input varies over (−∞, ∞). Mathematically, the function is continuous. A typical sigmoid function is represented in Figure 6:

Figure 6: A sigmoid function with output in the range (0,1)

A neuron can use the sigmoid for computing the nonlinear function σ(z), where z = wx + b. Note that if z = wx + b is very large and positive, then e^(−z) → 0, so σ(z) → 1, while if z = wx + b is very large and negative, e^(−z) → ∞, so σ(z) → 0. In other words, a neuron with sigmoid activation has a behavior similar to the perceptron, but the changes are gradual and output values such as 0.5539 or 0.123191 are perfectly legitimate. In this sense, a sigmoid neuron can answer "maybe."
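A minimal sketch (with illustrative input values, not taken from the book) showing how tf.math.sigmoid produces these gradual, in-between outputs:

import tensorflow as tf

# The sigmoid squashes any real z into the open interval (0, 1).
z = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])   # illustrative values of z = wx + b
print(tf.math.sigmoid(z).numpy())
# approximately [0.0000454, 0.2689, 0.5, 0.7311, 0.99995]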

Activation function – tanh

Another useful activation function is tanh. Defined as tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)), whose shape is shown in Figure 7, its outputs range from −1 to 1.
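Analogously, a small sketch (again with made-up inputs) showing that tf.math.tanh squashes its input into the range (−1, 1) and is centered on zero:

import tensorflow as tf

# tanh squashes any real z into the open interval (-1, 1) and is zero-centered.
z = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])   # illustrative values
print(tf.math.tanh(z).numpy())
# approximately [-1.0, -0.7616, 0.0, 0.7616, 1.0]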

