Chapter 1: Neural Network Foundations with TensorFlow 2.0

In Figure 4, each node in the first hidden layer receives an input and "fires" (0, 1) according to the value of the associated linear function. The output of the first hidden layer is then passed to the second layer, where another linear function is applied, the results of which are passed to the final output layer, consisting of one single neuron. It is interesting to note that this layered organization vaguely resembles the organization of the human vision system, as we discussed earlier.

Problems in training the perceptron and their solutions

Let's consider a single neuron; what are the best choices for the weight w and the bias b? Ideally, we would like to provide a set of training examples and let the computer adjust the weight and the bias in such a way that the errors produced in the output are minimized.

In order to make this a bit more concrete, let's suppose that we have a set of images of cats and another separate set of images not containing cats. Suppose that each neuron receives input from the value of a single pixel in the images. While the computer processes those images, we would like our neuron to adjust its weights and its bias so that we have fewer and fewer wrongly recognized images.

This approach seems very intuitive, but it requires that a small change in the weights (or the bias) causes only a small change in the outputs. Think about it: if we have a big output jump, we cannot learn progressively. After all, kids learn little by little. Unfortunately, the perceptron does not show this "little-by-little" behavior. A perceptron's output is either a 0 or a 1, and that is a big jump that will not help it learn (see Figure 5):

Figure 5: Example of perceptron – either a 0 or 1
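To see the jump concretely, here is a minimal sketch of a single perceptron with a step activation (the helper name perceptron_output and the toy values are illustrative, not from the book):

import numpy as np

def perceptron_output(x, w, b):
    # Step activation: fire 1 if the weighted sum crosses zero, else 0
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0

x = np.array([1.0])
# A tiny change in the weight flips the output all the way from 0 to 1:
print(perceptron_output(x, np.array([0.499]), -0.5))  # 0
print(perceptron_output(x, np.array([0.501]), -0.5))  # 1

There is no way to nudge the output by a small amount; it jumps discontinuously as z crosses zero.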
We need something different; something smoother. We need a function that
progressively changes from 0 to 1 with no discontinuity. Mathematically, this
means that we need a continuous function that allows us to compute the derivative.
You might remember that in mathematics the derivative is the rate at which a
function changes at a given point. For functions with input given by real numbers,
the derivative is the slope of the tangent line at a point on a graph. Later in this
chapter, we will see why derivatives are important for learning, when we talk about
gradient descent.
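As a quick worked example (a sketch for intuition, not code from the book), the slope of the tangent line can be approximated numerically with a central finite difference:

def numerical_derivative(f, x, h=1e-5):
    # Slope of the secant line over a tiny interval around x
    # approximates the slope of the tangent line at x
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_derivative(lambda x: x ** 2, 3.0))  # ~6.0, since d/dx x^2 = 2x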
Activation function – sigmoid
The sigmoid function, defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$ and represented in the following figure, has small output changes in the range (0, 1) when the input varies in the range $(-\infty, \infty)$. Mathematically, the function is continuous. A typical sigmoid function is represented in Figure 6:
Figure 6: A sigmoid function with output in the range (0,1)
A neuron can use the sigmoid for computing the nonlinear function $\sigma(z)$, where $z = wx + b$. Note that if $z = wx + b$ is very large and positive, then $e^{-z} \to 0$, so $\sigma(z) \to 1$, while if $z = wx + b$ is very large and negative, $e^{-z} \to \infty$, so $\sigma(z) \to 0$. In other words, a neuron with sigmoid activation has a behavior similar to the perceptron, but the changes are gradual and output values such as 0.5539 or 0.123191 are perfectly legitimate. In this sense, a sigmoid neuron can answer "maybe."
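As an illustration (a minimal sketch, not code from the book), the sigmoid and its saturating behavior can be checked directly with NumPy:

import numpy as np

def sigmoid(z):
    # Maps any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 0.22, 10.0])))
# ~[4.54e-05, 0.5, 0.5548, 0.99995]: large negative inputs approach 0,
# large positive inputs approach 1, and values in between are "maybe"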
Activation function – tanh
Another useful activation function is tanh. Defined as $\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$, its shape is shown in Figure 7, and its outputs range from -1 to 1.
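As with the sigmoid, a short sketch (illustrative, not from the book) confirms the output range:

import numpy as np

# np.tanh computes (e^z - e^-z) / (e^z + e^-z) elementwise
print(np.tanh(np.array([-10.0, 0.0, 10.0])))
# ~[-1. 0. 1.]: outputs saturate at -1 and 1 rather than 0 and 1

Unlike the sigmoid, tanh is zero-centered, which is one reason it is often preferred for hidden layers.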