In fact, it is possible to slide the submatrices by only 23 positions before touching the borders of the images. In Keras, the number of pixels along one edge of the kernel, or submatrix, is the kernel size; the stride length, however, is the number of pixels by which the kernel is moved at each step in the convolution.

Let's define the feature map from one layer to another. Of course, we can have multiple feature maps that learn independently from each hidden layer. For example, we can start with 28×28 input neurons for processing MNIST images, and then recall k feature maps of size 24×24 neurons each (again with a kernel of 5×5, since 28 − 5 + 1 = 24) in the next hidden layer.

Shared weights and bias

Let's suppose that we want to move away from the pixel representation in a raw image, by gaining the ability to detect the same feature independently of the location where it appears in the input image. A simple approach is to use the same set of weights and biases for all the neurons in the hidden layers. In this way, each layer will learn a set of position-independent latent features derived from the image, bearing in mind that a layer consists of a set of kernels in parallel, and each kernel only learns one feature.

A mathematical example

One simple way to understand convolution is to think about a sliding window function applied to a matrix. In the following example, given the input matrix I and the kernel K, we get the convolved output. The 3×3 kernel K (sometimes called the filter or feature detector) is multiplied elementwise with the input matrix to get one cell in the output matrix. All the other cells are obtained by sliding the window over I:

Figure 2: Input matrix I and kernel K producing a convolved output
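The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements convolution; the input and kernel values are made up for the example and are not necessarily those shown in Figure 2:

```python
import numpy as np

def convolve2d_valid(I, K, stride=1):
    """Slide kernel K over input I with no zero padding ('valid'),
    summing the elementwise products at each window position."""
    kh, kw = K.shape
    oh = (I.shape[0] - kh) // stride + 1
    ow = (I.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            window = I[r * stride:r * stride + kh,
                       c * stride:c * stride + kw]
            out[r, c] = np.sum(window * K)
    return out

# An illustrative 5x5 input matrix I and 3x3 kernel K
I = np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

print(convolve2d_valid(I, K))  # a 3x3 convolved output
```

Note that the top-left output cell is the sum of the elementwise products of K with the top-left 3×3 window of I, exactly as described for Figure 2.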
Convolutional Neural Networks

In this example we decided to stop the sliding window as soon as we touch the borders of I (so the output is 3×3). Alternatively, we could have chosen to pad the input with zeros (so that the output would have been 5×5). This decision relates to the padding choice adopted. Note that the kernel depth is equal to the input depth (number of channels).

Another choice is how far we slide the window at each step. This is called the stride. A larger stride produces fewer applications of the kernel and a smaller output size, while a smaller stride produces a larger output and retains more information.

The size of the filter, the stride, and the type of padding are hyperparameters that can be fine-tuned during the training of the network.

ConvNets in TensorFlow 2.x

In TensorFlow 2.x, if we want to add a convolutional layer with 32 parallel features and a filter size of 3×3, we write:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)))

This means that we are applying a 3×3 convolution to 28×28 images with one input channel (or input filter), resulting in 32 output channels (or output filters).

An example of convolution is provided in Figure 3:

Figure 3: An example of convolution
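The effect of these hyperparameters on the output size can be worked out with simple arithmetic. The following sketch uses a hypothetical helper, conv_output_size (not a Keras function), to compute the output size along one spatial dimension:

```python
def conv_output_size(n, k, stride=1, padding="valid"):
    """Output size along one dimension for input size n and kernel size k.
    'valid' stops the window at the border; 'same' pads with zeros so that
    with stride 1 the output size equals the input size."""
    if padding == "valid":
        return (n - k) // stride + 1
    elif padding == "same":
        return (n + stride - 1) // stride  # ceil(n / stride)
    raise ValueError("unknown padding: " + padding)

# The cases discussed above: a 5x5 input with a 3x3 kernel
print(conv_output_size(5, 3))                  # valid padding -> 3
print(conv_output_size(5, 3, padding="same"))  # zero padding  -> 5

# The MNIST case: 28x28 input, 5x5 kernel, stride 1 -> 24x24 feature maps
print(conv_output_size(28, 5))
```

By the same arithmetic, the Conv2D layer above (3×3 kernel, valid padding, stride 1 by default) maps a 28×28×1 input to a 26×26×32 output, since 28 − 3 + 1 = 26.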