Depthwise convolution

Let's consider an image with multiple channels. In a normal 2D convolution, the filter is as deep as the input, which allows us to mix channels when generating each element of the output. In a depthwise convolution, each channel is kept separate: the filter is split into channels, each convolution is applied separately, and the results are stacked back together into one tensor.

Depthwise separable convolution

This convolution should not be confused with the separable convolution. After completing the depthwise convolution, an additional step is performed: a 1×1 convolution across channels. Depthwise separable convolutions are used in Xception. They are also used in MobileNet, a model particularly useful for mobile and embedded vision applications because of its reduced model size and complexity.
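Keras provides a layer for each of these operations, so the difference is easy to see in code. The following is a minimal sketch (the input shape and filter counts are illustrative assumptions, not values from the text):

import tensorflow as tf

# Illustrative input: a batch of 64x64 RGB images.
inputs = tf.keras.Input(shape=(64, 64, 3))

# Standard convolution: each 3x3 filter spans all input channels.
standard = tf.keras.layers.Conv2D(32, kernel_size=3)(inputs)

# Depthwise convolution: one 3x3 filter per input channel, no mixing.
depthwise = tf.keras.layers.DepthwiseConv2D(kernel_size=3)(inputs)

# Depthwise separable convolution: the depthwise step followed by a
# 1x1 convolution across channels (the building block of Xception
# and MobileNet).
separable = tf.keras.layers.SeparableConv2D(32, kernel_size=3)(inputs)

model = tf.keras.Model(inputs, [standard, depthwise, separable])
model.summary()  # compare the parameter counts of the three layers

Running the summary shows the saving: the separable layer produces the same output shape as the standard convolution with a fraction of the parameters, which is precisely what makes MobileNet small.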
In this section, we have discussed all the major forms of convolution. The next section will discuss Capsule networks, a new form of learning introduced in 2017.

Capsule networks

Capsule Networks (CapsNets) are a very recent and innovative type of deep learning network. This technique was introduced at the end of October 2017 in a seminal paper titled Dynamic Routing Between Capsules by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton (https://arxiv.org/abs/1710.09829) [14]. Hinton is one of the fathers of Deep Learning and, therefore, the whole Deep Learning community is excited to see the progress made with capsules. Indeed, CapsNets are already beating the best CNNs on MNIST classification, which is ... well, impressive!!

So what is the problem with CNNs?

In CNNs, each layer "understands" an image at a progressive level of granularity. As we discussed in multiple examples, the first layer will most likely recognize straight lines or simple curves and edges, while subsequent layers will start to understand more complex shapes such as rectangles, up to complex forms such as human faces.

Now, one critical operation used in CNNs is pooling. Pooling aims at creating positional invariance, and it is used after each CNN layer to keep the problem computationally tractable. However, pooling introduces a significant problem: it forces us to lose all the positional data. This is not good. Think about a face: it consists of two eyes, a mouth, and a nose, and what is important is the spatial relationship between these parts (for example, the mouth is below the nose, which is typically below the eyes).
Indeed, Hinton said: "The pooling operation used in convolutional neural networks
is a big mistake and the fact that it works so well is a disaster." Technically we do not
need positional invariance; what we need instead is equivariance. Equivariance is a fancy term indicating that we want the network to detect a rotation or a change of proportion in an image and adapt its representation accordingly, so that the spatial positioning among the different components of the image is not lost.
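To make the invariance problem concrete, here is a toy sketch (an illustrative example of ours, not code from the paper): max-pooling over an entire feature map yields exactly the same output no matter where the activation sits, so the position cannot be recovered downstream.

import numpy as np
import tensorflow as tf

# Two 4x4 feature maps with the same activation in opposite corners.
a = np.zeros((1, 4, 4, 1), dtype="float32")
a[0, 0, 0, 0] = 1.0  # activation in the top-left corner
b = np.zeros((1, 4, 4, 1), dtype="float32")
b[0, 3, 3, 0] = 1.0  # activation in the bottom-right corner

pool = tf.keras.layers.MaxPool2D(pool_size=4)  # pool the whole map
print(pool(a).numpy().ravel())  # [1.] -- top-left activation
print(pool(b).numpy().ravel())  # [1.] -- bottom-right, same output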
So what is new with Capsule networks?
According to the authors, our brain has modules called "capsules," and each capsule
is specialized in handling a particular type of information: there are capsules that work well for "understanding" position, others for size, orientation, deformation, texture, and so on. In addition, the authors suggest that our brain has particularly efficient mechanisms for dynamically routing each piece of information to the capsule best suited to handle it.
So, the main difference between CNNs and CapsNets is that with a CNN you keep adding layers to create a deep network, while with a CapsNet you nest one neural layer inside another. A capsule is a group of neurons that introduces more structure into the net, and it produces a vector to signal the existence of an entity in the image. In particular, Hinton uses the length of the activity vector to represent the probability that the entity exists, and its orientation to represent the instantiation parameters. Each capsule produces a prediction vector for every possible parent capsule, and when multiple predictions agree, the higher-level capsule becomes active.
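For the vector length to behave like a probability, the paper squashes each capsule's output into the interval [0, 1) with a nonlinearity that preserves orientation. A minimal TensorFlow sketch of that squashing function (the epsilon is our own addition for numerical stability):

import tensorflow as tf

def squash(s, axis=-1, eps=1e-7):
    # Keep the orientation of s, but shrink its length into [0, 1)
    # via the factor ||s||^2 / (1 + ||s||^2).
    squared_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / tf.sqrt(squared_norm + eps)

v = squash(tf.constant([[3.0, 4.0]]))  # a vector of length 5
print(tf.norm(v, axis=-1).numpy())     # ~0.96: same direction, length < 1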
Now a second innovation comes into place: dynamic routing across capsules, which replaces the raw idea of pooling. A lower-level capsule prefers to send its output to the higher-level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule. The parent whose activity vector has the largest scalar product with the prediction strengthens its bond with the capsule, while all the other parents weaken theirs. In other words, if a higher-level capsule agrees with a lower-level one, it will ask to receive more information of that type; if there is no agreement, it will ask for less. This routing-by-agreement method is superior to mechanisms like max-pooling and, according to Hinton, routing is ultimately a way to parse the image. Indeed, max-pooling ignores everything but the largest value, while dynamic routing selectively propagates information according to the agreement between lower and upper layers.
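The routing loop itself fits in a few lines. Below is a minimal sketch of routing by agreement between one layer of lower capsules and one layer of higher capsules; the shapes and the three routing iterations are illustrative assumptions, and squash is the function sketched earlier (redefined here so the snippet is self-contained):

import tensorflow as tf

def squash(s, axis=-1, eps=1e-7):
    n2 = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / tf.sqrt(n2 + eps)

def routing_by_agreement(u_hat, num_iterations=3):
    # u_hat: prediction vectors, shape (batch, lower, higher, dim).
    # Routing logits start at zero: every parent is equally likely.
    b = tf.zeros(tf.shape(u_hat)[:3])
    for _ in range(num_iterations):
        # Each lower capsule distributes its output over the parents
        # with a softmax over its routing logits.
        c = tf.nn.softmax(b, axis=2)
        # Weighted sum of predictions per parent, then squash.
        s = tf.reduce_sum(c[..., None] * u_hat, axis=1)
        v = squash(s)
        # Agreement (scalar product) between each prediction and the
        # parent's output strengthens or weakens the bond.
        b += tf.reduce_sum(u_hat * v[:, None, :, :], axis=-1)
    return v

# Illustrative shapes: 8 lower capsules routing to 4 higher capsules
# with 16-dimensional activity vectors.
u_hat = tf.random.normal((1, 8, 4, 16))
print(routing_by_agreement(u_hat).shape)  # (1, 4, 16)

Note how the single argmax of max-pooling is replaced by soft coupling coefficients that are refined over a handful of iterations.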