
Depthwise convolution

Let's consider an image with multiple channels. In a normal 2D convolution, the filter is as deep as the input, which allows us to mix channels when generating each element of the output. In depthwise convolution, each channel is kept separate: the filter is split into per-channel filters, each convolution is applied separately, and the results are stacked back together into one tensor.

Depthwise separable convolution

This convolution should not be confused with the separable convolution. After completing the depthwise convolution, an additional step is performed: a 1×1 convolution across channels. Depthwise separable convolutions are used in Xception. They are also used in MobileNet, a model particularly useful for mobile and embedded vision applications because of its reduced model size and complexity.

In this section, we have discussed all the major forms of convolution. The next section will discuss Capsule networks, a new form of learning introduced in 2017.

Capsule networks

Capsule Networks (CapsNets) are a very recent and innovative type of deep learning network. This technique was introduced at the end of October 2017 in a seminal paper titled Dynamic Routing Between Capsules by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton (https://arxiv.org/abs/1710.09829) [14]. Hinton is the father of Deep Learning and, therefore, the whole Deep Learning community is excited to see the progress made with capsules. Indeed, CapsNets are already beating the best CNNs on MNIST classification, which is... well, impressive!

So what is the problem with CNNs?

In CNNs each layer "understands" an image at a progressively coarser level of granularity. As we discussed in multiple examples, the first layer will most likely recognize straight lines or simple curves and edges, while subsequent layers will start to understand more complex shapes such as rectangles, up to complex forms such as human faces.

Now, one critical operation used in CNNs is pooling. Pooling aims at creating positional invariance, and it is used after each CNN layer to keep the problem computationally tractable. However, pooling introduces a significant problem because it forces us to lose all the positional data. This is not good. Think about a face: it consists of two eyes, a mouth, and a nose, and what matters is the spatial relationship between these parts (for example, the mouth is below the nose, which is typically below the eyes).
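To see concretely what gets lost, consider the following minimal NumPy sketch (the feature maps and pool size here are purely illustrative, not taken from any particular network). Two maps whose strong activations sit at different positions inside each pooling window collapse to exactly the same output after 2×2 max pooling:

import numpy as np

def max_pool_2x2(x):
    """Naive 2x2 max pooling with stride 2 on a single-channel feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two 4x4 feature maps: each strong activation sits at a different
# location inside its 2x2 pooling window.
a = np.array([[9., 0., 0., 0.],
              [0., 0., 0., 8.],
              [0., 7., 0., 0.],
              [0., 0., 0., 6.]])

b = np.array([[0., 0., 8., 0.],
              [0., 9., 0., 0.],
              [0., 0., 6., 0.],
              [7., 0., 0., 0.]])

print(max_pool_2x2(a))  # [[9. 8.] [7. 6.]]
print(max_pool_2x2(b))  # [[9. 8.] [7. 6.]]
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: the positions are gone

Both maps are summarized by the same 2×2 output, even though the spatial arrangement of the activations, which is exactly the information a face detector would need, is completely different.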


Indeed, Hinton said: "The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster." Technically, we do not need positional invariance; instead, we need equivariance. Equivariance is a fancy term indicating that we want to understand changes such as rotation or proportion in an image, and we want the network to adapt accordingly. In this way, the spatial positioning among the different components of an image is not lost.

So what is new with Capsule networks?

According to the authors, our brain has modules called "capsules," and each capsule is specialized in handling a particular type of information. In particular, there are capsules that work well for "understanding" concepts such as position, size, orientation, deformation, texture, and so on. In addition, the authors suggest that our brain has particularly efficient mechanisms for dynamically routing each piece of information to the capsule that is best suited to handle it.

So, the main difference between CNNs and CapsNets is that with a CNN you keep adding layers to create a deep network, while with a CapsNet you nest neural layers inside one another. A capsule is a group of neurons that introduces more structure into the net, and it produces a vector to signal the existence of an entity in the image. In particular, Hinton uses the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. For each possible parent, the capsule produces an additional prediction vector, and when multiple predictions agree, a higher-level capsule becomes active.
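The paper encodes "length as probability" with a squashing nonlinearity: the capsule's raw output vector keeps its direction, but its length is rescaled into the interval [0, 1) so that it can be read as the probability that the entity exists. The following is a minimal NumPy sketch of that idea (our own illustration, not the authors' reference implementation):

import numpy as np

def squash(s, eps=1e-8):
    """Squashing nonlinearity from Dynamic Routing Between Capsules:
    preserves the direction of s but maps its length into [0, 1)."""
    norm_sq = np.sum(s ** 2)
    scale = norm_sq / (1.0 + norm_sq)
    return scale * s / (np.sqrt(norm_sq) + eps)

weak = squash(np.array([0.05, 0.02, 0.01]))   # short input -> length close to 0
strong = squash(np.array([4.0, 3.0, 2.0]))    # long input  -> length close to 1
print(np.linalg.norm(weak), np.linalg.norm(strong))  # ~0.003 and ~0.97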

Now a second innovation comes into play: we will use dynamic routing across capsules and will no longer use the raw idea of pooling. A lower-level capsule prefers to send its output to higher-level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule. The parent with the largest scalar prediction vector product increases the capsule bond, while all the other parents decrease their bond. In other words, the idea is that if a higher-level capsule agrees with a lower-level one, it will ask for more information of that type; if there is no agreement, it will ask for less. This dynamic routing-by-agreement method is superior to current mechanisms such as max pooling and, according to Hinton, routing is ultimately a way to parse the image. Indeed, max pooling ignores anything but the largest value, while dynamic routing selectively propagates information according to the agreement between lower layers and upper layers.
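The following NumPy sketch mirrors this description for a single pair of capsule layers. It is a schematic illustration only: the shapes, the number of routing iterations, and the randomly generated prediction vectors u_hat are assumptions made for the example, not the full CapsNet of the paper.

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Vectorized squashing: direction preserved, length mapped into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (np.sqrt(norm_sq) + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_iterations=3):
    """Dynamic routing by agreement.
    u_hat has shape (n_lower, n_parents, dim): the prediction vector
    produced by each lower-level capsule i for each possible parent j."""
    n_lower, n_parents, _ = u_hat.shape
    b = np.zeros((n_lower, n_parents))           # routing logits, start neutral
    for _ in range(n_iterations):
        c = softmax(b, axis=1)                   # how each lower capsule splits its vote
        s = np.einsum('ij,ijd->jd', c, u_hat)    # weighted sum of predictions per parent
        v = squash(s)                            # parent outputs, length in [0, 1)
        b += np.einsum('ijd,jd->ij', u_hat, v)   # agreement (scalar product) strengthens the bond
    return v, c

# Toy example: 4 lower-level capsules, 2 candidate parents, 3-D prediction vectors.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 2, 3))
v, c = route(u_hat)
print(c)  # coupling coefficients: each row sums to 1, larger where predictions agree

After a few iterations, the coupling coefficients concentrate on the parents whose outputs agree with the lower-level predictions, which is exactly the selective propagation described above.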

