


A third difference is that a new nonlinear activation function has been introduced. Instead of adding a squashing function to each layer as in CNNs, CapsNets apply a squashing function to a nested set of layers. This nonlinear activation function is called a squashing function, and it is defined in equation (1):

$$\mathbf{v}_j = \frac{\|\mathbf{s}_j\|^2}{1+\|\mathbf{s}_j\|^2}\,\frac{\mathbf{s}_j}{\|\mathbf{s}_j\|} \qquad (1)$$

where $\mathbf{v}_j$ is the vector output of capsule $j$ and $\mathbf{s}_j$ is its total input.
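To make the squashing function concrete, the following is a minimal TensorFlow sketch of equation (1); the function name squash and the small epsilon term added for numerical stability are our own choices rather than part of the original paper:

```python
import tensorflow as tf

def squash(s, axis=-1, epsilon=1e-7):
    # Equation (1): scale the capsule vector s so that short vectors shrink
    # toward zero length and long vectors approach unit length, while the
    # direction of s is preserved.
    squared_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    safe_norm = tf.sqrt(squared_norm + epsilon)  # epsilon avoids division by zero
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / safe_norm
```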

Moreover, Hinton et al. show that a discriminatively trained, multilayer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. From the paper Dynamic Routing Between Capsules, we report a simple CapsNet architecture:

Figure 32: Visualizing the CapsNet architecture

The architecture is shallow, with only two convolutional layers and one fully connected layer. Conv1 has 256 convolution kernels of size 9 × 9 with a stride of 1 and ReLU activation. The role of this layer is to convert pixel intensities to the activities of local feature detectors that are then used as inputs to the primary capsules. PrimaryCapsules is a convolutional capsule layer with 32 channels: each primary capsule contains 8 convolutional units with a 9 × 9 kernel and a stride of 2. In total, PrimaryCapsules has [32, 6, 6] capsule outputs (each output is an 8D vector), and each capsule in the [6, 6] grid shares its weights with the others. The final layer (DigitCaps) has one 16D capsule per digit class, and each of these capsules receives input from all the capsules in the layer below. Routing happens only between two consecutive capsule layers (for example, between PrimaryCapsules and DigitCaps).
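As a rough illustration of the layer sizes just described, here is a partial Keras sketch that stops at the squashed primary capsule outputs; it reuses the squash function sketched above, the layer names are our own, and DigitCaps with its routing-by-agreement step is omitted because it requires a custom layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))  # MNIST-sized input

# Conv1: 256 kernels of size 9 x 9, stride 1, ReLU -> 20 x 20 x 256 feature maps
x = layers.Conv2D(256, kernel_size=9, strides=1,
                  activation="relu", name="conv1")(inputs)

# PrimaryCapsules: 32 channels of 8D capsules, 9 x 9 kernels, stride 2,
# implemented as one convolution with 32 * 8 = 256 filters -> 6 x 6 x 256
x = layers.Conv2D(32 * 8, kernel_size=9, strides=2,
                  name="primarycaps_conv")(x)

# Reshape to [32 * 6 * 6, 8] = [1152, 8]: one 8D vector per primary capsule,
# then apply the squashing nonlinearity of equation (1) to each vector
primary_caps = layers.Reshape((32 * 6 * 6, 8))(x)
primary_caps = layers.Lambda(squash, name="primarycaps_squash")(primary_caps)

model = keras.Model(inputs, primary_caps)
model.summary()  # final output shape: (None, 1152, 8)
```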

Summary

In this chapter we have seen many applications of CNNs across very different domains: from traditional image processing and computer vision, to close-enough video processing, to not-so-close audio processing and text processing. In relatively few years, CNNs have taken machine learning by storm.

Nowadays it is not uncommon to see multimodal processing, where text, images, audio, and video are considered together to achieve better performance, frequently by means of CNNs together with a bunch of other techniques such as RNNs and reinforcement learning. Of course, there is much more to consider, and CNNs have recently been applied to many other domains, such as genetic inference [13], which are, at least at first glance, far away from the original scope of their design.

In this chapter, we have discussed all the major variants of ConvNets. In the next chapter, we will introduce Generative Nets: one of the most innovative deep learning architectures yet.

References

1. J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, in Advances in Neural Information Processing Systems 27, pp. 3320–3328.
2. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the Inception Architecture for Computer Vision, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826.
3. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2019, Google Inc.
4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012.
5. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely Connected Convolutional Networks, 28 Jan 2018, http://arxiv.org/abs/1608.06993.
6. F. Chollet, Xception: Deep Learning with Depthwise Separable Convolutions, 2017, https://arxiv.org/abs/1610.02357.
7. L. A. Gatys, A. S. Ecker, and M. Bethge, A Neural Algorithm of Artistic Style, 2016, https://arxiv.org/abs/1508.06576.
8. A. Mordvintsev, C. Olah, and M. Tyka, DeepDream - a code example for visualizing Neural Networks, Google Research, 2015.