Advanced Deep Learning with Keras
Layers Optimizer Regularizer Train Accuracy, % Test Accuracy, %
256 SGD Dropout(0.2) 97.26 98.00
256 RMSprop Dropout(0.2) 96.72 97.60
256 Adam Dropout(0.2) 96.79 97.40
512 SGD Dropout(0.2) 97.88 98.30

Table 1.5.1: The different SimpleRNN network configurations and performance measures

In many deep neural networks, other members of the RNN family are more commonly used. For example, Long Short-Term Memory (LSTM) networks have been used in both machine translation and question answering problems. LSTM networks address the problem of long-term dependency, that is, remembering relevant past information for the present output.

Unlike the SimpleRNN, the internal structure of the LSTM cell is more complex. Figure 1.5.4 shows a diagram of LSTM in the context of MNIST digit classification. LSTM uses not only the present input and past outputs or hidden states; it also introduces a cell state, s_t, that carries information from one cell to the next. Information flow between cell states is controlled by three gates, f_t, i_t, and q_t. The three gates determine which information should be retained or replaced, and how much of the past and current input should contribute to the current cell state or output. We will not discuss the details of the internal structure of the LSTM cell in this book. However, an intuitive guide to LSTM can be found at: http://colah.github.io/posts/2015-08-Understanding-LSTMs.

The LSTM() layer can be used as a drop-in replacement for SimpleRNN(). If LSTM is overkill for the task at hand, a simpler variant called the Gated Recurrent Unit (GRU) can be used. GRU simplifies LSTM by combining the cell state and hidden state into one, and by reducing the number of gates by one. The GRU() layer can likewise be used as a drop-in replacement for SimpleRNN().
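As a minimal sketch (not a listing from the book), the snippet below shows LSTM dropped in where SimpleRNN would otherwise be in an MNIST digit classifier. The 256 units, dropout of 0.2, and SGD optimizer mirror the first row of Table 1.5.1; treating each 28 x 28 image as a sequence of 28 rows is an assumption about how the chapter's SimpleRNN model shapes its input.

from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM

# assumed input shape: each 28 x 28 MNIST image is a sequence
# of 28 timesteps with 28 features per timestep
input_shape = (28, 28)
num_labels = 10

model = Sequential()
# LSTM used as a drop-in replacement for SimpleRNN
model.add(LSTM(units=256, dropout=0.2, input_shape=input_shape))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.summary()

Swapping LSTM for GRU in the single layer definition is the only change needed to try the simpler variant.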
Figure 1.5.4: Diagram of LSTM. The parameters are not shown for clarity.

There are many other ways to configure RNNs. One way is to make the RNN model bidirectional. By default, RNNs are unidirectional in the sense that the current output is influenced only by the past states and the current input. In bidirectional RNNs, information is also allowed to flow backward, so future states can influence the present state and the past states, and past outputs are updated as needed when new information is received. RNNs can be made bidirectional by calling a wrapper function. For example, the implementation of a bidirectional LSTM is Bidirectional(LSTM()).

For all types of RNNs, increasing the number of units also increases the capacity. However, another way of increasing the capacity is by stacking RNN layers, as shown in the sketch below. You should note, though, that as a general rule of thumb, the capacity of the model should only be increased if needed. Excess capacity may contribute to overfitting and, as a result, longer training times and slower performance during prediction.
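The following sketch, under the same assumed MNIST-style input shape as above, illustrates both ideas: two recurrent layers are stacked, and each is wrapped with Bidirectional(). A stacked recurrent layer must return its full output sequence (return_sequences=True) so that the next layer receives one output per timestep.

from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM, Bidirectional

input_shape = (28, 28)  # assumed: 28 timesteps of 28 features
num_labels = 10

model = Sequential()
# first recurrent layer returns the whole sequence so it can be stacked
model.add(Bidirectional(LSTM(units=256, return_sequences=True),
                        input_shape=input_shape))
# second (stacked) recurrent layer returns only its last output
model.add(Bidirectional(LSTM(units=256)))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Whether the extra capacity from stacking or from the bidirectional wrapper is worthwhile should be weighed against the overfitting caveat above.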