Advanced Deep Learning with Keras


Layers | Optimizer | Regularizer  | Train Accuracy, % | Test Accuracy, %
256    | SGD       | Dropout(0.2) | 97.26             | 98.00
256    | RMSprop   | Dropout(0.2) | 96.72             | 97.60
256    | Adam      | Dropout(0.2) | 96.79             | 97.40
512    | SGD       | Dropout(0.2) | 97.88             | 98.30

Table 1.5.1: The different SimpleRNN network configurations and performance measures

In many deep neural networks, other members of the RNN family are more commonly used. For example, Long Short-Term Memory (LSTM) networks have been used in both machine translation and question answering problems. LSTM networks address the problem of long-term dependency, or remembering relevant past information for the present output.

Unlike RNNs or SimpleRNN, the internal structure of the LSTM cell is more complex. Figure 1.5.4 shows a diagram of LSTM in the context of MNIST digit classification. LSTM uses not only the present input and past outputs or hidden states; it also introduces a cell state, s_t, that carries information from one cell to the next. Information flow between cell states is controlled by three gates, f_t, i_t, and q_t. The three gates determine which information should be retained or replaced, and how much of the past and current input should contribute to the current cell state or output. We will not discuss the details of the internal structure of the LSTM cell in this book. However, an intuitive guide to LSTM can be found at: http://colah.github.io/posts/2015-08-Understanding-LSTMs.

The LSTM() layer can be used as a drop-in replacement for SimpleRNN(). If LSTM is overkill for the task at hand, a simpler version called the Gated Recurrent Unit (GRU) can be used. GRU simplifies LSTM by combining the cell state and hidden state and by using one fewer gate. The GRU() layer can also be used as a drop-in replacement for SimpleRNN().
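As a concrete illustration of the first row of Table 1.5.1 (256 units, SGD, Dropout(0.2)), the sketch below builds the corresponding SimpleRNN classifier. It assumes the MNIST setup used earlier in the chapter, where each 28x28 image is read as a sequence of 28 timesteps with 28 features and there are 10 output classes; the tf.keras imports, variable names, and hyperparameter values are illustrative rather than the book's exact listing.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Activation

units = 256              # number of RNN units, as in the first row of Table 1.5.1
dropout = 0.2            # the Dropout(0.2) regularizer column
input_shape = (28, 28)   # a 28x28 MNIST image read as 28 timesteps of 28 features
num_labels = 10

model = Sequential()
model.add(SimpleRNN(units=units, dropout=dropout, input_shape=input_shape))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
# optimizer='sgd' corresponds to the SGD rows; swapping in 'rmsprop' or 'adam'
# would correspond to the other rows of the table
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])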
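Since LSTM() and GRU() accept the same core arguments as SimpleRNN(), the drop-in replacement described above is a one-line change. A minimal sketch, reusing the hypothetical unit count, dropout rate, and input shape from the previous snippet:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense, Activation

model = Sequential()
# only the recurrent layer type changes; the rest of the model stays the same
model.add(LSTM(units=256, dropout=0.2, input_shape=(28, 28)))
# or, for the simpler gated variant:
# model.add(GRU(units=256, dropout=0.2, input_shape=(28, 28)))
model.add(Dense(10))
model.add(Activation('softmax'))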

Figure 1.5.4: Diagram of LSTM. The parameters are not shown for clarity

There are many other ways to configure RNNs. One way is to make the RNN model bidirectional. By default, RNNs are unidirectional in the sense that the current output is influenced only by the past states and the current input. In bidirectional RNNs, future states can also influence the present state and the past states by allowing information to flow backward. Past outputs are updated as needed depending on the new information received. RNNs can be made bidirectional by calling a wrapper function. For example, a bidirectional LSTM is implemented as Bidirectional(LSTM()).

For all types of RNNs, increasing the number of units also increases the capacity. Another way of increasing the capacity is to stack RNN layers. Note, though, that as a general rule of thumb, the capacity of the model should only be increased if needed. Excess capacity may contribute to overfitting and, as a result, to longer training time and slower performance during prediction.
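To make the wrapper usage concrete, here is a minimal sketch of the Bidirectional(LSTM()) call in the MNIST setting; the unit count, dropout rate, and input shape are assumed values for illustration, not a configuration reported in this chapter.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Activation

model = Sequential()
# the wrapper runs the LSTM over the sequence in both directions and
# merges the forward and backward outputs (concatenation by default)
model.add(Bidirectional(LSTM(units=256, dropout=0.2), input_shape=(28, 28)))
model.add(Dense(10))
model.add(Activation('softmax'))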
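Stacking RNN layers requires the lower layer to return its full output sequence so that the next recurrent layer has a sequence to consume; in Keras this is the return_sequences=True argument. A sketch with two stacked LSTM layers, again with illustrative hyperparameters:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation

model = Sequential()
# return_sequences=True makes the first layer output one vector per timestep,
# which the second LSTM layer then processes as its input sequence
model.add(LSTM(units=256, dropout=0.2, return_sequences=True, input_shape=(28, 28)))
model.add(LSTM(units=256, dropout=0.2))
model.add(Dense(10))
model.add(Activation('softmax'))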
