Advanced Deep Learning with Keras
The highest test accuracy, 98.5%, is achieved with the Adam optimizer and Dropout(0.45). Technically, there is still some degree of overfitting, given that the corresponding training accuracy is 99.39%. Both the train and test accuracies are the same, 98.2%, for 256-512-256 with Dropout(0.45) and SGD. Removing both the regularizer and the ReLU activation results in the worst performance. Generally, we'll find that the Dropout layer performs better than l2.

The following table shows typical deep neural network performance during tuning. The results indicate that there is a need to improve the network architecture. In the following section, another model using CNNs shows a significant improvement in test accuracy:

Layers          Regularizer    Optimizer  ReLU  Train Accuracy, %  Test Accuracy, %
256-256-256     None           SGD        None  93.65              92.5
256-256-256     L2(0.001)      SGD        Yes   99.35              98.0
256-256-256     L2(0.01)       SGD        Yes   96.90              96.7
256-256-256     None           SGD        Yes   99.93              98.0
256-256-256     Dropout(0.4)   SGD        Yes   98.23              98.1
256-256-256     Dropout(0.45)  SGD        Yes   98.07              98.1
256-256-256     Dropout(0.5)   SGD        Yes   97.68              98.1
256-256-256     Dropout(0.6)   SGD        Yes   97.11              97.9
256-512-256     Dropout(0.45)  SGD        Yes   98.21              98.2
512-512-512     Dropout(0.2)   SGD        Yes   99.45              98.3
512-512-512     Dropout(0.4)   SGD        Yes   98.95              98.3
512-1024-512    Dropout(0.45)  SGD        Yes   98.90              98.2
1024-1024-1024  Dropout(0.4)   SGD        Yes   99.37              98.3
256-256-256     Dropout(0.6)   Adam       Yes   98.64              98.2
256-256-256     Dropout(0.55)  Adam       Yes   99.02              98.3
256-256-256     Dropout(0.45)  Adam       Yes   99.39              98.5
256-256-256     Dropout(0.45)  RMSprop    Yes   98.75              98.1
128-128-128     Dropout(0.45)  Adam       Yes   98.70              97.7

Table 1.3.2: Different MLP network configurations and performance measures
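Before moving on, it may help to see how one of the stronger rows in Table 1.3.2 translates into code. The following is a minimal sketch, assuming the same Sequential MLP structure described earlier in this chapter, of the 256-256-256 network with Dropout(0.45), ReLU activations, and the Adam optimizer; the variable names are illustrative, not taken from the book's listings:

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

input_size = 784     # 28 x 28 MNIST images flattened into vectors
hidden_units = 256   # 256-256-256 configuration from Table 1.3.2
dropout = 0.45       # dropout rate that gave the best test accuracy with Adam
num_labels = 10      # digits 0 to 9

# 3-layer MLP with ReLU and dropout after each hidden Dense layer
model = Sequential()
model.add(Dense(hidden_units, input_dim=input_size))
model.add(Activation('relu'))
model.add(Dropout(dropout))
model.add(Dense(hidden_units))
model.add(Activation('relu'))
model.add(Dropout(dropout))
model.add(Dense(num_labels))
# softmax converts the output into per-class probabilities
model.add(Activation('softmax'))

# Adam optimizer and cross-entropy loss for one-hot encoded labels
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])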
Model summary

Using the Keras library provides us with a quick mechanism to double-check the model description by calling:

model.summary()

Listing 1.3.2 shows the model summary of the proposed network. It requires a total of 269,322 parameters. This is substantial considering that we have a simple task of classifying MNIST digits; MLPs are not parameter efficient. The number of parameters can be computed from Figure 1.3.4 by focusing on how the output of the perceptron is computed. From the input to the first Dense layer: 784 × 256 + 256 = 200,960. From the first Dense layer to the second Dense layer: 256 × 256 + 256 = 65,792. From the second Dense layer to the output layer: 256 × 10 + 10 = 2,570. The total is 269,322.

Listing 1.3.2 shows a summary of an MLP MNIST digit classifier model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 256)               200960
_________________________________________________________________
activation_1 (Activation)    (None, 256)               0
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792
_________________________________________________________________
activation_2 (Activation)    (None, 256)               0
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0
=================================================================
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0

Another way of verifying the network is by calling:

plot_model(model, to_file='mlp-mnist.png', show_shapes=True)

Figure 1.3.9 shows the plot. You'll find that this is similar to the results of summary(), but it graphically shows the interconnection and I/O of each layer.
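As a quick cross-check of the parameter arithmetic given above, the per-layer counts can also be reproduced in a few lines of Python. This is just an illustrative sketch; the layer sizes are those of the 784-256-256-10 MLP summarized in Listing 1.3.2:

# each Dense layer contributes (inputs x outputs) weights plus one bias per output
sizes = [784, 256, 256, 10]
per_layer = [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(per_layer)       # [200960, 65792, 2570]
print(sum(per_layer))  # 269322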