Advanced Deep Learning with Keras


The highest test accuracy, 98.5%, is obtained with the Adam optimizer and Dropout(0.45). Technically, there is still some degree of overfitting, given that the corresponding training accuracy is 99.39%. Both the train and test accuracies are 98.2% for the 256-512-256 network with Dropout(0.45) and SGD. Removing both the regularizer and the ReLU layers results in the worst performance. Generally, we'll find that the Dropout layer gives better performance than l2 regularization.

The following table shows typical deep neural network performance during tuning. The results indicate that there is a need to improve the network architecture. In the following section, another model using CNNs shows a significant improvement in test accuracy:

Layers          Regularizer    Optimizer  ReLU  Train Accuracy, %  Test Accuracy, %
256-256-256     None           SGD        None  93.65              92.5
256-256-256     L2(0.001)      SGD        Yes   99.35              98.0
256-256-256     L2(0.01)       SGD        Yes   96.90              96.7
256-256-256     None           SGD        Yes   99.93              98.0
256-256-256     Dropout(0.4)   SGD        Yes   98.23              98.1
256-256-256     Dropout(0.45)  SGD        Yes   98.07              98.1
256-256-256     Dropout(0.5)   SGD        Yes   97.68              98.1
256-256-256     Dropout(0.6)   SGD        Yes   97.11              97.9
256-512-256     Dropout(0.45)  SGD        Yes   98.21              98.2
512-512-512     Dropout(0.2)   SGD        Yes   99.45              98.3
512-512-512     Dropout(0.4)   SGD        Yes   98.95              98.3
512-1024-512    Dropout(0.45)  SGD        Yes   98.90              98.2
1024-1024-1024  Dropout(0.4)   SGD        Yes   99.37              98.3
256-256-256     Dropout(0.6)   Adam       Yes   98.64              98.2
256-256-256     Dropout(0.55)  Adam       Yes   99.02              98.3
256-256-256     Dropout(0.45)  Adam       Yes   99.39              98.5
256-256-256     Dropout(0.45)  RMSprop    Yes   98.75              98.1
128-128-128     Dropout(0.45)  Adam       Yes   98.70              97.7

Table 1.3.2: Different MLP network configurations and performance measures

Model summary

The Keras library provides a quick way to double-check the model description by calling:

model.summary()
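To make the best-performing row of Table 1.3.2 concrete, here is a minimal sketch of how a 256-256 MLP with Dropout(0.45) could be assembled and compiled with Adam. The layer widths, dropout rate, and optimizer come from the table; the 784-dimensional input (flattened 28 x 28 MNIST images), the softmax output, and the categorical cross-entropy loss are assumptions consistent with a 10-class digit classifier, and the standalone Keras import style is assumed (tf.keras users would prefix the imports with tensorflow). This is not the book's exact listing:

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

# Two 256-unit hidden layers with ReLU and Dropout(0.45), plus a 10-way softmax output.
# An l2 variant from the table would instead use, for example,
# Dense(256, kernel_regularizer=keras.regularizers.l2(0.001)) and drop the Dropout layers.
model = Sequential()
model.add(Dense(256, input_dim=784))   # 784 = 28 x 28 flattened MNIST pixels
model.add(Activation('relu'))
model.add(Dropout(0.45))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.45))
model.add(Dense(10))                   # one unit per digit class
model.add(Activation('softmax'))

# Adam matches the best row of Table 1.3.2; the loss choice is an assumption.
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()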

Listing 1.3.2 shows the model summary of the proposed network. It requires a total of 269,322 parameters. This is substantial considering that we have the simple task of classifying MNIST digits. MLPs are not parameter efficient. The number of parameters can be computed from Figure 1.3.4 by focusing on how the output of the perceptron is computed. From the input to the first Dense layer: 784 × 256 + 256 = 200,960. From the first Dense to the second Dense layer: 256 × 256 + 256 = 65,792. From the second Dense layer to the output layer: 256 × 10 + 10 = 2,570. The total is 269,322.

Listing 1.3.2 shows a summary of an MLP MNIST digit classifier model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 256)               200960
_________________________________________________________________
activation_1 (Activation)    (None, 256)               0
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792
_________________________________________________________________
activation_2 (Activation)    (None, 256)               0
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0
=================================================================
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0

Another way of verifying the network is by calling:

plot_model(model, to_file='mlp-mnist.png', show_shapes=True)

Figure 1.3.9 shows the plot. You'll find that this is similar to the output of summary(), but it graphically shows the interconnection and I/O of each layer.
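As a quick cross-check of the parameter arithmetic above, each Dense layer contributes inputs × units weights plus units biases. The following short plain-Python sketch uses the layer sizes from the summary; the helper name is purely for illustration:

# Hypothetical helper: parameters of a Dense layer = weights + biases.
def dense_params(inputs, units):
    return inputs * units + units

total = (dense_params(784, 256)    # dense_1: 200,960
         + dense_params(256, 256)  # dense_2: 65,792
         + dense_params(256, 10))  # dense_3: 2,570
print(total)  # 269322, matching the Total params line in Listing 1.3.2

Activation and Dropout layers add no parameters of their own, which is why their Param # entries in the summary are 0.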

