Figure 29: Test accuracy for increasing values of internal hidden neurons

Increasing the size of batch computation
Gradient descent tries to minimize the cost function on all the examples provided in the training set and, at the same time, for all the features provided in input. SGD is a much less expensive variant that considers only BATCH_SIZE examples at a time. So, let us see how it behaves when we change this parameter. As you can see, the best accuracy is reached with BATCH_SIZE=64 in our four experiments (see Figure 30):

Figure 30: Test accuracy for different batch values
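To make the batch-size experiment concrete, here is a minimal sketch of how such a sweep can be run with tf.keras. It is not the book's exact code: the network architecture, the plain SGD optimizer, the 20 epochs, and the build_model() helper are illustrative assumptions; only the idea of refitting the same model with different BATCH_SIZE values comes from the text.

import tensorflow as tf
from tensorflow import keras

# Load and flatten MNIST, scaling pixel values to [0, 1].
(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(60000, 784).astype("float32") / 255.0
X_test = X_test.reshape(10000, 784).astype("float32") / 255.0
Y_train = keras.utils.to_categorical(Y_train, 10)
Y_test = keras.utils.to_categorical(Y_test, 10)

def build_model():
    # Assumed stand-in for the chapter's model: two hidden layers of 128 units.
    model = keras.models.Sequential([
        keras.layers.Dense(128, input_shape=(784,), activation="relu"),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="sgd",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Refit the same architecture with different batch sizes and compare test accuracy.
for batch_size in (32, 64, 128, 256):
    model = build_model()
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=20,
              verbose=0, validation_split=0.2)
    _, acc = model.evaluate(X_test, Y_test, verbose=0)
    print(f"BATCH_SIZE={batch_size}: test accuracy {acc:.4f}")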
Summarizing experiments run for recognizing handwritten digits
So, let's summarize: with five different variants, we were able to improve our test accuracy from 90.71% to 97.82%. First, we defined a simple single-layer network in TensorFlow 2.0. Then, we improved the performance by adding hidden layers. After that, we improved the performance on the test set by adding random dropout to our network, and then by experimenting with different optimizers:
Model / accuracy    Training    Validation    Test
simple              89.96%      90.70%        90.71%
2 hidden (128)      90.81%      91.40%        91.18%
dropout (30%)       91.70%      94.42%        94.15%  (200 epochs)
RMSProp             97.43%      97.62%        97.64%  (10 epochs)
Adam                98.94%      97.89%        97.82%  (10 epochs)
However, the next two experiments (not shown in the preceding table) did not provide significant improvements. Increasing the number of internal neurons creates more complex models and requires more expensive computations, but it provides only marginal gains. We had the same experience when we increased the number of training epochs. A final experiment consisted of changing the BATCH_SIZE of our optimizer; this also provided only marginal improvements.
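As a rough illustration of how the better-performing variants differ in code, the following hedged sketch rebuilds a two-hidden-layer network with 30% dropout and compiles it with each optimizer in turn. The build_model() helper, the 128-unit layers, and the training settings are assumptions standing in for the chapter's earlier listings, not the original code.

from tensorflow import keras

# Prepare MNIST as flat 784-dimensional vectors in [0, 1].
(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(60000, 784).astype("float32") / 255.0
X_test = X_test.reshape(10000, 784).astype("float32") / 255.0
Y_train = keras.utils.to_categorical(Y_train, 10)
Y_test = keras.utils.to_categorical(Y_test, 10)

def build_model():
    # Two hidden layers of 128 units, each followed by 30% dropout.
    return keras.models.Sequential([
        keras.layers.Dense(128, input_shape=(784,), activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(10, activation="softmax"),
    ])

# Train the same architecture with each optimizer and compare test accuracy.
for name, optimizer in [("SGD", keras.optimizers.SGD()),
                        ("RMSProp", keras.optimizers.RMSprop()),
                        ("Adam", keras.optimizers.Adam())]:
    model = build_model()
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, Y_train, batch_size=128, epochs=10,
              verbose=0, validation_split=0.2)
    _, acc = model.evaluate(X_test, Y_test, verbose=0)
    print(f"{name}: test accuracy {acc:.4f}")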
Regularization
In this section, we will review a few best practices for improving the training phase.
In particular, regularization and batch normalization will be discussed.
Adopting regularization to avoid overfitting
Intuitively, a good machine learning model should achieve a low error rate on the training data. Mathematically, this is equivalent to minimizing the loss function on the training data given the model:
min: {loss(Training Data | Model)}
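One common way to express a regularization penalty in tf.keras is to attach it directly to a layer's weights. The snippet below is a minimal sketch, assuming an L2 penalty with an illustrative coefficient of 0.01 on a single Dense layer; the architecture and the coefficient are my assumptions, not taken from the text.

from tensorflow import keras
from tensorflow.keras import regularizers

# L2 (ridge) penalty on the kernel weights of a Dense layer.
# The 0.01 coefficient is an illustrative choice, not a recommended value.
model = keras.models.Sequential([
    keras.layers.Dense(64,
                       input_shape=(784,),
                       activation="relu",
                       kernel_regularizer=regularizers.l2(0.01)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Keras adds the penalty to the training loss, so the objective becomes
# loss(Training Data | Model) plus a term proportional to the sum of squared weights.
model.summary()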