Advanced Deep Learning with Keras

Chapter 4

By training the generator this way, the loss function simply maximizes the chance of the discriminator believing that the synthetic data is real. The new formulation is no longer zero-sum and is purely heuristics-driven. Figure 4.1.4 shows the generator during training. In this figure, the generator parameters are updated only when the whole adversarial network is trained, since the gradients are passed down from the discriminator to the generator. In practice, however, the discriminator weights are only temporarily frozen during adversarial training.
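
The following is a minimal sketch of that weight-freezing pattern, assuming the tf.keras API; the simple fully connected networks, layer sizes, and optimizers are illustrative placeholders, not the book's code:

```python
from tensorflow.keras import layers, models, optimizers

# A minimal sketch of the weight-freezing pattern described above, assuming
# the tf.keras API. The fully connected networks and all sizes here are
# illustrative placeholders, not the book's code.
latent_size = 100  # assumed size of the z-vector

generator = models.Sequential([
    layers.Input(shape=(latent_size,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(28 * 28, activation='sigmoid'),
    layers.Reshape((28, 28, 1)),
], name='generator')

discriminator = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
], name='discriminator')

# The stand-alone discriminator is compiled while still trainable, so it can
# be trained directly on batches of real and fake images.
discriminator.compile(loss='binary_crossentropy',
                      optimizer=optimizers.RMSprop())

# Freeze the discriminator only inside the adversarial model: gradients still
# flow through it down to the generator, but its weights are not updated when
# the adversarial model is trained.
discriminator.trainable = False
adversarial = models.Sequential([generator, discriminator], name='adversarial')
adversarial.compile(loss='binary_crossentropy',
                    optimizer=optimizers.RMSprop())
```

Because the stand-alone discriminator was compiled while still trainable, it keeps learning when trained directly on real and fake batches; only the copy embedded in the adversarial model has its weights frozen.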

In deep learning, both the generator and discriminator can be implemented using a suitable neural network architecture. If the data or signal is an image, both the generator and discriminator networks use a CNN. For one-dimensional sequences, such as in NLP, both networks are usually recurrent (RNN, LSTM, or GRU), as sketched below.
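
As a concrete illustration of this architectural choice, here is a minimal sketch of a CNN-based discriminator for images next to an RNN-based discriminator for token sequences, assuming the tf.keras API; the input shapes, vocabulary size, and layer widths are assumptions, not values from the book:

```python
from tensorflow.keras import layers, models

# Illustrative only: a CNN discriminator for images versus an LSTM
# discriminator for token sequences. All shapes and sizes are assumed.
def image_discriminator(input_shape=(28, 28, 1)):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, kernel_size=5, strides=2,
                      padding='same', activation='relu'),
        layers.Conv2D(128, kernel_size=5, strides=2,
                      padding='same', activation='relu'),
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid'),  # real vs. fake score
    ])

def sequence_discriminator(vocab_size=10000, seq_len=50):
    return models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(vocab_size, 64),
        layers.LSTM(64),
        layers.Dense(1, activation='sigmoid'),  # real vs. fake score
    ])
```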

GAN implementation in Keras

In the previous section, we learned that the principles behind GANs are straightforward. We also learned how GANs can be implemented using familiar network layers such as CNNs and RNNs. What differentiates GANs from other networks is that they are notoriously difficult to train. Something as simple as a minor change in the layers can drive the network to training instability.

In this section, we'll examine one of the early successful implementations of GANs using deep CNNs. It is called DCGAN [3].

Figure 4.2.1 shows the DCGAN that is used to generate fake MNIST images. DCGAN recommends the following design principles, which are put together in a code sketch after the list:

• Use strided convolutions (strides > 1) instead of MaxPooling2D or UpSampling2D. With strides > 1, the CNN learns how to resize the feature maps.

• Avoid using Dense layers. Use CNNs in all layers. A Dense layer is utilized only as the first layer of the generator to accept the z-vector. The output of the Dense layer is resized and becomes the input of the succeeding CNN layers.

• Use Batch Normalization (BN) to stabilize learning by normalizing the input to each layer to have zero mean and unit variance. There is no BN in the generator output layer and the discriminator input layer. In the implementation example to be presented here, no batch normalization is used in the discriminator.

• A Rectified Linear Unit (ReLU) is used in all layers of the generator except in the output layer, where the tanh activation is utilized. In the implementation example to be presented here, sigmoid is used instead of tanh in the output of the generator since it generally results in more stable training on MNIST digits.
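
The sketch below puts these principles together for MNIST-sized (28 x 28 x 1) images, assuming the tf.keras API; the filter counts, kernel sizes, and the LeakyReLU activations in the discriminator are common DCGAN choices assumed here, not the book's exact values:

```python
from tensorflow.keras import layers, models

# Minimal sketch of the DCGAN design principles listed above, assuming the
# tf.keras API. Filter counts, kernel sizes, and the discriminator's LeakyReLU
# activations are assumptions, not the book's exact implementation.
def build_generator(latent_size=100):
    return models.Sequential([
        layers.Input(shape=(latent_size,)),
        # Dense is used only to accept the z-vector, then reshaped for the CNN.
        layers.Dense(7 * 7 * 128),
        layers.Reshape((7, 7, 128)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        # Strided transposed convolutions instead of UpSampling2D.
        layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        # No BN in the output layer; sigmoid instead of tanh, as noted above.
        layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding='same'),
        layers.Activation('sigmoid'),
    ], name='generator')

def build_discriminator(input_shape=(28, 28, 1)):
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Strided convolutions instead of MaxPooling2D; no BN, matching the
        # discriminator described above.
        layers.Conv2D(64, kernel_size=5, strides=2, padding='same'),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, kernel_size=5, strides=2, padding='same'),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid'),
    ], name='discriminator')
```

In this sketch, the generator maps a batch of z-vectors to 28 x 28 x 1 images in [0, 1], and the discriminator maps images to a single real-versus-fake probability.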
