
Improved GANs

In summary, the goal of this chapter is to introduce these improved GANs and

to present:

• The theoretical formulation of the WGAN

• An understanding of the principles of LSGAN

• An understanding of the principles of ACGAN

• Knowledge of how to implement the improved GANs (WGAN, LSGAN, and ACGAN) using Keras

Wasserstein GAN

As we've mentioned before, GANs are notoriously hard to train. The opposing

objectives of the two networks, the discriminator and the generator, can easily

cause training instability. The discriminator attempts to correctly distinguish

fake data from real data. Meanwhile, the generator tries its best to trick the

discriminator. If the discriminator learns faster than the generator, the generator

parameters will fail to optimize. On the other hand, if the discriminator learns more

slowly, then the gradients may vanish before reaching the generator. In the worst

case, if the discriminator is unable to converge, the generator is not going to be able

to get any useful feedback.
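To make these opposing objectives concrete, here is a minimal sketch (not a listing from this book) of one adversarial training step in tf.keras. It assumes generator and discriminator are already-built models (the names and hyperparameters here are illustrative), and that real_images is a batch of training data. The discriminator is pushed to output 1 for real samples and 0 for fakes, while the generator is pushed to make the discriminator output 1 on its fakes; each update pulls the discriminator's output in opposite directions, which is exactly the tension that can destabilize training.

import tensorflow as tf
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy(from_logits=True)
d_optimizer = keras.optimizers.Adam(2e-4)
g_optimizer = keras.optimizers.Adam(2e-4)

def train_step(generator, discriminator, real_images, latent_dim=100):
    batch_size = tf.shape(real_images)[0]
    noise = tf.random.normal([batch_size, latent_dim])

    # Discriminator step: classify real samples as 1 and fake samples as 0.
    with tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # Generator step: the opposing objective, fool the discriminator into
    # outputting 1 for generated samples.
    with tf.GradientTape() as g_tape:
        fake_images = generator(noise, training=True)
        fake_logits = discriminator(fake_images, training=True)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    g_optimizer.apply_gradients(zip(g_grads, generator.trainable_variables))

    return d_loss, g_loss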

Distance functions

The stability in training a GAN can be understood by examining its loss

functions. To better understand the GAN loss functions, we're going to review

the common distance or divergence functions between two probability distributions.

Our concern is the distance between p_data, the true data distribution, and p_g, the generator data distribution. The goal of GANs is to make p_g → p_data. Table 5.1.1 shows the divergence functions.

In most maximum likelihood tasks, we'll use the Kullback-Leibler (KL) divergence, D_KL, in the loss function as a measure of how far our neural network model prediction is from the true distribution function. As shown in Equation 5.1.1, D_KL is not symmetric, since D_KL(p_data || p_g) ≠ D_KL(p_g || p_data).

Jensen-Shannon (JS) divergence, or D_JS, is a divergence that is based on D_KL. However, unlike D_KL, D_JS is symmetric and finite. In this section, we'll show that optimizing the GAN loss functions is equivalent to optimizing D_JS.
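Before moving on, here is a quick numerical illustration (not from the book) of the two properties just mentioned, using NumPy on two toy discrete distributions standing in for p_data and p_g: D_KL changes when its arguments are swapped, while D_JS does not and stays finite.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q) for discrete distributions; eps guards against log(0).
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return np.sum(p * np.log((p + eps) / (q + eps)))

def js_divergence(p, q):
    # D_JS is the average KL of p and q to their midpoint distribution m.
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Two toy discrete distributions standing in for p_data and p_g.
p_data = np.array([0.7, 0.2, 0.1])
p_g    = np.array([0.1, 0.3, 0.6])

print(kl_divergence(p_data, p_g))   # differs from the reverse direction: KL is asymmetric
print(kl_divergence(p_g, p_data))
print(js_divergence(p_data, p_g))   # equals the reverse direction and is finite: JS is symmetric
print(js_divergence(p_g, p_data))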

