
Advanced Deep Learning with Keras


Disentangled Representation GANs

In the first section of this chapter, we will discuss InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [1], an extension to GANs. InfoGAN learns disentangled representations in an unsupervised manner by maximizing the mutual information between the input codes and the output observation. On the MNIST dataset, InfoGAN disentangles the writing style from the digit identity.

In the following part of the chapter, we'll also discuss Stacked Generative Adversarial Networks, or StackedGAN [2], another extension to GANs. StackedGAN uses a pretrained encoder or classifier to aid in disentangling the latent codes. StackedGAN can be viewed as a stack of models, each made of an encoder and a GAN. Each GAN is trained in an adversarial manner using the input and output data of its corresponding encoder.

In summary, the goal of this chapter is to present:

• The concepts of disentangled representations

• The principles of both InfoGAN and StackedGAN

• Implementation of both InfoGAN and StackedGAN using Keras

Disentangled representations

The original GAN was able to generate meaningful outputs, but the downside was that its outputs could not be controlled. For example, if we trained a GAN to learn the distribution of celebrity faces, the generator would produce new images of celebrity-looking people. However, there is no way to influence the generator on the specific attributes of the face we want. For example, we're unable to ask the generator for the face of a female celebrity with long black hair, a fair complexion, brown eyes, and who is smiling. The fundamental reason is that the 100-dim noise code we use entangles all of the salient attributes of the generator outputs. Recall that in Keras, the 100-dim code was generated by random sampling from a uniform noise distribution:

import numpy as np

# generate 64 fake images from 64 x 100-dim uniform noise
noise = np.random.uniform(-1.0, 1.0, size=[64, 100])
fake_images = generator.predict(noise)

If we could modify the original GAN so that the representation is separated into entangled and disentangled interpretable latent codes, we would be able to tell the generator what to synthesize.
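As a concrete sketch of this idea, part of the entangled noise vector can be replaced by interpretable codes, such as a one-hot digit label and a couple of continuous style variables, which are then concatenated into a single generator input. The dimensions and variable names below are illustrative, not taken from the book's InfoGAN implementation:

```python
import numpy as np

# entangled noise code, shrunk from 100-dim to 62-dim
z = np.random.uniform(-1.0, 1.0, size=[64, 62])
# disentangled discrete code: a one-hot digit label (10-dim)
digits = np.random.randint(0, 10, size=64)
onehot = np.eye(10)[digits]
# disentangled continuous codes, e.g. writing style (2-dim)
style = np.random.normal(scale=0.5, size=[64, 2])
# the generator input is the concatenation of all codes (74-dim)
codes = np.concatenate([z, onehot, style], axis=1)
# fake_images = generator.predict(codes)  # generator must accept 74-dim input
```

By fixing the one-hot part and varying only the noise or style codes, we could then ask the generator for a specific digit while letting the other codes control its appearance.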

