Advanced Deep Learning with Keras




Chapter 6: Disentangled Representation GANs

The following figure shows a GAN with an entangled code and its variation with a mixture of entangled and disentangled representations. In the context of the hypothetical celebrity face generation, the disentangled codes let us indicate the gender, hairstyle, facial expression, skin complexion, and eye color of the face we wish to generate. The n-dim entangled code is still needed to represent all the other facial attributes that we have not disentangled, such as the face shape, facial hair, and eyeglasses, to name just three examples. The concatenation of entangled and disentangled codes serves as the new input to the generator. The total dimension of the concatenated code is not necessarily 100:

Figure 6.1.1: The GAN with the entangled code and its variation with both entangled and disentangled codes. This example is shown in the context of celebrity face generation.
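To make the wiring concrete, below is a minimal Keras sketch of a generator that takes the concatenated entangled and disentangled codes as its input. The specific dimensions (a 62-dim entangled noise code, a 10-dim discrete code, and two 1-dim continuous codes, 74 in total) and the dense stand-in layers are illustrative assumptions, not values fixed by the figure:

```python
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Assumed illustrative dimensions: a 62-dim entangled noise code,
# a 10-dim discrete latent code, and two 1-dim continuous codes.
z_input = Input(shape=(62,), name='z_input')               # entangled noise
discrete_code = Input(shape=(10,), name='discrete_code')   # e.g., one-hot class
cont_code1 = Input(shape=(1,), name='cont_code1')
cont_code2 = Input(shape=(1,), name='cont_code2')

# The concatenated code replaces the plain noise vector as the
# generator input; its total dimension here is 62 + 10 + 1 + 1 = 74.
gen_input = concatenate([z_input, discrete_code, cont_code1, cont_code2])

# Stand-in body; a real generator would upsample to an image with
# transposed convolutions rather than a single dense layer.
x = Dense(128, activation='relu')(gen_input)
fake_image = Dense(28 * 28, activation='sigmoid')(x)

generator = Model([z_input, discrete_code, cont_code1, cont_code2],
                  fake_image, name='generator')
```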

Looking at Figure 6.1.1, it appears that GANs with disentangled representations can also be optimized in the same way as a vanilla GAN. This is because the generator output can be represented as:

$G(z, c) = G(z)$ (Equation 6.1.1)

The code $z = (z, c)$ is made of two elements:

1. Incompressible entangled noise code, similar to a GAN's $z$ or noise vector.

2. Latent codes $c_1, c_2, \ldots, c_L$, which represent the interpretable disentangled codes of the data distribution. Collectively, all the latent codes are represented by $c$.

For simplicity, all the latent codes are assumed to be independent:

$p(c_1, c_2, \ldots, c_L) = \prod_{i=1}^{L} p(c_i)$ (Equation 6.1.2)
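Since each latent code is drawn from its own distribution, the independence in Equation 6.1.2 holds by construction when sampling. The short sketch below illustrates this; the choice of a 10-way categorical code and two uniform continuous codes mirrors the common MNIST setup and is an assumption here, not a requirement of the equation:

```python
import numpy as np

def sample_latent_codes(batch_size, num_classes=10, num_cont=2):
    """Draw each latent code independently, so that
    p(c1, c2, ..., cL) = prod_i p(ci) holds by construction."""
    # Discrete code: uniform categorical, encoded as one-hot vectors.
    labels = np.random.randint(0, num_classes, size=batch_size)
    discrete = np.eye(num_classes)[labels]
    # Continuous codes: independent uniform draws in [-1, 1].
    continuous = np.random.uniform(-1.0, 1.0, size=(batch_size, num_cont))
    return discrete, continuous

discrete, continuous = sample_latent_codes(batch_size=32)
print(discrete.shape, continuous.shape)  # (32, 10) (32, 2)
```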

The generator function $x = G(z, c) = G(z)$ is provided with both the incompressible noise code and the latent codes. From the point of view of the generator, optimizing $z = (z, c)$ is the same as optimizing $z$. The generator network will simply ignore the constraint imposed by the disentangled codes when coming up with a solution. The generator learns the distribution $p_g(x \mid c) = p_g(x)$. This will practically defeat the objective of disentangled representations.

InfoGAN

To enforce the disentanglement of codes, InfoGAN proposed a regularizer to the original loss function that maximizes the mutual information between the latent codes $c$ and $G(z, c)$:

$I(c; G(z, c)) = I(c; G(z))$ (Equation 6.1.3)

The regularizer forces the generator to consider the latent codes when it formulates a function that synthesizes the fake images. In the field of information theory, the mutual information between the latent codes $c$ and $G(z, c)$ is defined as:

$I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$ (Equation 6.1.4)

where $H(c)$ is the entropy of the latent code $c$, and $H(c \mid G(z, c))$ is the conditional entropy of $c$ after observing the output of the generator, $G(z, c)$. Entropy is a measure of the uncertainty of a random variable or an event. For example, information such as "the sun rises in the east" has low entropy, whereas winning the jackpot in the lottery has high entropy.

In Equation 6.1.4, maximizing the mutual information means minimizing $H(c \mid G(z, c))$, or decreasing the uncertainty in the latent code upon observing the generated output. This makes sense since, for example, in the MNIST dataset, the generator becomes more confident in synthesizing the digit 8 when it observes the latent code for the digit 8.

However, $H(c \mid G(z, c))$ is hard to estimate since it requires knowledge of the posterior $P(c \mid G(z, c)) = P(c \mid x)$, which is something that we don't have access to. The workaround is to estimate a lower bound of the mutual information by approximating the posterior with an auxiliary distribution $Q(c \mid x)$. InfoGAN estimates the lower bound of the mutual information as:

$I(c; G(z, c)) \geq L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c)$ (Equation 6.1.5)

In InfoGAN, $H(c)$ is assumed to be a constant. Therefore, maximizing the mutual information is a matter of maximizing the expectation. The generator must be confident that it has generated an output with the specific attributes. We should note that the maximum value of this expectation is zero; therefore, the maximum of the lower bound of the mutual information is $H(c)$. In InfoGAN, $Q(c \mid x)$ for discrete latent codes can be represented by a softmax nonlinearity, and the expectation is then the negative categorical_crossentropy loss in Keras.
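As a minimal sketch of how this could look in Keras, the snippet below assumes a 10-way discrete code and a hypothetical dense Q network; the layer sizes and names are illustrative, not taken from the InfoGAN paper. Since $H(c)$ is a constant, maximizing $L_I$ reduces to minimizing the categorical cross-entropy between the sampled code and $Q(c \mid x)$:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Hypothetical Q network: maps a flattened generated image to
# Q(c|x), a softmax over a 10-way discrete latent code.
x_input = Input(shape=(28 * 28,), name='fake_image')
h = Dense(128, activation='relu')(x_input)
q_c_given_x = Dense(10, activation='softmax', name='q_c_x')(h)
q_network = Model(x_input, q_c_given_x, name='Q')

def mi_loss(c, q_c_x):
    """Negative of the variational bound L_I up to the constant H(c):
    E[log Q(c|x)] equals the negative categorical cross-entropy for a
    one-hot c, so minimizing this loss maximizes the bound."""
    return tf.keras.losses.categorical_crossentropy(c, q_c_x)
```

During training, attaching mi_loss to the combined generator-and-Q model penalizes the generator whenever $Q(c \mid x)$ cannot recover the code that produced the image, which is exactly the pressure toward disentanglement that the regularizer is meant to supply.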
