Advanced Deep Learning with Keras
Chapter 6: Disentangled Representation GANs

The following figure shows a GAN with an entangled code and its variation with a mixture of entangled and disentangled representations. In the context of the hypothetical celebrity face generation, the disentangled codes let us specify the gender, hairstyle, facial expression, skin complexion, and eye color of the face we wish to generate. An n-dim entangled code is still needed to represent all the other facial attributes that we have not disentangled, such as the face shape, facial hair, and eyeglasses, to name three examples. The concatenation of the entangled and disentangled codes serves as the new input to the generator. The total dimension of the concatenated code is not necessarily 100:

Figure 6.1.1: The GAN with the entangled code and its variation with both entangled and disentangled codes. This example is shown in the context of celebrity face generation.

Looking at the preceding figure, it appears that GANs with disentangled representations can be optimized in the same way as a vanilla GAN. This is because the generator output can be represented as:

$G(z, c) = G(z)$ (Equation 6.1.1)

The code $z = (z, c)$ is made of two elements:

1. An incompressible entangled noise code, similar to the noise vector $z$ of vanilla GANs.
2. Latent codes $c_1, c_2, \ldots, c_L$, which represent the interpretable disentangled codes of the data distribution. Collectively, all the latent codes are represented by $c$.

For simplicity, all the latent codes are assumed to be independent:

$p(c_1, c_2, \ldots, c_L) = \prod_{i=1}^{L} p(c_i)$ (Equation 6.1.2)

The generator function $x = G(z, c) = G(z)$ is provided with both the incompressible noise code and the latent codes. From the point of view of the generator, optimizing $z = (z, c)$ is the same as optimizing $z$ alone: the generator network will simply ignore the constraint imposed by the disentangled codes when coming up with a solution. It learns the distribution $p_g(x \mid c) = p_g(x)$, which practically defeats the objective of disentangled representations.
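To make the concatenated input concrete, below is a minimal Keras sketch of a generator that consumes the entangled noise together with one discrete and two continuous disentangled codes. The dimensions (a 62-dim noise code, a 10-way one-hot label, and two 1-dim continuous codes, so 74 in total) and the layer sizes are illustrative assumptions, not the chapter's exact network:

```python
from tensorflow.keras.layers import (Input, Dense, Reshape,
                                     Conv2DTranspose, concatenate)
from tensorflow.keras.models import Model

# Assumed sizes: a 62-dim entangled noise code plus a 10-way discrete
# code and two 1-dim continuous codes; the concatenated code is 74-dim,
# illustrating that the total need not be 100.
z_dim, num_labels = 62, 10

z = Input(shape=(z_dim,), name="z_input")           # entangled noise code
labels = Input(shape=(num_labels,), name="labels")  # discrete latent code c1
code1 = Input(shape=(1,), name="code1")             # continuous latent code c2
code2 = Input(shape=(1,), name="code2")             # continuous latent code c3

# The concatenation of entangled and disentangled codes is the new
# generator input (right side of Figure 6.1.1).
x = concatenate([z, labels, code1, code2], axis=1)
x = Dense(7 * 7 * 128, activation="relu")(x)
x = Reshape((7, 7, 128))(x)
x = Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
fake_image = Conv2DTranspose(1, 5, strides=2, padding="same",
                             activation="sigmoid")(x)

generator = Model([z, labels, code1, code2], fake_image, name="generator")
```

At sampling time, $z$ would typically be drawn from a uniform or normal distribution, the discrete code from a categorical (one-hot) distribution, and the continuous codes from a small uniform range.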
InfoGAN

To enforce the disentanglement of codes, InfoGAN proposed a regularizer to the original loss function that maximizes the mutual information between the latent codes $c$ and $G(z, c)$:

$I(c; G(z, c)) = I(c; G(z))$ (Equation 6.1.3)

The regularizer forces the generator to consider the latent codes when it formulates a function that synthesizes the fake images. In the field of information theory, the mutual information between the latent codes $c$ and $G(z, c)$ is defined as:

$I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$ (Equation 6.1.4)

where $H(c)$ is the entropy of the latent code $c$ and $H(c \mid G(z, c))$ is the conditional entropy of $c$ after observing the output of the generator, $G(z, c)$. Entropy is a measure of the uncertainty of a random variable or an event. For example, a statement like "the sun rises in the east" has low entropy, whereas winning the jackpot in the lottery has high entropy.

In Equation 6.1.4, maximizing the mutual information means minimizing $H(c \mid G(z, c))$, or decreasing the uncertainty in the latent code upon observing the generated output. This makes sense since, for example, in the MNIST dataset, the generator becomes more confident in synthesizing the digit 8 if the GAN sees that it observed the digit 8.

However, it is hard to estimate $H(c \mid G(z, c))$ since it requires knowledge of the posterior $P(c \mid G(z, c)) = P(c \mid x)$, which is something we don't have access to. The workaround is to estimate a lower bound of the mutual information by approximating the posterior with an auxiliary distribution $Q(c \mid x)$. InfoGAN estimates the lower bound of the mutual information as:

$I(c; G(z, c)) \geq L_I(G, Q) = E_{c \sim P(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c)$ (Equation 6.1.5)

In InfoGAN, $H(c)$ is assumed to be a constant. Therefore, maximizing the mutual information is a matter of maximizing the expectation, which means the generator must be confident that it has generated an output with the specific attributes. We should note that the maximum value of this expectation is zero, since $\log Q(c \mid x) \leq 0$; therefore, the maximum of the lower bound of the mutual information is $H(c)$. In InfoGAN, $Q(c \mid x)$ for discrete latent codes can be represented by a softmax nonlinearity, and the expectation becomes the negative categorical_crossentropy loss in Keras.
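To illustrate how the expectation in Equation 6.1.5 maps onto a Keras loss, the sketch below attaches an auxiliary softmax head for $Q(c \mid x)$ over a 10-way discrete code to a discriminator. The convolutional layer sizes and the unit loss weight on the regularizer are assumptions for illustration, not the chapter's exact implementation:

```python
from tensorflow.keras.layers import (Input, Dense, Flatten,
                                     Conv2D, LeakyReLU)
from tensorflow.keras.models import Model

num_labels = 10  # assumed 10-way discrete latent code, as in MNIST

image = Input(shape=(28, 28, 1), name="image")
x = Conv2D(64, 5, strides=2, padding="same")(image)
x = LeakyReLU(0.2)(x)
x = Conv2D(128, 5, strides=2, padding="same")(x)
x = LeakyReLU(0.2)(x)
x = Flatten()(x)

# Real/fake head of the usual GAN discriminator.
validity = Dense(1, activation="sigmoid", name="validity")(x)
# Auxiliary head: softmax estimate of the posterior Q(c|x).
q_of_c_given_x = Dense(num_labels, activation="softmax", name="label")(x)

discriminator = Model(image, [validity, q_of_c_given_x],
                      name="discriminator")

# Minimizing categorical crossentropy on the Q head maximizes
# E[log Q(c|x)], the expectation term of the lower bound in
# Equation 6.1.5; the second loss weight scales the regularizer.
discriminator.compile(loss=["binary_crossentropy",
                            "categorical_crossentropy"],
                      loss_weights=[1.0, 1.0],
                      optimizer="adam")
```

For continuous latent codes, $Q(c \mid x)$ is typically modeled as a Gaussian, in which case maximizing $\log Q(c \mid x)$ reduces to minimizing a mean squared error between the sampled code and the head's prediction.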