Advanced Deep Learning with Keras
Chapter 5: Improved GANs

Table 5.1.1: The divergence functions between two probability distribution functions $p_{data}$ and $p_g$:

Kullback-Leibler (KL) divergence (Equation 5.1.1):

$$D_{KL}(p_{data} \| p_g) = \mathbb{E}_{x \sim p_{data}} \log \frac{p_{data}(x)}{p_g(x)} \neq D_{KL}(p_g \| p_{data}) = \mathbb{E}_{x \sim p_g} \log \frac{p_g(x)}{p_{data}(x)}$$

Jensen-Shannon (JS) divergence (Equation 5.1.2):

$$D_{JS}(p_{data}, p_g) = \frac{1}{2} \mathbb{E}_{x \sim p_{data}} \log \frac{p_{data}(x)}{\frac{p_{data}(x) + p_g(x)}{2}} + \frac{1}{2} \mathbb{E}_{x \sim p_g} \log \frac{p_g(x)}{\frac{p_{data}(x) + p_g(x)}{2}} = D_{JS}(p_g, p_{data})$$

Earth-Mover Distance (EMD) or Wasserstein 1 (Equation 5.1.3):

$$W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \| x - y \| \right]$$

where $\Pi(p_{data}, p_g)$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are $p_{data}$ and $p_g$.

Figure 5.1.1: The EMD is the weighted amount of mass from x to be transported in order to match the target distribution, y.
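To make the first two rows of Table 5.1.1 concrete, here is a minimal NumPy sketch (not from the book) that evaluates the discrete forms of the KL and JS divergences; the two distributions p_data and p_g below are hypothetical placeholders. It illustrates that KL is asymmetric while JS is symmetric.

import numpy as np

# hypothetical discrete distributions over the same support
p_data = np.array([0.4, 0.4, 0.1, 0.1])
p_g = np.array([0.1, 0.2, 0.3, 0.4])

def kl(p, q):
    # discrete form of Equation 5.1.1: sum_x p(x) log(p(x) / q(x))
    return np.sum(p * np.log(p / q))

def js(p, q):
    # Equation 5.1.2: average KL of p and q to their midpoint (p + q) / 2
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(kl(p_data, p_g), kl(p_g, p_data))   # asymmetric: the two values differ
print(js(p_data, p_g), js(p_g, p_data))   # symmetric: the two values match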
The intuition behind EMD is that it is a measure of how much mass $\gamma(x, y)$ should be transported by the distance $d = \|x - y\|$ for the probability distribution $p_{data}$ to match the probability distribution $p_g$. $\gamma(x, y)$ is a joint distribution in the space of all possible joint distributions $\Pi(p_{data}, p_g)$. $\gamma(x, y)$ is also known as a transport plan, to reflect the strategy for transporting masses to match the two probability distributions. There are many possible transport plans given the two probability distributions. Roughly speaking, the inf indicates a transport plan with the minimum cost.
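As an aside, for small discrete distributions the inf in Equation 5.1.3 can be found exactly with linear programming. The following sketch is an assumption for illustration only (the masses, locations, and the use of SciPy's linprog are not part of the chapter's example): it searches over all valid transport plans $\gamma$, constrained so that its marginals are the two given distributions, and returns the minimum transport cost.

import numpy as np
from scipy.optimize import linprog

# hypothetical masses and locations (not the book's figure)
p_x = np.array([0.3, 0.2, 0.5])        # masses of distribution x
p_y = np.array([0.6, 0.4])             # masses of target distribution y
loc_x = np.array([0.0, 1.0, 2.0])      # hypothetical locations of the x masses
loc_y = np.array([0.5, 1.5])           # hypothetical locations of the y masses

# cost[i, j] = ||x_i - y_j||: distance traveled by each unit of mass gamma[i, j]
cost = np.abs(loc_x[:, None] - loc_y[None, :])
n, m = cost.shape

# equality constraints: row sums of gamma equal p_x, column sums equal p_y
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):
    A_eq[n + j, j::m] = 1.0
b_eq = np.concatenate([p_x, p_y])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("minimum transport cost (EMD):", res.fun)
print("optimal transport plan gamma:\n", res.x.reshape(n, m))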
For example, Figure 5.1.1 shows two simple discrete distributions, x and y. x has masses $m_i$ for i = 1, 2, 3, and 4 at locations $x_i$ for i = 1, 2, 3, and 4. Meanwhile, y has masses $m_i$ for i = 1 and 2 at locations $y_i$ for i = 1 and 2. To match the distribution y, the arrows show the minimum transport plan that moves each mass $x_i$ by $d_i$. The EMD is computed as:

$$EMD = \sum_{i=1}^{4} x_i d_i = 0.2(0.4) + 0.3(0.5) + 0.1(0.3) + 0.4(0.7) = 0.54 \quad \text{(Equation 5.1.4)}$$
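As a quick sanity check (not from the book's code), the sum in Equation 5.1.4 can be evaluated directly; the masses and distances below are read off the equation above.

import numpy as np

masses = np.array([0.2, 0.3, 0.1, 0.4])      # the x_i masses in Equation 5.1.4
distances = np.array([0.4, 0.5, 0.3, 0.7])   # the d_i distances in Equation 5.1.4

print(np.sum(masses * distances))            # ~0.54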
In Figure 5.1.1, the EMD can be interpreted as the least amount of work needed to move the pile of dirt x to fill the hole y. While the inf can be deduced from the figure in this example, in most cases, especially for continuous distributions, it is intractable to exhaust all possible transport plans. We will come back to this problem later in this chapter. In the meantime, we'll show how the GAN loss functions are, in fact, minimizing the Jensen-Shannon (JS) divergence.
Distance function in GANs
We're now going to compute the optimal discriminator for any given generator, using the loss function from the previous chapter. We'll recall the following equation:
$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{z} \log \left( 1 - D(G(z)) \right) \quad \text{(Equation 4.1.1)}$$
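Equation 4.1.1 is simply the binary cross-entropy of the discriminator outputs, with real samples labeled 1 and generated samples labeled 0. The following minimal NumPy sketch estimates this loss from sampled discriminator outputs; the sample values are hypothetical and it is meant only as an illustration, not as the book's training code.

import numpy as np

def discriminator_loss(d_real, d_fake):
    # Monte Carlo estimate of Equation 4.1.1:
    # -E[log D(x)] over real samples  -  E[log(1 - D(G(z)))] over fake samples
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# hypothetical discriminator outputs in (0, 1)
d_real = np.array([0.9, 0.8, 0.95])   # D(x) for real samples x ~ p_data
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) for generated samples
print(discriminator_loss(d_real, d_fake))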
Instead of sampling from the noise distribution, the preceding equation can also
be expressed as sampling from the generator distribution:
$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{x \sim p_g} \log \left( 1 - D(x) \right) \quad \text{(Equation 5.1.5)}$$
To find the minimum of $\mathcal{L}^{(D)}$: