Advanced Deep Learning with Keras


Chapter 5: Improved GANs

Kullback-Leibler (KL) divergence (5.1.1):

$$D_{KL}\left(p_{data} \,\|\, p_g\right) = \mathbb{E}_{x \sim p_{data}} \log \frac{p_{data}(x)}{p_g(x)} \neq D_{KL}\left(p_g \,\|\, p_{data}\right)$$

Jensen-Shannon (JS) divergence (5.1.2):

$$D_{JS}\left(p_{data} \,\|\, p_g\right) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}} \log \frac{p_{data}(x)}{\frac{p_{data}(x) + p_g(x)}{2}} + \frac{1}{2}\,\mathbb{E}_{x \sim p_g} \log \frac{p_g(x)}{\frac{p_{data}(x) + p_g(x)}{2}} = D_{JS}\left(p_g \,\|\, p_{data}\right)$$

Earth-Mover Distance (EMD) or Wasserstein 1 (5.1.3):

$$W\left(p_{data}, p_g\right) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\,\|x - y\|\,\right]$$

where $\Pi(p_{data}, p_g)$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are $p_{data}$ and $p_g$.

Table 5.1.1: The divergence functions between two probability distribution functions, $p_{data}$ and $p_g$

Figure 5.1.1: The EMD is the weighted amount of mass from x to be transported in order to match the target distribution, y
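To make Table 5.1.1 concrete, here is a minimal Python sketch of the KL and JS divergences for two discrete distributions. The distributions are illustrative choices, not taken from the chapter; scipy.stats.entropy(p, q) computes $D_{KL}(p \,\|\, q)$, and the JS divergence is assembled from KL against the mixture distribution.

```python
import numpy as np
from scipy.stats import entropy

# Two illustrative discrete distributions over four outcomes
p_data = np.array([0.2, 0.3, 0.1, 0.4])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

# KL is asymmetric: D_KL(p_data || p_g) != D_KL(p_g || p_data)
kl_forward = entropy(p_data, p_g)
kl_reverse = entropy(p_g, p_data)

# JS is built from KL against the mixture m = (p_data + p_g) / 2 and is symmetric
m = 0.5 * (p_data + p_g)
js = 0.5 * entropy(p_data, m) + 0.5 * entropy(p_g, m)

print(kl_forward, kl_reverse, js)
```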



The intuition behind EMD is that it is a measure of how much mass $\gamma(x, y)$ must be transported over the distance $d = \|x - y\|$ for the probability distribution $p_{data}$ to match the probability distribution $p_g$. $\gamma(x, y)$ is a joint distribution in the space of all possible joint distributions $\Pi(p_{data}, p_g)$. $\gamma(x, y)$ is also known as a transport plan, reflecting the strategy for transporting masses to match the two probability distributions. There are many possible transport plans for any two given probability distributions. Roughly speaking, $\inf$ selects the transport plan with the minimum cost.
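As a small illustration of what a transport plan is, the sketch below represents $\gamma(x, y)$ for two discrete distributions as a joint-probability matrix whose marginals recover $p_{data}$ and $p_g$. The marginals and the plan are made-up examples; the plan shown is merely feasible, not necessarily the one achieving the $\inf$.

```python
import numpy as np

# Two illustrative marginal distributions
p_data = np.array([0.4, 0.6])
p_g = np.array([0.5, 0.5])

# One feasible transport plan gamma(x, y) among many;
# entry [i, j] is the mass moved from location x_i to location y_j
gamma = np.array([[0.4, 0.0],
                  [0.1, 0.5]])

# A valid plan must reproduce the two marginals as its row and column sums
assert np.allclose(gamma.sum(axis=1), p_data)
assert np.allclose(gamma.sum(axis=0), p_g)
```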

For example, Figure 5.1.1 shows two simple discrete distributions, $x$ and $y$. $x$ has masses $m_i$ for $i = 1, 2, 3, 4$ at locations $x_i$ for $i = 1, 2, 3, 4$. Meanwhile, $y$ has masses $m_i$ for $i = 1, 2$ at locations $y_i$ for $i = 1, 2$. To match the distribution $y$, the arrows show the minimum transport plan, which moves each mass $x_i$ by a distance $d_i$. The EMD is computed as:

$$EMD = \sum_{i=1}^{4} x_i d_i = 0.2(0.4) + 0.3(0.5) + 0.1(0.3) + 0.4(0.7) = 0.54 \quad \text{(Equation 5.1.4)}$$
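Equation 5.1.4 is straightforward to verify in code. The masses and transport distances below are read directly from the equation; a minimal NumPy sketch:

```python
import numpy as np

# Masses and the distances each mass is moved, from Equation 5.1.4
masses = np.array([0.2, 0.3, 0.1, 0.4])
distances = np.array([0.4, 0.5, 0.3, 0.7])

# EMD for this transport plan: the distance-weighted sum of moved mass
emd = np.sum(masses * distances)
print(f"EMD = {emd:.2f}")  # EMD = 0.54, matching Equation 5.1.4
```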

In Figure 5.1.1, the EMD can be interpreted as the least amount of work needed to move the pile of dirt $x$ to fill the hole $y$. While in this example the $\inf$ can also be deduced from the figure, in most cases, especially for continuous distributions, it is intractable to exhaust all possible transport plans. We will come back to this problem later in this chapter. In the meantime, we'll show how the GAN loss functions are, in fact, minimizing the Jensen-Shannon (JS) divergence.
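For one-dimensional distributions, however, the optimal plan can be computed in closed form rather than by enumeration. Here is a quick sketch using scipy.stats.wasserstein_distance; the support locations are hypothetical (the figure's coordinates are not given in the text), and only the source weights come from the example above:

```python
from scipy.stats import wasserstein_distance

# Hypothetical 1-D support points; the x weights are from the chapter's example
x_locs, x_weights = [0.0, 1.0, 2.0, 3.0], [0.2, 0.3, 0.1, 0.4]
y_locs, y_weights = [0.5, 2.5], [0.5, 0.5]

# scipy finds the optimal transport plan internally for the 1-D case
print(wasserstein_distance(x_locs, y_locs, x_weights, y_weights))
```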

Distance function in GANs

We're now going to compute the optimal discriminator for any given generator, starting from the loss function of the previous chapter. Recall the following equation:

$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{z} \log\left(1 - D\left(G(z)\right)\right) \quad \text{(Equation 4.1.1)}$$

Instead of sampling from the noise distribution, the preceding equation can also be expressed as sampling from the generator distribution:

$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{x \sim p_g} \log\left(1 - D(x)\right) \quad \text{(Equation 5.1.5)}$$
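Equation 5.1.5 is the familiar binary cross-entropy, averaged over a batch of real samples (label 1) and generated samples (label 0). Here is a minimal NumPy sketch of a Monte Carlo estimate of it; the function name and the sample outputs are illustrative, not from the book's listings:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Batch estimate of Equation 5.1.5:
    -E[log D(x)] over real samples - E[log(1 - D(x))] over generated samples."""
    d_real = np.clip(d_real, eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# Illustrative discriminator outputs for a real batch and a generated batch
print(discriminator_loss(np.array([0.9, 0.8]), np.array([0.2, 0.3])))
```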

To find the minimum of $\mathcal{L}^{(D)}$:

