Advanced Deep Learning with Keras


Chapter 4

Figure 4.1.3: Training the discriminator is similar to training a binary classifier network using binary cross-entropy loss. The fake data is supplied by the generator, while the real data comes from true samples.

As shown in the preceding figure, the discriminator can be trained by minimizing the loss function in the following equation:

$\mathcal{L}^{(D)}\left(\theta^{(G)},\theta^{(D)}\right) = -\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) - \mathbb{E}_{z} \log\left(1 - D(G(z))\right)$ (Equation 4.1.1)

The equation is just the standard binary cross-entropy cost function. The loss is the negative sum of the expectation of correctly identifying real data, $D(x)$, and the expectation of 1.0 minus correctly identifying synthetic data, $1 - D(G(z))$. The log does not change the location of the local minima. Two mini-batches of data are supplied to the discriminator during training:

1. $x$, real data from sampled data (that is, $x \sim p_{\text{data}}$) with label 1.0
2. $x' = G(z)$, fake data from the generator with label 0.0

In order to minimize the loss function, the discriminator parameters, $\theta^{(D)}$, will be updated through backpropagation by correctly identifying the genuine data, $D(x)$, and the synthetic data, $1 - D(G(z))$. Correctly identifying real data is equivalent to $D(x) \rightarrow 1.0$, while correctly classifying fake data is the same as $D(G(z)) \rightarrow 0.0$, or $\left(1 - D(G(z))\right) \rightarrow 1.0$. In this equation, $z$ is the arbitrary encoding or noise vector that the generator uses to synthesize new signals. Both terms contribute to minimizing the loss function.

To train the generator, a GAN treats the total of the discriminator and generator losses as a zero-sum game. The generator loss function is simply the negative of the discriminator loss function:

$\mathcal{L}^{(G)}\left(\theta^{(G)},\theta^{(D)}\right) = -\mathcal{L}^{(D)}\left(\theta^{(G)},\theta^{(D)}\right)$ (Equation 4.1.2)
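Concretely, the discriminator update is one supervised step of binary classification on the two mini-batches above. The following is a minimal sketch, not the book's exact code: it assumes `generator` and `discriminator` are already-built Keras models, that `discriminator` ends in a sigmoid unit and is compiled with `binary_crossentropy`, and that `x_train`, `latent_dim`, and `batch_size` are defined elsewhere.

import numpy as np

def train_discriminator_step(generator, discriminator, x_train,
                             latent_dim, batch_size):
    # Mini-batch 1: real samples x ~ p_data, labeled 1.0.
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_images = x_train[idx]
    real_labels = np.ones((batch_size, 1))

    # Mini-batch 2: fake samples x' = G(z), labeled 0.0.
    z = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_dim))
    fake_images = generator.predict(z)
    fake_labels = np.zeros((batch_size, 1))

    # Minimizing binary cross-entropy on the combined batch pushes
    # D(x) -> 1.0 and D(G(z)) -> 0.0, which is Equation 4.1.1.
    x = np.concatenate((real_images, fake_images))
    y = np.concatenate((real_labels, fake_labels))
    return discriminator.train_on_batch(x, y)

Note that only $\theta^{(D)}$ is updated in this step; the generator is used purely as a source of fake samples and receives no gradient here.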

Generative Adversarial Networks (GANs)

This can then be rewritten more aptly as a value function:

$\mathcal{V}^{(G)}\left(\theta^{(G)},\theta^{(D)}\right) = -\mathcal{L}^{(D)}\left(\theta^{(G)},\theta^{(D)}\right)$ (Equation 4.1.3)

From the perspective of the generator, Equation 4.1.3 should be minimized. From the point of view of the discriminator, the value function should be maximized. Therefore, the generator training criterion can be written as a minimax problem:

$\theta^{(G)*} = \arg \min_{\theta^{(G)}} \max_{\theta^{(D)}} \mathcal{V}^{(G)}\left(\theta^{(G)},\theta^{(D)}\right)$ (Equation 4.1.4)

Occasionally, we will try to fool the discriminator by pretending that the synthetic data is real, with label 1.0. By maximizing with respect to $\theta^{(D)}$, the optimizer sends gradient updates to the discriminator parameters so that it considers this synthetic data as real. At the same time, by minimizing with respect to $\theta^{(G)}$, the optimizer trains the generator's parameters to trick the discriminator. However, in practice, the discriminator is confident in classifying the synthetic data as fake and will barely update its parameters. Furthermore, the gradient updates are small and diminish significantly as they propagate to the generator layers. As a result, the generator fails to converge:

Figure 4.1.4: Training the generator is like training a network using a binary cross-entropy loss function. The fake data from the generator is presented as genuine.

The solution is to reformulate the loss function of the generator in the form:

$\mathcal{L}^{(G)}\left(\theta^{(G)},\theta^{(D)}\right) = -\mathbb{E}_{z} \log D(G(z))$ (Equation 4.1.5)
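In Keras terms, Equation 4.1.5 amounts to training the generator through a frozen discriminator while labeling the fake data as real: with labels of 1.0, binary cross-entropy reduces to $-\log D(G(z))$. The following is a minimal sketch under the same assumptions as before; the combined adversarial model (generator feeding into discriminator) is a common pattern, and the names used here are illustrative rather than the book's exact code.

import numpy as np
from tensorflow.keras.models import Model

def build_adversarial(generator, discriminator, optimizer):
    # Freeze theta^(D) so that only theta^(G) is updated when
    # training through the combined model.
    discriminator.trainable = False
    adversarial = Model(inputs=generator.input,
                        outputs=discriminator(generator.output))
    adversarial.compile(loss='binary_crossentropy', optimizer=optimizer)
    return adversarial

def train_generator_step(adversarial, latent_dim, batch_size):
    # Present fake data as genuine: with labels of 1.0, minimizing
    # binary cross-entropy minimizes -E_z log D(G(z)) (Equation 4.1.5).
    z = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_dim))
    y = np.ones((batch_size, 1))
    return adversarial.train_on_batch(z, y)

Because the labels are 1.0 rather than 0.0, the gradient of $-\log D(G(z))$ is largest precisely when the discriminator confidently rejects the fakes, which avoids the vanishing gradient of the original minimax formulation.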

