Advanced Deep Learning with Keras


The WGAN discriminator (critic) and generator loss functions are

$$L^{(D)} = -\mathbb{E}_{x \sim p_{data}} D_w(x) + \mathbb{E}_z D_w(G(z)) \quad (5.1.21)$$

$$L^{(G)} = -\mathbb{E}_z D_w(G(z)) \quad (5.1.22)$$

where the critic weights $w$ are clipped to a small range after every update:

$$w \leftarrow \text{clip}(w, -0.01, 0.01) \quad (5.1.20)$$

Table 5.1.1: A comparison between the loss functions of GAN and WGAN

GAN:
  $L^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_z \log(1 - D(G(z)))$
  $L^{(G)} = -\mathbb{E}_z \log D(G(z))$

WGAN:
  $L^{(D)} = -\mathbb{E}_{x \sim p_{data}} D_w(x) + \mathbb{E}_z D_w(G(z))$  (5.1.21)
  $L^{(G)} = -\mathbb{E}_z D_w(G(z))$  (5.1.22)
  $w \leftarrow \text{clip}(w, -0.01, 0.01)$  (5.1.20)

Algorithm 5.1.1: WGAN. The values of the parameters are $\alpha = 0.00005$, $c = 0.01$, $m = 64$, and $n_{critic} = 5$.

Require: $\alpha$, the learning rate. $c$, the clipping parameter. $m$, the batch size. $n_{critic}$, the number of critic (discriminator) iterations per generator iteration.
Require: $w_0$, initial critic (discriminator) parameters. $\theta_0$, initial generator parameters.

1. while $\theta$ has not converged do
2.   for $t = 1, \ldots, n_{critic}$ do
3.     Sample a batch $\{x^{(i)}\}_{i=1}^{m} \sim p_{data}$ from the real data
4.     Sample a batch $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$ from the uniform noise distribution
5.     $g_w \leftarrow \nabla_w \left[ -\frac{1}{m} \sum_{i=1}^{m} D_w(x^{(i)}) + \frac{1}{m} \sum_{i=1}^{m} D_w(G_\theta(z^{(i)})) \right]$, compute the discriminator gradients
6.     $w \leftarrow w - \alpha \times \text{RMSProp}(w, g_w)$, update the discriminator parameters
7.     $w \leftarrow \text{clip}(w, -c, c)$, clip discriminator weights
8.   end for
9.   Sample a batch $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$ from the uniform noise distribution
10.  $g_\theta \leftarrow -\nabla_\theta \frac{1}{m} \sum_{i=1}^{m} D_w(G_\theta(z^{(i)}))$, compute the generator gradients
11.  $\theta \leftarrow \theta - \alpha \times \text{RMSProp}(\theta, g_\theta)$, update generator parameters
12. end while
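To make the procedure concrete, here is a minimal sketch of Algorithm 5.1.1 in TensorFlow 2 / Keras. It is an illustrative reconstruction, not the book's listing: the tiny generator and critic stand-ins, the flattened 28 x 28 input, and latent_dim = 100 are assumptions made only to keep the sketch self-contained.

import tensorflow as tf

# Hyperparameters from Algorithm 5.1.1
alpha, c, m, n_critic = 5e-5, 0.01, 64, 5
latent_dim = 100  # assumed latent size, not fixed by the algorithm

# Tiny stand-in models; the chapter's actual networks are larger
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
])
critic = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),  # linear score, not a probability
])

critic_opt = tf.keras.optimizers.RMSprop(learning_rate=alpha)
gen_opt = tf.keras.optimizers.RMSprop(learning_rate=alpha)

def critic_step(real_x):
    # Lines 3-7: one critic update on a real batch and a noise batch
    z = tf.random.uniform([m, latent_dim], -1.0, 1.0)  # Line 4
    with tf.GradientTape() as tape:
        # Equation 5.1.21: L(D) = -E[D_w(x)] + E[D_w(G(z))]
        loss = (-tf.reduce_mean(critic(real_x))
                + tf.reduce_mean(critic(generator(z))))
    grads = tape.gradient(loss, critic.trainable_variables)             # Line 5
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))  # Line 6
    for w in critic.trainable_variables:                                # Line 7
        w.assign(tf.clip_by_value(w, -c, c))

def generator_step():
    # Lines 9-11: one generator update
    z = tf.random.uniform([m, latent_dim], -1.0, 1.0)  # Line 9
    with tf.GradientTape() as tape:
        # Equation 5.1.22: L(G) = -E[D_w(G(z))]
        loss = -tf.reduce_mean(critic(generator(z)))
    grads = tape.gradient(loss, generator.trainable_variables)          # Line 10
    gen_opt.apply_gradients(zip(grads, generator.trainable_variables))  # Line 11

An outer loop then calls critic_step on n_critic fresh real batches for every single call to generator_step, matching Lines 1 to 12.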

Figure 5.1.3: Top: Training the WGAN discriminator requires fake data from the generator and real data from the true distribution. Bottom: Training the WGAN generator requires fake data from the generator pretending to be real.

Similar to GANs, WGAN trains the discriminator and generator alternately in an adversarial fashion. However, in WGAN, the discriminator (also called the critic) trains for $n_{critic}$ iterations (Lines 2 to 8) before the generator trains for one iteration (Lines 9 to 11). This is in contrast to GANs, where the discriminator and generator are trained for an equal number of iterations. Training the discriminator means learning its parameters (weights and biases). This requires sampling a batch from the real data (Line 3) and a batch from the fake data (Line 4), and computing the gradient of the discriminator parameters (Line 5) after feeding the sampled data to the discriminator network. The discriminator parameters are optimized using RMSProp (Line 6). Lines 5 and 6 together optimize Equation 5.1.21. Adam was found to be unstable in WGAN.
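In Keras, a common way to express both losses is a single loss function applied to labeled batches: real samples are labeled +1 and fake samples -1 for the critic, while the generator is trained on fake samples labeled +1. This is a widely used WGAN idiom sketched under that assumption; the function name is illustrative, not necessarily the chapter's exact listing.

from tensorflow.keras import backend as K

def wasserstein_loss(y_label, y_pred):
    # -E[y * D(.)]: with y = +1 on real data this gives -E[D_w(x)], and
    # with y = -1 on fake data it gives +E[D_w(G(z))], so the two batches
    # together realize Equation 5.1.21; training the generator on fake
    # data labeled +1 realizes Equation 5.1.22.
    return -K.mean(y_label * y_pred)

A model compiled with this loss, for example critic.compile(loss=wasserstein_loss, optimizer="rmsprop"), can then be trained with ordinary train_on_batch calls.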

