WGAN
$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} D_w(x) + \mathbb{E}_z D_w(G(z))$ (5.1.21)
$\mathcal{L}^{(G)} = -\mathbb{E}_z D_w(G(z))$
$w \leftarrow \mathrm{clip}(w, -0.01, 0.01)$

Table 5.1.1: A comparison between the loss functions of GAN and WGAN

Algorithm 5.1.1 WGAN

The values of the parameters are $\alpha = 0.00005$, $c = 0.01$, $m = 64$, and $n_{critic} = 5$.

Require: $\alpha$, the learning rate. $c$, the clipping parameter. $m$, the batch size. $n_{critic}$, the number of the critic (discriminator) iterations per generator iteration.
Require: $w_0$, initial critic (discriminator) parameters. $\theta_0$, initial generator parameters.

1. while $\theta$ has not converged do
2.   for $t = 1, \ldots, n_{critic}$ do
3.     Sample a batch $\{x^{(i)}\}_{i=1}^{m} \sim p_{data}$ from the real data
4.     Sample a batch $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$ from the uniform noise distribution
5.     $g_w \leftarrow \nabla_w \left[ -\frac{1}{m} \sum_{i=1}^{m} D_w(x^{(i)}) + \frac{1}{m} \sum_{i=1}^{m} D_w(G_\theta(z^{(i)})) \right]$, compute the discriminator gradients
6.     $w \leftarrow w - \alpha \times \mathrm{RMSProp}(w, g_w)$, update the discriminator parameters
7.     $w \leftarrow \mathrm{clip}(w, -c, c)$, clip discriminator weights
8.   end for
9.   Sample a batch $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$ from the uniform noise distribution
10.  $g_\theta \leftarrow -\nabla_\theta \frac{1}{m} \sum_{i=1}^{m} D_w(G_\theta(z^{(i)}))$, compute the generator gradients
11.  $\theta \leftarrow \theta - \alpha \times \mathrm{RMSProp}(\theta, g_\theta)$, update generator parameters
12. end while
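The steps of the algorithm above can be sketched in plain Python on a toy one-dimensional problem. This is a minimal illustration under stated assumptions, not the implementation used in this chapter: the critic is assumed to be linear, D_w(x) = w * x, the generator is assumed to be G_theta(z) = theta[0] * z + theta[1], the real data is assumed to be Gaussian with mean 3.0, and the gradients are written out by hand. The names `rmsprop_step` and `train_wgan` are hypothetical, and the learning rate is raised to 0.01 (instead of 0.00005) so that the toy problem converges in a few hundred iterations.

```python
import random


def mean(values):
    return sum(values) / len(values)


def rmsprop_step(param, grad, cache, lr, rho=0.9, eps=1e-8):
    """One RMSProp update: scale the gradient by a running RMS of past gradients."""
    cache = rho * cache + (1.0 - rho) * grad * grad
    return param - lr * grad / (cache ** 0.5 + eps), cache


def train_wgan(iterations=600, alpha=0.01, c=0.01, m=64, n_critic=5, seed=0):
    """Toy WGAN loop with a linear critic and a linear generator (assumptions)."""
    rng = random.Random(seed)
    w, w_cache = 0.0, 0.0                  # critic parameter, D_w(x) = w * x
    theta, theta_cache = [1.0, 0.0], [0.0, 0.0]  # generator scale and shift

    for _ in range(iterations):            # Line 1 (fixed count instead of a convergence test)
        for _ in range(n_critic):          # Line 2
            # Line 3: batch of real data (assumed Gaussian, mean 3.0)
            x = [rng.gauss(3.0, 0.5) for _ in range(m)]
            # Line 4: batch of uniform noise, pushed through the generator
            z = [rng.uniform(-1.0, 1.0) for _ in range(m)]
            fake = [theta[0] * zi + theta[1] for zi in z]
            # Line 5: gradient of -mean(D_w(x)) + mean(D_w(G(z))) w.r.t. w
            g_w = -mean(x) + mean(fake)
            # Line 6: RMSProp update of the critic
            w, w_cache = rmsprop_step(w, g_w, w_cache, alpha)
            # Line 7: clip the critic weight to [-c, c]
            w = max(-c, min(c, w))

        # Line 9: fresh batch of noise for the generator update
        z = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        # Line 10: gradient of -mean(D_w(G(z))) w.r.t. theta[0] and theta[1]
        g_theta = [-w * mean(z), -w]
        # Line 11: RMSProp update of the generator
        for k in range(2):
            theta[k], theta_cache[k] = rmsprop_step(
                theta[k], g_theta[k], theta_cache[k], alpha)

    return w, theta


if __name__ == "__main__":
    w, theta = train_wgan()
    print("critic weight:", w)
    print("generator scale and shift:", theta)
```

Because the assumed critic is linear, it can only compare the means of the real and fake batches, so this sketch only drives the generator shift toward the real mean. A deep critic network is needed to match the rest of the distribution.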

Figure 5.1.3: Top: Training the WGAN discriminator requires fake data from the generator and real data from the true distribution. Bottom: Training the WGAN generator requires fake data from the generator pretending to be real.

Similar to GANs, WGAN alternately trains the discriminator and the generator (through adversarial training). However, in WGAN, the discriminator (also called the critic) trains for $n_{critic}$ iterations (Lines 2 to 8) before the generator is trained for one iteration (Lines 9 to 11). This is in contrast to GANs, which use an equal number of training iterations for both the discriminator and the generator. Training the discriminator means learning the parameters (weights and biases) of the discriminator. This requires sampling a batch from the real data (Line 3) and a batch of noise from which the generator produces fake data (Line 4), then computing the gradient of the discriminator parameters (Line 5) after feeding the sampled data to the discriminator network. The discriminator parameters are optimized using RMSProp (Line 6). Together, Lines 5 and 6 optimize Equation 5.1.21, the discriminator loss. RMSProp is used because Adam was found to be unstable in WGAN.
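The two training modes in the figure can be connected to the WGAN loss functions through a label convention. The sketch below assumes one common convention, which is an assumption here and not something stated in the text above: real samples are labeled 1.0, fake samples are labeled -1.0 when training the discriminator, and fake samples are labeled 1.0 ("pretending to be real") when training the generator. The function names and the critic scores are hypothetical values chosen only for illustration.

```python
def wasserstein_loss(y_label, y_pred):
    """Return -mean(y_label * y_pred) over a batch of critic scores."""
    return -sum(l * p for l, p in zip(y_label, y_pred)) / len(y_pred)


def critic_loss(scores_real, scores_fake):
    """Discriminator loss: -mean(D(x)) + mean(D(G(z)))."""
    # Real samples carry the label 1.0, fake samples the label -1.0 (assumed).
    real_part = wasserstein_loss([1.0] * len(scores_real), scores_real)
    fake_part = wasserstein_loss([-1.0] * len(scores_fake), scores_fake)
    return real_part + fake_part


def generator_loss(scores_fake):
    """Generator loss: -mean(D(G(z))), with fake samples labeled as real."""
    return wasserstein_loss([1.0] * len(scores_fake), scores_fake)


# Hypothetical critic outputs for a batch of four real and four fake samples.
scores_real = [0.8, 0.6, 0.7, 0.9]
scores_fake = [-0.2, 0.1, -0.4, 0.3]

if __name__ == "__main__":
    print("critic loss:", critic_loss(scores_real, scores_fake))
    print("generator loss:", generator_loss(scores_fake))
```

With this convention a single loss function serves both networks. Only the labels change between the discriminator update and the generator update.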

