Advanced Deep Learning with Keras
Chapter 4

Figure 4.1.3: Training the discriminator is similar to training a binary classifier network using binary cross-entropy loss. The fake data is supplied by the generator while the real data comes from true samples.

As shown in the preceding figure, the discriminator can be trained by minimizing the loss function in the following equation:

\mathcal{L}^{(D)}\left(\theta^{(G)}, \theta^{(D)}\right) = -\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) - \mathbb{E}_{z} \log\bigl(1 - D(G(z))\bigr)   (Equation 4.1.1)

The equation is just the standard binary cross-entropy cost function. The loss is the negative sum of the expectation of correctly identifying real data, D(x), and the expectation of 1.0 minus correctly identifying synthetic data, 1 - D(G(z)). Taking the log does not change the location of the local minima. Two mini-batches of data are supplied to the discriminator during training:

1. x, real data sampled from the true distribution (that is, x ~ p_data), with label 1.0
2. x' = G(z), fake data from the generator, with label 0.0

In order to minimize the loss function, the discriminator parameters, θ(D), are updated through backpropagation by correctly identifying the genuine data, D(x), and the synthetic data, 1 - D(G(z)). Correctly identifying real data is equivalent to D(x) → 1.0, while correctly classifying fake data is the same as D(G(z)) → 0.0, or (1 - D(G(z))) → 1.0. In this equation, z is the arbitrary encoding or noise vector that the generator uses to synthesize new signals. Both terms contribute to minimizing the loss function.

To train the generator, the GAN treats the total of the discriminator and generator losses as a zero-sum game. The generator loss function is simply the negative of the discriminator loss function:

\mathcal{L}^{(G)}\left(\theta^{(G)}, \theta^{(D)}\right) = -\mathcal{L}^{(D)}\left(\theta^{(G)}, \theta^{(D)}\right)   (Equation 4.1.2)
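Before turning to the generator side, the discriminator update described above (Equation 4.1.1 and its two labeled mini-batches) can be sketched in Keras. The fully connected models, latent_size, batch_size, and the use of MNIST below are illustrative assumptions for this sketch, not the book's DCGAN implementation:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist

latent_size = 100   # length of the noise vector z (assumed for illustration)
batch_size = 64

# Flattened MNIST digits in [0, 1] stand in for the real samples x ~ p_data.
(x_train, _), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

# Toy generator G(z): noise vector -> 784-dimensional fake image.
generator = Sequential([Dense(256, activation='relu', input_shape=(latent_size,)),
                        Dense(784, activation='sigmoid')])

# Toy discriminator D(x): image -> probability that the input is real.
discriminator = Sequential([Dense(256, activation='relu', input_shape=(784,)),
                            Dense(1, activation='sigmoid')])
discriminator.compile(loss='binary_crossentropy', optimizer='adam')

# Mini-batch 1: real images sampled from the data, labeled 1.0.
idx = np.random.randint(0, x_train.shape[0], size=batch_size)
real_images = x_train[idx]

# Mini-batch 2: fake images x' = G(z), labeled 0.0.
noise = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_size))
fake_images = generator.predict(noise)

# One gradient step on the binary cross-entropy loss of Equation 4.1.1;
# only the discriminator parameters are updated here.
x = np.concatenate((real_images, fake_images))
y = np.concatenate((np.ones((batch_size, 1)), np.zeros((batch_size, 1))))
d_loss = discriminator.train_on_batch(x, y)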
Equation 4.1.2 can then be rewritten more aptly as a value function:

\mathcal{V}^{(G)}\left(\theta^{(G)}, \theta^{(D)}\right) = -\mathcal{L}^{(D)}\left(\theta^{(G)}, \theta^{(D)}\right)   (Equation 4.1.3)

From the perspective of the generator, Equation 4.1.3 should be minimized. From the point of view of the discriminator, the value function should be maximized. Therefore, the generator training criterion can be written as a minimax problem:

\theta^{(G)*} = \arg\min_{\theta^{(G)}} \max_{\theta^{(D)}} \mathcal{V}^{(D)}\left(\theta^{(G)}, \theta^{(D)}\right)   (Equation 4.1.4)

When training the generator, we try to fool the discriminator by pretending that the synthetic data is real, with label 1.0. By maximizing with respect to θ(D), the optimizer sends gradient updates to the discriminator parameters so that it classifies this synthetic data as real. At the same time, by minimizing with respect to θ(G), the optimizer trains the generator's parameters to trick the discriminator. In practice, however, the discriminator is confident in classifying the synthetic data as fake and barely updates its parameters. Furthermore, the gradient updates are small and diminish significantly as they propagate to the generator layers. As a result, the generator fails to converge:

Figure 4.1.4: Training the generator is like training a network using a binary cross-entropy loss function, with the fake data from the generator presented as genuine.

The solution is to reformulate the loss function of the generator as:

\mathcal{L}^{(G)}\left(\theta^{(G)}, \theta^{(D)}\right) = -\mathbb{E}_{z} \log D(G(z))   (Equation 4.1.5)
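Equation 4.1.5 is what a stacked generator-discriminator (adversarial) model optimizes when the fake data is labeled as real. Continuing the toy sketch above (same generator, discriminator, latent_size, and batch_size, all illustrative):

import numpy as np
from keras.models import Sequential

# Freeze the discriminator weights for this phase so that only the generator
# parameters are updated when the stacked model is trained.
discriminator.trainable = False
adversarial = Sequential([generator, discriminator])
adversarial.compile(loss='binary_crossentropy', optimizer='adam')

# Present synthetic data G(z) with label 1.0: minimizing binary cross-entropy
# then maximizes log D(G(z)) with respect to θ(G), as in Equation 4.1.5.
noise = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_size))
g_loss = adversarial.train_on_batch(noise, np.ones((batch_size, 1)))

In the classic Keras behavior assumed here, the trainable flag is captured at compile time, so the standalone discriminator compiled earlier keeps learning, while the adversarial model, compiled after setting trainable = False, leaves the discriminator weights untouched.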
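For completeness, GAN training alternates the two updates sketched above. The following loop only illustrates that alternation (names and the step count are carried over from the earlier sketches and remain assumptions, not a tuned training procedure):

import numpy as np

train_steps = 1000   # illustrative; real GAN training uses many more steps

for step in range(train_steps):
    # 1. Discriminator update: real data labeled 1.0, fake data labeled 0.0.
    idx = np.random.randint(0, x_train.shape[0], size=batch_size)
    noise = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_size))
    x = np.concatenate((x_train[idx], generator.predict(noise)))
    y = np.concatenate((np.ones((batch_size, 1)), np.zeros((batch_size, 1))))
    d_loss = discriminator.train_on_batch(x, y)

    # 2. Generator update through the frozen discriminator: synthetic data
    #    is labeled as real (1.0), per Equation 4.1.5.
    noise = np.random.uniform(-1.0, 1.0, size=(batch_size, latent_size))
    g_loss = adversarial.train_on_batch(noise, np.ones((batch_size, 1)))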