Chapter 6

Another interesting paper is Semantic Image Inpainting with Perceptual and Contextual Losses, by Raymond A. Yeh et al. in 2016. Just as content-aware fill is a tool used by photographers to fill in unwanted or missing parts of images, in this paper they used a DCGAN for image completion.

As mentioned earlier, a lot of research is happening around GANs. In the next section we will explore some of the interesting GAN architectures proposed in recent years.

Some interesting GAN architectures

Since their inception, GANs have generated a lot of interest, and as a result we are seeing a lot of experimentation with GAN training, architectures, and applications. In this section we will explore some interesting GANs proposed in recent years.

SRGAN

Remember seeing a crime thriller where our hero asks the computer guy to magnify a faded image of the crime scene? With the zoom we are able to see the criminal's face in detail, including the weapon used and anything engraved upon it! Well, Super-Resolution GANs (SRGANs) can perform similar magic.

Here a GAN is trained in such a way that it can generate a photorealistic high-resolution image when given a low-resolution image. The SRGAN architecture consists of three neural networks: a very deep generator network (which uses residual modules; for reference see ResNets in Chapter 5, Advanced Convolutional Neural Networks), a discriminator network, and a pretrained VGG-19 network.

SRGANs use the perceptual loss function (developed by Johnson et al.; you can find the link to the paper in the References section). The perceptual loss is computed from the difference in feature map activations in the high layers of a VGG network between the generator's output and the high-resolution reference image. Besides perceptual loss, the authors further added content loss and an adversarial loss so that generated images look more natural and their finer details more artistic.
The perceptual loss is defined as the weighted sum of content loss and adversarial loss:

$l^{SR} = l_X^{SR} + 10^{-3}\, l_{Gen}^{SR}$

The first term on the right-hand side is the content loss, obtained using the feature maps generated by a pretrained VGG-19. Mathematically, it is the Euclidean distance between the feature map of the reconstructed image (that is, the one generated by the generator) and that of the original high-resolution reference image.
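The content loss just described can be sketched in TensorFlow as follows. This is a minimal illustration, not the paper's exact implementation: the choice of VGG-19 layer (`block5_conv4`) is an assumption for the sketch, and `weights=None` keeps it self-contained (in practice you would load `weights="imagenet"`):

```python
import tensorflow as tf

# Frozen VGG-19 used only as a feature extractor. The layer name
# "block5_conv4" is an illustrative high-layer choice, not necessarily
# the exact layer used in the SRGAN paper.
vgg = tf.keras.applications.VGG19(include_top=False, weights=None)
feature_extractor = tf.keras.Model(
    inputs=vgg.input, outputs=vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False  # VGG stays frozen during GAN training

def content_loss(hr_image, sr_image):
    """Euclidean (mean squared) distance between the VGG feature maps of
    the high-resolution reference and the super-resolved output."""
    hr_features = feature_extractor(hr_image)
    sr_features = feature_extractor(sr_image)
    return tf.reduce_mean(tf.square(hr_features - sr_features))

def perceptual_loss(hr_image, sr_image, adversarial_loss):
    # Weighted sum from the equation above: content loss + 1e-3 * adversarial loss
    return content_loss(hr_image, sr_image) + 1e-3 * adversarial_loss
```

Note that the VGG network is never trained here; it only provides a fixed feature space in which the reconstruction error is measured.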
Generative Adversarial Networks

The second term on the RHS is the adversarial loss. It is the standard generative loss term, designed to ensure that images generated by the generator are able to fool the discriminator. You can see in the following figure, taken from the original paper, that the image generated by the SRGAN is much closer to the original high-resolution image:

Another noteworthy architecture is CycleGAN; proposed in 2017, it can perform the task of image translation. Once trained, you can translate an image from one domain to another. For example, when trained on a horse and zebra dataset, if you give it an image with horses in the foreground, CycleGAN can convert the horses to zebras while keeping the same background. We explore it next.

CycleGAN

Have you ever imagined how some scenery would look if Van Gogh or Manet had painted it? We have many scenes and landscapes painted by Gogh and Manet, but we do not have any collection of input-output pairs. A CycleGAN performs image translation, that is, it transfers an image given in one domain (scenery, for example) to another domain (a Van Gogh painting of the same scene, for instance) in the absence of paired training examples. CycleGAN's ability to perform image translation in the absence of training pairs is what makes it unique.

To achieve image translation the authors used a simple yet effective procedure. They made use of two GANs, the generator of each GAN performing the image translation from one domain to the other.
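The two-generator setup can be sketched as follows. The tiny one-layer generators (`generator_AB`, `generator_BA`) are hypothetical stand-ins for CycleGAN's deep convolutional generators, used only to illustrate the round trip A → B → A that CycleGAN's cycle-consistency loss penalizes:

```python
import tensorflow as tf

# Hypothetical stand-in generators; the real CycleGAN generators are deep
# convolutional networks with residual blocks.
generator_AB = tf.keras.Sequential(
    [tf.keras.layers.Conv2D(3, 3, padding="same")])  # domain A -> domain B
generator_BA = tf.keras.Sequential(
    [tf.keras.layers.Conv2D(3, 3, padding="same")])  # domain B -> domain A

def cycle_consistency_loss(real_A, real_B):
    """Translating A -> B -> A (and B -> A -> B) should recover the original
    image; the L1 distance between each image and its reconstruction
    enforces this without needing paired training examples."""
    reconstructed_A = generator_BA(generator_AB(real_A))  # A -> B -> A
    reconstructed_B = generator_AB(generator_BA(real_B))  # B -> A -> B
    return (tf.reduce_mean(tf.abs(real_A - reconstructed_A)) +
            tf.reduce_mean(tf.abs(real_B - reconstructed_B)))
```

This cycle-consistency term is what lets CycleGAN learn from unpaired data: each GAN's adversarial loss makes translations look like the target domain, while the cycle loss ties the two generators together so content is preserved.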