
Another interesting paper is Semantic Image Inpainting with Perceptual and Contextual Losses, by Raymond A. Yeh et al. in 2016. Just as content-aware fill is a tool used by photographers to fill in unwanted or missing parts of an image, in this paper they used a DCGAN for image completion.

As mentioned earlier, a lot of research is happening around GANs. In the next section we will explore some of the interesting GAN architectures proposed in recent years.

Some interesting GAN architectures

Since their inception GANs have generated a lot of interest, and as a result we are seeing a lot of modification and experimentation with GAN training, architecture, and applications. In this section we will explore some interesting GANs proposed in recent years.

SRGAN

Remember seeing a crime thriller where our hero asks the computer guy to magnify the faded image of the crime scene? With the zoom we are able to see the criminal's face in detail, including the weapon used and anything engraved upon it! Well, Super Resolution GANs (SRGANs) can perform similar magic.

Here a GAN is trained in such a way that it can generate a photorealistic high-resolution image when given a low-resolution image. The SRGAN architecture consists of three neural networks: a very deep generator network (which uses residual modules; for reference see ResNets in Chapter 5, Advanced Convolutional Neural Networks), a discriminator network, and a pretrained VGG-19 network.

SRGANs use the perceptual loss function (developed by Johnson et al.; you can find the link to the paper in the References section). The perceptual loss is computed from the difference in the feature-map activations in high layers of a VGG network between the network's output and the high-resolution reference image. The authors combined a content loss with an adversarial loss so that the generated images look more natural and the finer details more convincing. The perceptual loss is defined as the weighted sum of the content loss and the adversarial loss:

$$l^{SR} = l_X^{SR} + 10^{-3} \times l_{Gen}^{SR}$$

The first term on the right-hand side is the content loss, obtained using the feature maps generated by the pretrained VGG19. Mathematically it is the Euclidean distance between the feature map of the reconstructed image (that is, the one generated by the generator) and the original high-resolution reference image.
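As a concrete illustration, here is a minimal sketch of the content loss in TensorFlow/Keras. The choice of block5_conv4 as the feature layer and the use of a mean squared difference are illustrative assumptions, not the paper's prescribed setting; inputs are assumed to be already VGG-preprocessed:

import tensorflow as tf

# Illustrative sketch: extract deep VGG19 feature maps for the content loss.
# "block5_conv4" is an assumed layer choice; the paper experiments with
# features taken at different VGG depths.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False

def content_loss(hr_image, sr_image):
    # Euclidean (squared) distance between the feature maps of the
    # original high-resolution image and the generator's output.
    hr_features = feature_extractor(hr_image)
    sr_features = feature_extractor(sr_image)
    return tf.reduce_mean(tf.square(hr_features - sr_features))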

The second term on the RHS is the adversarial loss. It is the standard generative loss term, designed to ensure that images generated by the generator are able to fool the discriminator. A comparison figure in the original paper shows that the image generated by the SRGAN is much closer to the original high-resolution image.
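Putting both terms together, here is a minimal sketch of the combined perceptual loss, reusing the content_loss function defined above. The discriminator argument is an assumption: a Keras model that outputs the probability of its input being a real high-resolution image:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def perceptual_loss(hr_image, sr_image, discriminator):
    # Adversarial (generative) term: the generator is rewarded when the
    # discriminator labels its output as real (label 1).
    d_out = discriminator(sr_image)
    adversarial = bce(tf.ones_like(d_out), d_out)
    # Weighted sum from the formula above: l_SR = l_X_SR + 1e-3 * l_Gen_SR.
    return content_loss(hr_image, sr_image) + 1e-3 * adversarial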

Another noteworthy architecture is CycleGAN; proposed in 2017, it can perform the task of image translation. Once trained, you can translate an image from one domain to another. For example, when trained on a horse and zebra dataset, if you give it an image with horses in the foreground, the CycleGAN can convert the horses to zebras while leaving the background unchanged. We explore it next.

CycleGAN

Have you ever imagined how some scenery would look if Van Gogh or Manet had painted it? We have many scenes and landscapes painted by Van Gogh and Manet, but we do not have any collection of input-output pairs. A CycleGAN performs image translation, that is, it transfers an image given in one domain (scenery, for example) to another domain (a Van Gogh painting of the same scene, for instance) in the absence of training examples. The CycleGAN's ability to perform image translation in the absence of training pairs is what makes it unique.

To achieve image translation the authors used a very simple and yet effective procedure. They made use of two GANs, the generator of each GAN performing the image translation from one domain to another.
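What makes training possible without paired examples is the cycle-consistency constraint: translating an image to the other domain and then back should reproduce the original. Below is a minimal sketch of that loss, assuming two hypothetical Keras generators g_xy (domain X to Y, for example horse to zebra) and g_yx (Y to X); the L1 distance and the weight of 10 follow the original paper, but the names are illustrative:

import tensorflow as tf

mae = tf.keras.losses.MeanAbsoluteError()

def cycle_consistency_loss(real_x, real_y, g_xy, g_yx, lam=10.0):
    # Round trips through both generators should recover the inputs.
    cycled_x = g_yx(g_xy(real_x))  # X -> Y -> X
    cycled_y = g_xy(g_yx(real_y))  # Y -> X -> Y
    return lam * (mae(real_x, cycled_x) + mae(real_y, cycled_y))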
