
Chapter 6

Another interesting paper is Semantic Image Inpainting with Perceptual and Contextual Losses, by Raymond A. Yeh et al. in 2016. Just as content-aware fill is a tool used by photographers to fill in unwanted or missing parts of images, in this paper the authors used a DCGAN for image completion.

As mentioned earlier, a lot of research is happening around GANs. In the next

section we will explore some of the interesting GAN architectures proposed in

recent years.

Some interesting GAN architectures

Since their inception, GANs have generated a great deal of interest, and as a result we are seeing extensive modification of and experimentation with GAN training, architectures, and applications. In this section we will explore some interesting GANs proposed in recent years.

SRGAN

Remember seeing a crime thriller where our hero asks the computer guy to magnify the faded image of the crime scene? After zooming in, we are able to see the criminal's face in detail, including the weapon used and anything engraved upon it! Well, Super Resolution GANs (SRGANs) can perform similar magic.

Here a GAN is trained in such a way that it can generate a photorealistic high-resolution image when given a low-resolution image. The SRGAN architecture consists of three neural networks: a very deep generator network (which uses residual modules; for reference see ResNets in Chapter 5, Advanced Convolutional Neural Networks), a discriminator network, and a pretrained VGG-19 network.
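The key idea behind the generator's residual modules is the identity skip connection, which lets very deep networks train stably. The following is a minimal NumPy sketch of that skip-connection idea only; the real SRGAN generator uses convolutions, batch normalization, and PReLU activations, which plain matrices stand in for here:

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: two linear 'layers' plus the identity skip
    connection that SRGAN's deep generator relies on. Real SRGAN blocks
    use conv + batch norm + PReLU; matrices stand in for them here."""
    h = np.maximum(0.0, x @ w1)   # first layer with a ReLU-like activation
    h = h @ w2                    # second layer, no activation
    return x + h                  # identity skip: output keeps input's shape

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))              # a feature vector
w1 = 0.01 * rng.normal(size=(64, 64))     # small weights, for illustration
w2 = 0.01 * rng.normal(size=(64, 64))
y = residual_block(x, w1, w2)
print(y.shape)  # shape is preserved, so blocks can be stacked deeply
```

Because each block's output has the same shape as its input, dozens of such blocks can be chained, which is what makes the "very deep" generator practical.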

SRGANs use the perceptual loss function (developed by Johnson et al.; you can find the link to the paper in the References section). The content component of this loss is the difference between the feature map activations in the high layers of a VGG network for the generated image and for the high-resolution target. To this the authors add an adversarial loss, so that the generated images look more natural and their finer details more realistic. The perceptual loss is defined as the weighted sum of the content loss and the adversarial loss:

$$l^{SR} = l^{SR}_X + 10^{-3} \times l^{SR}_{Gen}$$

The first term on the right-hand side is the content loss, obtained using the feature maps generated by a pretrained VGG-19. Mathematically, it is the Euclidean distance between the feature map of the reconstructed image (that is, the one generated by the generator) and that of the original high-resolution reference image.
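The composition of this loss can be sketched in a few lines of NumPy. Note that the feature maps and discriminator score below are toy stand-ins, not real VGG-19 activations, and the $-\log D(G(\cdot))$ form of the adversarial term follows the SRGAN paper:

```python
import numpy as np

def content_loss(phi_generated, phi_target):
    """Euclidean (MSE-style) distance between VGG feature maps of the
    generated image and the high-resolution reference."""
    return np.mean((phi_generated - phi_target) ** 2)

def perceptual_loss(phi_generated, phi_target, d_fake):
    """Perceptual loss = content loss + 1e-3 * adversarial loss.
    The adversarial term, -log D(G(z)), shrinks as the discriminator's
    score d_fake for the generated image approaches 1 ('real')."""
    adversarial = -np.log(d_fake + 1e-12)  # small epsilon avoids log(0)
    return content_loss(phi_generated, phi_target) + 1e-3 * adversarial

# Toy feature maps standing in for VGG-19 activations (hypothetical shapes).
rng = np.random.default_rng(0)
phi_hr = rng.normal(size=(8, 8, 64))                   # target features
phi_gen = phi_hr + 0.1 * rng.normal(size=(8, 8, 64))   # close to the target

loss_good = perceptual_loss(phi_gen, phi_hr, d_fake=0.9)
loss_bad = perceptual_loss(rng.normal(size=(8, 8, 64)), phi_hr, d_fake=0.1)
print(loss_good < loss_bad)  # a better reconstruction gives a lower loss
```

The $10^{-3}$ weight keeps the adversarial term from overwhelming the content term, so the generator is rewarded primarily for matching the reference's VGG features and only secondarily for fooling the discriminator.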
