16.03.2021 Views

Advanced Deep Learning with Keras

Chapter 7

There are many more examples of this in different fields. In computer vision and

image processing, for example, we can perform the translation by inventing an

algorithm that extracts features from the source image to translate it into the target

image. The Canny edge operator is an example of such an algorithm. However, in many

cases, the translation is so complex to hand-engineer that it is almost impossible

to find a suitable algorithm. Both the source and target domain distributions are

high-dimensional and complex:

Figure 7.1.2: Example of an unaligned image pair: left, a photo of real sunflowers along University

Avenue, University of the Philippines and right, Sunflowers by Vincent Van Gogh at the National Gallery,

London, UK. Original photos were taken by the author.
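
To make the idea of a hand-engineered translation concrete, the following is a minimal sketch of an image-to-edge-map translator built from fixed Sobel filters and a threshold. It is a simpler stand-in and not the Canny operator itself, which additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name `sobel_edges`, the threshold value, and the toy input are assumptions made for illustration only.

```python
import numpy as np


def sobel_edges(image, threshold=0.25):
    """Translate a grayscale image into a binary edge map using fixed filters."""
    image = np.asarray(image, dtype=np.float64)
    # Hand-designed Sobel kernels for horizontal and vertical gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:
        magnitude /= peak  # normalize to [0, 1] before thresholding
    return (magnitude > threshold).astype(np.uint8)


# Toy source image (assumed): dark left half, bright right half.
source = np.zeros((8, 8))
source[:, 4:] = 1.0
edges = sobel_edges(source)
print(edges)
```

Every step here is fixed by hand. Such a recipe works for edges, but there is no comparable closed-form recipe for turning a photo of sunflowers into a Van Gogh painting, which is what motivates learning the translation from data.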

One way to approach the image translation problem is to use deep learning techniques.

If we have a sufficiently large dataset from both the source and target domains, we

can train a neural network to model the translation. Since the images in the target

domain must be automatically generated given a source image, they must look like

real samples from the target domain. GANs are well suited to such cross-domain

tasks. The pix2pix [3] algorithm is an example of a cross-domain algorithm.

pix2pix bears a resemblance to the conditional GAN (CGAN) [4] that we discussed

in Chapter 4, Generative Adversarial Networks (GANs). Recall that in conditional

GANs, on top of the noise input, z, a condition such as a one-hot vector

constrains the generator's output. For example, for MNIST digits, if we want the

generator to output the digit 8, the condition is the one-hot vector [0, 0, 0, 0, 0, 0, 0, 0,

1, 0]. In pix2pix, the condition is the image to be translated. The generator's output is

the translated image. pix2pix is trained by optimizing the conditional GAN loss.

To minimize blurring in the generated images, the L1 loss is also included.
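
The two ideas in the paragraph above, the one-hot condition and the combined adversarial plus L1 objective, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the book's or the pix2pix authors' implementation: the helper names `one_hot` and `generator_loss` are hypothetical, the adversarial term uses the common non-saturating form, and the L1 weight of 100 is the value reported in the pix2pix paper [3] rather than something stated in this text.

```python
import numpy as np


def one_hot(label, num_classes=10):
    """Build the CGAN condition vector for a class label."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec


def generator_loss(disc_prob_fake, generated, target, l1_weight=100.0):
    """Adversarial loss plus weighted L1 reconstruction loss (assumed form)."""
    eps = 1e-7
    p = np.clip(disc_prob_fake, eps, 1.0 - eps)
    # Non-saturating adversarial term: the generator wants D(fake) close to 1.
    adversarial = -np.mean(np.log(p))
    # L1 term: mean absolute pixel difference to the ground-truth target image.
    l1 = np.mean(np.abs(target - generated))
    return adversarial + l1_weight * l1


# Condition for the MNIST digit 8, as in the text.
condition = one_hot(8)
print(condition)

# Toy values (assumed): a 4x4 RGB target and a generated image off by 0.1.
rng = np.random.default_rng(0)
target = rng.random((4, 4, 3))
generated = target + 0.1
disc_prob_fake = np.full((1,), 0.5)  # discriminator is undecided
loss = generator_loss(disc_prob_fake, generated, target)
print(loss)
```

In pix2pix the condition passed to the generator would be the source image rather than a one-hot vector, but the structure of the loss stays the same: one term rewards fooling the discriminator, and the other keeps the output close to the paired target image.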
