Advanced Deep Learning with Keras


The autoencoder has the tendency to memorize the input when the dimension of the latent code is significantly bigger than the dimension of x.

A suitable loss function, L(x, x̃), is a measure of how dissimilar the input, x, is from the output, which is the recovered input, x̃. As shown in the following equation, the Mean Squared Error (MSE) is an example of such a loss function:

$$L(x, \tilde{x}) = \mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \tilde{x}_i\right)^2$$ (Equation 3.1.1)

In this example, m is the output dimension (for example, in MNIST, m = width × height × channels = 28 × 28 × 1 = 784). x_i and x̃_i are the elements of x and x̃, respectively. Since the loss function is a measure of dissimilarity between the input and output, we're able to use alternative reconstruction loss functions such as the binary cross-entropy or the structural similarity index (SSIM).

Similar to other neural networks, the autoencoder tries to make this error or loss function as small as possible during training. Figure 3.1.1 shows the autoencoder. The encoder is a function that compresses the input, x, into a low-dimensional latent vector, z. This latent vector represents the important features of the input distribution. The decoder then tries to recover the original input from the latent vector in the form of x̃.

Figure 3.1.1: Block diagram of an autoencoder

Figure 3.1.2: An autoencoder with MNIST digit input and output. The latent vector is 16-dim.
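To make Equation 3.1.1 concrete, the short sketch below computes the MSE between a flattened MNIST-sized input and a hypothetical reconstruction, both by hand and with the built-in Keras loss. It is a minimal sketch assuming the TensorFlow 2 / tf.keras API; the random arrays simply stand in for x and x̃.

```python
import numpy as np
from tensorflow.keras.losses import MeanSquaredError

# Illustrative stand-ins for a flattened 28 x 28 x 1 MNIST digit (m = 784)
# and the reconstruction an autoencoder might produce.
x = np.random.rand(784).astype("float32")        # original input x
x_tilde = np.random.rand(784).astype("float32")  # recovered input x~

# Equation 3.1.1 by hand: (1/m) * sum_i (x_i - x~_i)^2
mse_manual = np.mean((x - x_tilde) ** 2)

# The same quantity via the built-in Keras loss
mse_keras = MeanSquaredError()(x, x_tilde).numpy()

print(mse_manual, mse_keras)  # both print the same value
```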

To put the autoencoder in context, x can be an MNIST digit, which has a dimension of 28 × 28 × 1 = 784. The encoder transforms the input into a low-dimensional z, which can be a 16-dim latent vector. The decoder will attempt to recover the input in the form of x̃ from z. Visually, every MNIST digit x will appear similar to x̃. Figure 3.1.2 demonstrates this autoencoding process. We can observe that the decoded digit 7, while not exactly the same, remains close enough.

Since both the encoder and decoder are non-linear functions, we can use neural networks to implement both. For example, on the MNIST dataset, the autoencoder can be implemented by an MLP or a CNN. The autoencoder can be trained by minimizing the loss function through backpropagation. Similar to other neural networks, the only requirement is that the loss function must be differentiable.

If we treat the input as a distribution, we can interpret the encoder as an encoder of distribution, p(z | x), and the decoder as the decoder of distribution, p(x | z). The loss function of the autoencoder is expressed as follows:

$$L = -\log p(x \mid z)$$ (Equation 3.1.2)

The loss function simply means that we would like to maximize the chances of recovering the input distribution given the latent vector distribution. If the decoder output distribution is assumed to be Gaussian, then the loss function boils down to MSE, since the negative log of each Gaussian term is the squared error (x_i − x̃_i)² scaled by 1/(2σ²) plus a constant:

$$L = -\log \prod_{i=1}^{m} N\left(x_i; \tilde{x}_i, \sigma^2\right) = -\sum_{i=1}^{m} \log N\left(x_i; \tilde{x}_i, \sigma^2\right) \propto \sum_{i=1}^{m}\left(x_i - \tilde{x}_i\right)^2$$ (Equation 3.1.3)

In this example, N(x_i; x̃_i, σ²) represents a Gaussian distribution with a mean of x̃_i and a variance of σ². A constant variance is assumed, and the decoder outputs x̃_i are assumed to be independent. Here, m is the output dimension.

Building autoencoders using Keras

We're now going to move on to something really exciting: building an autoencoder using the Keras library. For simplicity, we'll be using the MNIST dataset for the first set of examples. The autoencoder will generate a latent vector from the input data and recover the input using the decoder. The latent vector in this first example is 16-dim.
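Before walking through the book's listings, here is a minimal sketch of such an MNIST autoencoder built with the Keras functional API: an MLP encoder that compresses the flattened 784-dim input into a 16-dim latent vector z, and an MLP decoder that reconstructs x̃, trained with the MSE loss of Equation 3.1.1. It assumes the tf.keras API; the hidden-layer sizes (256 and 128 units) and the training settings are illustrative choices, not the book's exact code.

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

latent_dim = 16   # dimension of the latent vector z
input_dim = 784   # 28 x 28 x 1 MNIST digit, flattened

# Load and normalize MNIST; labels are unused since the target is the input itself
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, input_dim).astype("float32") / 255.0
x_test = x_test.reshape(-1, input_dim).astype("float32") / 255.0

# Encoder: compresses x into the 16-dim latent vector z
inputs = Input(shape=(input_dim,), name="encoder_input")
h = Dense(256, activation="relu")(inputs)
h = Dense(128, activation="relu")(h)
z = Dense(latent_dim, name="latent_vector")(h)
encoder = Model(inputs, z, name="encoder")

# Decoder: recovers x~ from z
latent_inputs = Input(shape=(latent_dim,), name="decoder_input")
h = Dense(128, activation="relu")(latent_inputs)
h = Dense(256, activation="relu")(h)
outputs = Dense(input_dim, activation="sigmoid", name="decoder_output")(h)
decoder = Model(latent_inputs, outputs, name="decoder")

# Autoencoder = decoder(encoder(x)); minimize the MSE between x and x~
autoencoder = Model(inputs, decoder(encoder(inputs)), name="autoencoder")
autoencoder.compile(loss="mse", optimizer="adam")
autoencoder.fit(x_train, x_train,
                validation_data=(x_test, x_test),
                epochs=20, batch_size=32)
```

After training, `encoder.predict` yields the 16-dim latent vectors and `decoder.predict` maps them back to 784-dim reconstructions, mirroring the block diagram of Figure 3.1.1.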

