

Keeping the rest of the code the same and just changing the encoder, you can get the Sparse autoencoder from the Vanilla autoencoder. The complete code for the Sparse autoencoder is in the Jupyter Notebook SparseAutoencoder.ipynb.
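
To make the change concrete, one common way to obtain the sparse behavior is to attach an L1 activity regularizer to the encoder layer, so that the hidden activations themselves are penalized. The snippet below is a minimal sketch assuming a fully connected tf.keras autoencoder; the layer sizes and the regularization strength (1e-5) are illustrative choices rather than the notebook's exact values.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Illustrative sizes; the notebook may use different ones.
original_dim = 784   # e.g. flattened 28x28 images
hidden_dim = 512

inputs = tf.keras.Input(shape=(original_dim,))
# The only change from the vanilla autoencoder: an L1 activity regularizer
# on the encoder layer, which pushes most hidden activations toward zero.
encoded = layers.Dense(hidden_dim, activation="relu",
                       activity_regularizer=regularizers.l1(1e-5))(inputs)
decoded = layers.Dense(original_dim, activation="sigmoid")(encoded)

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

Training then proceeds exactly as for the vanilla autoencoder; Keras adds the activity penalty to the compiled loss automatically.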

Alternatively, you can explicitly add a regularization term for sparsity in the loss function. To do so you will need to implement the regularization for the sparsity term as a function. If m is the total number of input patterns, then we can define a quantity ρ_hat (you can check the mathematical details in Andrew Ng's lecture here: https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf), which measures the net activity (how many times on average it fires) for each hidden layer unit. The basic idea is to put a constraint on ρ_hat such that it is equal to the sparsity parameter ρ. This results in adding a regularization term for sparsity in the loss function, so that the loss function now becomes:

loss = Mean squared error + Regularization for sparsity parameter
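
In code, ρ_hat is simply the per-unit mean of the hidden activations over the m input patterns. The following is a tiny, self-contained sketch with made-up activations, assuming TensorFlow:

import tensorflow as tf

# Hypothetical hidden-layer activations for m = 4 input patterns and 3 hidden units.
encoder_output = tf.constant([[0.9, 0.1, 0.0],
                              [0.8, 0.0, 0.1],
                              [0.7, 0.2, 0.0],
                              [0.9, 0.1, 0.1]])

# rho_hat[j] is the average activity (how often it fires) of hidden unit j.
rho_hat = tf.reduce_mean(encoder_output, axis=0)   # ≈ [0.825, 0.1, 0.05]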

This regularization term will penalize the network if ρ_hat deviates from ρ. One standard way to do this is to use the Kullback-Leibler (KL) divergence (you can learn more about KL divergence from this interesting lecture: https://www.stat.cmu.edu/~cshalizi/754/2006/notes/lecture-28.pdf) between ρ and ρ_hat.

Let's explore the KL divergence, D_KL, a little more. It is a non-symmetric measure of the difference between two distributions, in our case ρ and ρ_hat. When ρ and ρ_hat are equal the difference is zero; otherwise it increases monotonically as ρ_hat diverges from ρ. Mathematically, it is expressed as:

D_{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}
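
For a quick numerical check (using the natural logarithm and illustrative values), take ρ = 0.05: if ρ_hat = 0.1 the divergence is 0.05·log(0.05/0.1) + 0.95·log(0.95/0.9) ≈ 0.017, while ρ_hat = 0.05 gives exactly zero.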

You add this term to the loss so that the sparsity constraint is included implicitly, as a soft penalty rather than a hard constraint. You will need to fix a constant value for the sparsity parameter ρ and compute ρ_hat from the encoder output.
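
Putting the pieces together, the following is a minimal sketch of such a loss function in TensorFlow, not the notebook's exact implementation. It assumes the encoder output lies in (0, 1) (for example, via a sigmoid activation) so that ρ_hat behaves like an average firing rate, and the values of rho and the weighting factor beta are illustrative.

import tensorflow as tf

rho = 0.05    # target sparsity; an illustrative choice
beta = 1e-3   # weight of the sparsity penalty; also illustrative

def kl_divergence(rho, rho_hat):
    # KL divergence between the target sparsity rho and the average
    # activation rho_hat of each hidden unit.
    return (rho * tf.math.log(rho / rho_hat)
            + (1 - rho) * tf.math.log((1 - rho) / (1 - rho_hat)))

def sparse_loss(x, x_reconstructed, encoder_output):
    # Reconstruction term: mean squared error.
    mse = tf.reduce_mean(tf.square(x - x_reconstructed))
    # rho_hat: mean activation of each hidden unit over the batch,
    # which stands in for the m input patterns.
    rho_hat = tf.reduce_mean(encoder_output, axis=0)
    # Clip to keep the logarithms finite.
    rho_hat = tf.clip_by_value(rho_hat, 1e-7, 1.0 - 1e-7)
    kl = tf.reduce_sum(kl_divergence(rho, rho_hat))
    return mse + beta * kl

The KL term is summed over the hidden units and scaled by beta; a larger beta enforces sparsity more strongly at the expense of reconstruction quality. Such a loss can be used in a custom training step, or the KL term alone can be attached to a Keras model with model.add_loss.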

The compact representation of the inputs is stored in the weights. Let us visualize the weights learned by the network. The following are the weights of the encoder layer for the standard and Sparse autoencoder, respectively.

