

You can see that the images are much clearer and sharper than those produced by the previous autoencoders we have covered in this chapter. The code for this section is available in the Jupyter notebook, ConvolutionAutoencoder.ipynb.

Keras autoencoder example ‒ sentence vectors

In this example, we will build and train an LSTM-based autoencoder to generate sentence vectors for documents in the Reuters-21578 corpus (https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection). We have already seen in Chapter 7, Word Embeddings, how to represent a word using word embeddings to create vectors that represent the word's meaning in the context of other words it appears with. Here, we will see how to build similar vectors for sentences. Sentences are sequences of words, so a sentence vector represents the meaning of a sentence.

The easiest way to build a sentence vector is to add up the word vectors and divide by the number of words. However, this treats the sentence as a bag of words and does not take the order of words into account. Under this scheme, the sentences "The dog bit the man" and "The man bit the dog" would be treated as identical. LSTMs are designed to work with sequence input and do take the order of words into consideration, thus providing a better and more natural representation of the sentence.
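To make the limitation concrete, here is a minimal sketch of the averaging approach; the embeddings dictionary and the function name are hypothetical placeholders standing in for any pretrained word-embedding lookup table:

import numpy as np

# Bag-of-words sentence vector: the mean of the word vectors.
# `embeddings` is a hypothetical dict mapping each word to its
# pretrained embedding (a NumPy array); out-of-vocabulary words
# are skipped.
def average_sentence_vector(words, embeddings):
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0)

# The two sentences above contain exactly the same words, so their
# averaged vectors are identical -- word order is lost.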

First, we import the necessary libraries:

from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import RepeatVector
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Bidirectional
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import sequence
from scipy.stats import describe
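To show where these imports fit, here is a minimal sketch of the kind of sequence-to-sequence autoencoder this section goes on to build; the dimensions (SEQUENCE_LEN, EMBED_SIZE, LATENT_SIZE) are illustrative assumptions rather than values taken from the text:

# Illustrative hyperparameters (assumed values, not from the text)
SEQUENCE_LEN = 50   # maximum sentence length, in words
EMBED_SIZE = 300    # word-embedding dimensionality
LATENT_SIZE = 512   # size of the sentence vector

# Encoder: a bidirectional LSTM compresses the padded word sequence
# into a single fixed-length sentence vector.
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum",
                        name="encoder_lstm")(inputs)

# RepeatVector presents the sentence vector to the decoder at every
# timestep; a second bidirectional LSTM then tries to reconstruct the
# original sequence of word embeddings from it.
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True),
                        merge_mode="sum", name="decoder_lstm")(decoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")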
