Autoencoders

First, we extract the encoder component into its own network:

encoder = Model(autoencoder.input,
                autoencoder.get_layer("encoder_lstm").output)

Then we run the autoencoder on the test set to obtain the predicted embeddings. We send both the input embedding and the predicted embedding through the encoder to produce a sentence vector from each, and compare the two vectors using cosine similarity. Cosine similarities close to one indicate high similarity, and those close to zero indicate low similarity. The following code runs against a random subset of 500 test sentences and produces some sample values of the cosine similarity between the sentence vectors generated from the source embedding and the corresponding target embedding produced by the autoencoder:
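For reference, the cosine similarity between two vectors x and y is their dot product divided by the product of their L2 norms, cos(x, y) = (x · y) / (||x||_2 ||y||_2), which is what the compute_cosine_similarity helper below computes.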

def compute_cosine_similarity(x, y):
    # cosine similarity: dot product of x and y divided by the product of their L2 norms
    return np.dot(x, y) / (np.linalg.norm(x, 2) * np.linalg.norm(y, 2))

k = 500
cosims = np.zeros((k))
i = 0
for bid in range(num_test_steps):
    xtest, ytest = test_gen.next()
    # reconstruct the input embeddings with the autoencoder
    ytest_ = autoencoder.predict(xtest)
    # encode both the original and the reconstructed embeddings into sentence vectors
    Xvec = encoder.predict(xtest)
    Yvec = encoder.predict(ytest_)
    for rid in range(Xvec.shape[0]):
        if i >= k:
            break
        cosims[i] = compute_cosine_similarity(Xvec[rid], Yvec[rid])
        # print the first few similarities as a sanity check
        if i <= 10:
            print(cosims[i])
        i += 1
    if i >= k:
        break

The first few values of the cosine similarities are shown as follows. As we can see, the vectors seem to be quite similar:

0.984686553478241

0.9815746545791626

0.9793671369552612

0.9805112481117249

0.9630994200706482
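Looking at a handful of values gives a quick sanity check, but we can also summarize the agreement over all k sampled sentences. The short snippet below is not part of the original listing; it simply reports the mean and standard deviation of the cosims array collected above:

# summarize the cosine similarities collected over the k test sentences
print("mean cosine similarity: {:.4f}".format(np.mean(cosims)))
print("std cosine similarity: {:.4f}".format(np.std(cosims)))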
