09.05.2023 Views

pdfcoffee

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 7

if i >= k-1:

break

if k < len(word_conf_pairs):

print("...")

print_most_similar(word_vectors.most_similar("king"), 5)

The most_similar() method with a single parameter produces the following

output. Here the floating point score is a measure of the similarity, higher values

being better than lower values. As you can see, the similar words seem to be

mostly accurate:

0.760 prince

0.701 queen

0.700 kings

0.698 emperor

0.688 throne

...

You can also do vector arithmetic similar to the country-capital example we

described earlier. Our objective is to see if the relation Paris : France :: Berlin :

Germany holds true. This is equivalent to saying that the distance in embedding

space between Paris and France should be the same as that between Berlin and

Germany. In other words, France - Paris + Berlin should give us Germany. In

code, then, this would translate to:

print_most_similar(word_vectors.most_similar(

positive=["france", "berlin"], negative=["paris"]), 1

)

This returns the following result, as expected:

0.803 germany

The preceding similarity value reported is Cosine similarity, but a better measure

of similarity was proposed by Levy and Goldberg [9] which is also implemented

in the gensim API:

print_most_similar(word_vectors.most_similar_cosmul(

positive=["france", "berlin"], negative=["paris"]), 1

)

And this also yields the expected result, but with higher similarity:

0.984 germany

[ 241 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!