22.02.2024 Views

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

Sure, we can!

Attention

Here is a (not so) crazy idea: What if the decoder could choose one (or more) of the encoder’s hidden states instead of being forced to stick with only the final one? That would surely give it more flexibility to use whichever hidden state is most useful at a given step of the target-sequence generation.

Let’s illustrate it with a simple, non-numerical example: translating from English to French using Google Translate. The original sentence is "the European economic zone," and its French translation is "la zone économique européenne."

Now, let’s compare their first words: "La," in French, obviously corresponds to "the" in English. My question to you is:

"Could Google (or any translator) have translated 'the' to 'la' without any other information?"

The answer is: No. English has only one definite article, "the," while French (like many other languages) has several. This means "the" may be translated in different ways, and the correct translated article can only be determined after finding the noun it refers to. The noun, in this case, is "zone," and it is the last word in the English sentence. Coincidentally, its translation is also "zone," which is a singular feminine noun in French, thus making "la" the correct translation of "the" in this case.

"So what? What does this have to do with hidden states?"

Well, if we consider the English sentence a (source) sequence of words, the French sentence is a (target) sequence of words. Assuming we can map each word to a numeric vector, we can use an encoder to encode the words in English, each word corresponding to a hidden state.
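
To make "each word corresponding to a hidden state" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two-dimensional word vectors in `embeddings` are made up, and `encode` is a hypothetical toy recurrence with fixed scalar weights (`w_x`, `w_h`), not a trained encoder. The only point is that the encoder returns one hidden state per source word rather than just the final one.

```python
import math

# Hypothetical toy embeddings: each English word maps to a made-up 2-D vector.
embeddings = {
    "the": [0.1, 0.3],
    "European": [0.7, -0.2],
    "economic": [-0.4, 0.5],
    "zone": [0.9, 0.8],
}


def encode(words, w_x=0.5, w_h=0.5):
    """Toy element-wise recurrent encoder (illustrative only).

    Returns a list with one hidden state per input word.
    """
    hidden = [0.0, 0.0]  # initial hidden state
    hidden_states = []
    for word in words:
        x = embeddings[word]
        # Each new hidden state mixes the current word with the previous state.
        hidden = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, hidden)]
        hidden_states.append(hidden)
    return hidden_states


source = ["the", "European", "economic", "zone"]
hidden_states = encode(source)
for word, state in zip(source, hidden_states):
    print(word, [round(value, 3) for value in state])
```

A decoder that only receives `hidden_states[-1]` sees a single summary of the whole sentence; keeping the full list is what makes it possible to look back at individual words.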

We know that the decoder’s role is to generate the translated words, right? Then, if the decoder is allowed to choose which hidden states from the encoder it will use to generate each output, it means it can choose which English words it will use to generate each translated word.
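
One way to picture this "choice" is as a set of weights over the encoder’s hidden states, so the decoder can lean on one or more source words at once. The sketch below assumes a softmax over per-word scores followed by a weighted average. The hidden states and the scores are made-up numbers, and `choose_hidden_states` is a hypothetical helper name; in a real model the scores would be computed by the network, not written by hand.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def choose_hidden_states(hidden_states, scores):
    """Weighted average of the hidden states (hypothetical helper)."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(weight * state[i] for weight, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


source = ["the", "European", "economic", "zone"]
# Made-up hidden states, one per source word.
hidden_states = [[0.1, 0.3], [0.7, -0.2], [-0.4, 0.5], [0.9, 0.8]]
# Made-up scores for the step that generates "la": favor "the" and "zone".
scores = [1.0, 0.0, 0.0, 2.0]

weights, context = choose_hidden_states(hidden_states, scores)
for word, weight in zip(source, weights):
    print(f"{word:>10}: {weight:.3f}")
print("context:", [round(value, 3) for value in context])
```

With these scores, most of the weight falls on "zone" and "the," which mirrors the argument above: the noun is needed to pick the right article.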

We have already seen that, in order to translate "the" to "la," the translator (that
