
"Are we done now? Is this good enough?"

Sorry, but no, not yet. Our solution still has one problem, and it boils down to computing distances between two encoded positions. Let's take position number three and its two neighbors, positions number two and four. Obviously, the distance between position three and each of its closest neighbors is one. Now, let's see what happens if we compute the distance using the positional encoding.

Figure 9.42 - Inconsistent distances

The distance between positions three and two (given by the norm of the difference vector) is not equal to the distance between positions three and four. That may seem a bit too abstract, but using an encoding with inconsistent distances would make it much harder for the model to make sense of the encoded positions.

This inconsistency arises from the fact that our encoding resets to zero every time the modulo kicks in. The distance between positions three and four got much larger because, at position four, the first vector goes back to zero. We need some other function that has a smoother cycle.
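We can also see the inconsistency numerically. The sketch below assumes the "normalized modulo" encoding we have been building, (position % base) / base; the bases (2, 4, and 8) are illustrative choices of ours, not necessarily the ones in the figure:

```python
import numpy as np

# Hypothetical bases, just for illustration
bases = np.array([2, 4, 8])
positions = np.arange(8)

# Assumed encoding: for each base b, position p becomes (p % b) / b,
# so every column resets to zero whenever p is a multiple of b
encoding = (positions[:, None] % bases) / bases

# Distance between two encoded positions = norm of the difference vector
dist_3_2 = np.linalg.norm(encoding[3] - encoding[2])
dist_3_4 = np.linalg.norm(encoding[3] - encoding[4])
print(dist_3_2, dist_3_4)  # roughly 0.57 vs. 0.91 - not the same!
```

Even though positions two and four are both exactly one step away from position three, the encoded distances disagree because the base-four column drops from 0.75 straight back to zero at position four.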

"What if we actually use a cycle, I mean, a circle?"

Perfect! First, we take our encodings and multiply them by 360.

Figure 9.43 - From "normalized" modulo to degrees

Now, each value corresponds to a number of degrees that we can use to move along a circle. The figure below shows a red arrow rotated by the corresponding number of degrees for each position and base in the table above.
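In code, the conversion is a single multiplication. The sketch below reuses the hypothetical encoding from before and, as one possible way of drawing the arrows, places each tip on the unit circle (mapping degrees to sine and cosine coordinates is our choice here, purely for illustration):

```python
import numpy as np

# Same hypothetical (p % b) / b encoding with illustrative bases
bases = np.array([2, 4, 8])
positions = np.arange(8)
encoding = (positions[:, None] % bases) / bases

# From "normalized" modulo to degrees
degrees = encoding * 360  # e.g., position 3, base 4 -> 0.75 * 360 = 270

# Tip of each arrow on the unit circle (axis choice is illustrative)
radians = np.deg2rad(degrees)
x, y = np.sin(radians), np.cos(radians)
```

Since rotating by 360 degrees brings the arrow back to where it started, the encoding no longer jumps abruptly when the modulo resets.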
