
Figure 8.22 - Transforming the hidden state


I’d like to draw your attention to the third column in particular: It clearly shows the effect of a gate, the reset gate in this case, over the feature space. Since a gate has a distinct value for each dimension, each dimension will shrink differently (it can only shrink because gate values are always between zero and one). In the third row, for example, the first dimension gets multiplied by 0.70, while the second dimension gets multiplied by only 0.05, making the resulting feature space really small.
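If you’d like to see this shrinking effect in code, here is a minimal sketch (the hidden state coordinates are made up for illustration; only the gate values, 0.70 and 0.05, come from the example above):

```python
import torch

h = torch.tensor([0.9, -0.8])   # made-up hidden state coordinates
r = torch.tensor([0.70, 0.05])  # gate values, always between zero and one

# The gate acts element-wise, so each dimension shrinks by its own factor
shrunk = r * h
print(shrunk)  # tensor([ 0.6300, -0.0400])
```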

Can We Do Better?

The gated recurrent unit is definitely an improvement over the regular RNN, but there are a few points I’d like to raise:
• Using the reset gate inside the hyperbolic tangent seems "weird" (not a scientific argument at all, I know); there is a sketch of that computation right after this list.
• The best thing about the hidden state is that it is bounded by the hyperbolic tangent—it guarantees the next cell will get the hidden state in the same range.
• The worst thing about the hidden state is that it is bounded by the hyperbolic tangent—it constrains the values the hidden state can take and, along with them, the corresponding gradients.
• Since we cannot have our cake and eat it too when it comes to the hidden state being bounded, what is preventing us from using two hidden states in the same cell?
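To make the first point concrete, this is roughly what the GRU’s candidate hidden state computation looks like; the sketch below uses plain linear layers and made-up variable names (not the book’s actual classes), and treats the reset gate values as given:

```python
import torch
import torch.nn as nn

torch.manual_seed(19)
n_features, hidden_dim = 2, 2

# Stand-in linear layers: one transforms the data point, the other the hidden state
linear_input = nn.Linear(n_features, hidden_dim)
linear_hidden = nn.Linear(hidden_dim, hidden_dim)

x = torch.randn(1, n_features)                 # a single data point
h = torch.zeros(1, hidden_dim)                 # previous hidden state
r = torch.sigmoid(torch.randn(1, hidden_dim))  # reset gate values, between zero and one

# The reset gate multiplies the transformed hidden state INSIDE the hyperbolic tangent
n = torch.tanh(linear_input(x) + r * linear_hidden(h))  # candidate hidden state
```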

Yes, let’s try that—two hidden states are surely better than one, right?

By the way—I know that GRUs were invented a long time AFTER the development of LSTMs, but I’ve decided to present them in order of increasing complexity. Please don’t take the "story" I’m telling too literally—it is just a way to facilitate learning.

Long Short-Term Memory (LSTM)

Long short-term memory, or LSTM for short, uses two states instead of one. Besides the regular hidden state (h), which is bounded by the hyperbolic tangent, as usual, it introduces a second state, the cell state (c), which is unbounded.

So, let’s work through the points raised in the last section. First, let’s keep it simple and use a regular RNN to generate a candidate hidden state (g):
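A rough sketch of that candidate state in code, using a regular RNN cell (the variable names here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(19)
n_features, hidden_dim = 2, 2

# A regular RNN cell is all we need for the candidate hidden state
rnn_cell = nn.RNNCell(n_features, hidden_dim)  # tanh is the default nonlinearity

x = torch.randn(1, n_features)   # a single data point
h = torch.zeros(1, hidden_dim)   # previous hidden state

g = rnn_cell(x, h)               # candidate hidden state (g), bounded by tanh
```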

