Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Every gate worthy of its name will use a sigmoid activation function to produce gate-compatible values between zero and one.

Moreover, since all components of a GRU (n, r, and z) share a similar structure, it should be no surprise that their corresponding transformations (t_h and t_x) are also similarly computed:

Equation 8.7 - Transformations of a GRU

See? They all follow the same logic! Actually, let's literally see how all these components are connected in the following diagram.
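The pattern is easy to state, assuming the usual weight (W) and bias (b) notation (the subscripts below are my own shorthand): each transformation is just a linear combination of either the hidden state (h) or the input (x), one pair for each of r, z, and n:

t_{hr} = W_{hr} h + b_{hr}        t_{xr} = W_{xr} x + b_{xr}
t_{hz} = W_{hz} h + b_{hz}        t_{xz} = W_{xz} x + b_{xz}
t_{hn} = W_{hn} h + b_{hn}        t_{xn} = W_{xn} x + b_{xn}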

Figure 8.18 - Internals of a GRU cell

The gates follow the same color convention I used in the equations: red for the reset gate (r) and blue for the update gate (z). The path of the (new) candidate hidden state (n) is drawn in black and joins the (old) hidden state (h), drawn in gray, to produce the actual new hidden state (h').

To really understand the flow of information inside the GRU cell, I suggest you try these exercises (there is a short code sketch right after this list if you want to check your answers):

• First, learn to look past (or literally ignore) the internals of the gates: both r and z are simply values between zero and one (for each hidden dimension).
• Pretend r=1; can you see that the resulting n is equivalent to the output of a simple RNN?
• Keep r=1, and now pretend z=0; can you see that the new hidden state h' is equivalent to the output of a simple RNN?
• Now pretend z=1; can you see that the new hidden state h' is simply a copy of the old hidden state (in other words, the data [x] does not have any effect)?
• If you decrease r all the way to zero, the resulting n is less and less influenced by the old hidden state.
• If you decrease z all the way to zero, the new hidden state h' gets closer and closer to n.
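If you would rather check those limits numerically, here is a minimal sketch of a GRU cell written by hand (the names gru_cell, linear_h, linear_x, force_r, and force_z, as well as the sizes, are made up for illustration; the computation follows the equations above, with the gates optionally forced to fixed values so you can reproduce each exercise):

import torch

torch.manual_seed(19)
n_features, hidden_dim = 3, 2

# one linear transformation per component (r, z, n), for the hidden state and for the input
linear_h = {k: torch.nn.Linear(hidden_dim, hidden_dim) for k in ['r', 'z', 'n']}
linear_x = {k: torch.nn.Linear(n_features, hidden_dim) for k in ['r', 'z', 'n']}

def gru_cell(x, h, force_r=None, force_z=None):
    # gates: the sigmoid keeps their values between zero and one
    r = torch.sigmoid(linear_h['r'](h) + linear_x['r'](x))
    z = torch.sigmoid(linear_h['z'](h) + linear_x['z'](x))
    # optionally override the gates to reproduce the exercises
    if force_r is not None:
        r = torch.full_like(r, force_r)
    if force_z is not None:
        z = torch.full_like(z, force_z)
    # candidate hidden state: the reset gate scales the old hidden state's contribution
    n = torch.tanh(r * linear_h['n'](h) + linear_x['n'](x))
    # new hidden state: the update gate blends old hidden state and candidate
    return (1 - z) * n + z * h

x = torch.randn(1, n_features)
h = torch.randn(1, hidden_dim)

# r=1, z=0: behaves like a simple RNN cell (h' equals n)
print(gru_cell(x, h, force_r=1., force_z=0.))
# z=1: the input has no effect, h' is a copy of h
print(gru_cell(x, h, force_z=1.))
print(h)

In practice you would use PyTorch's own nn.GRUCell instead of rolling your own; the point of the sketch is only to make the roles of r and z visible.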
