Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


coordinates of a "perfect" square and split it into source and target sequences:full_seq = (torch.tensor([[-1, -1], [-1, 1], [1, 1], [1, -1]]).float().view(1, 4, 2))source_seq = full_seq[:, :2] # first two cornerstarget_seq = full_seq[:, 2:] # last two cornersNow, let’s encode the source sequence and take the final hidden state:torch.manual_seed(21)encoder = Encoder(n_features=2, hidden_dim=2)hidden_seq = encoder(source_seq) # output is N, L, Fhidden_final = hidden_seq[:, -1:] # takes last hidden statehidden_finalOutputtensor([[[ 0.3105, -0.5263]]], grad_fn=<SliceBackward>)Of course, the model is untrained, so the final hidden state above is totally random.In a trained model, however, the final hidden state will encode information aboutthe source sequence. In Chapter 8, we used it to classify the direction in which thesquare was drawn, so it is safe to say that the final hidden state encoded thedrawing direction (clockwise or counterclockwise).Pretty straightforward, right? Now, let’s go over the…DecoderThe decoder’s goal is to generate the target sequence from aninitial representation; that is, to decode it.Sounds like a perfect match, doesn’t it? Encode the source sequence, get itsrepresentation (final hidden state), and feed it to the decoder so it generates thetarget sequence."How does the decoder transform a hidden state into a sequence?"Encoder-Decoder Architecture | 691
Pretty straightforward, right? Now, let's go over the…

Decoder

The decoder's goal is to generate the target sequence from an initial representation; that is, to decode it.

Sounds like a perfect match, doesn't it? Encode the source sequence, get its representation (final hidden state), and feed it to the decoder so it generates the target sequence.

"How does the decoder transform a hidden state into a sequence?"

We can use recurrent layers for that as well.

Figure 9.5 - Decoder

Let’s analyze the figure above:

• In the first step, the initial hidden state is the encoder's final hidden state (h_f, in blue).

• The first cell will output a new hidden state (h_2): that's both the output of that cell and one of the inputs of the next cell, as we've already seen in Chapter 8.

• Before, we'd only run the final hidden state through a linear layer to produce the logits, but now we'll run the output of every cell through a linear layer (w^T h) to convert each hidden state into predicted coordinates (x_2).

• The predicted coordinates are then used as one of the inputs of the second step (x_2).

"Great, but we’re missing one input in the first step, right?"

That's right! The first cell takes both an initial hidden state (h_f, in blue, the encoder's output) and a first data point (x_1, in red).
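To make this more concrete, here is a minimal sketch of a Decoder module built along these lines, continuing with the imports from the Encoder sketch above. It assumes a single GRU layer plus a linear layer that maps each hidden state to two predicted coordinates; names like init_hidden and regression are illustrative, and the actual class used later in the chapter may differ in its details:

class Decoder(nn.Module):
    def __init__(self, n_features, hidden_dim):
        super().__init__()
        self.n_features = n_features
        self.hidden_dim = hidden_dim
        self.hidden = None
        self.basic_rnn = nn.GRU(self.n_features, self.hidden_dim,
                                batch_first=True)
        # linear layer that converts a hidden state into predicted coordinates
        self.regression = nn.Linear(self.hidden_dim, self.n_features)

    def init_hidden(self, hidden_seq):
        # keeps only the encoder's final hidden state (N, 1, H) and reshapes
        # it to (1, N, H), the layout the GRU expects for its hidden state
        hidden_final = hidden_seq[:, -1:]
        self.hidden = hidden_final.permute(1, 0, 2).contiguous()

    def forward(self, X):
        # X is (N, 1, F): a single data point, e.g., the last known corner
        batch_first_output, self.hidden = self.basic_rnn(X, self.hidden)
        last_output = batch_first_output[:, -1:]
        out = self.regression(last_output)
        return out.view(-1, 1, self.n_features)  # predicted coordinates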

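And here is one way the pieces could fit together at generation time, reusing source_seq from above: the decoder is primed with the encoder's final hidden state, takes the last known corner (x_1) as its first input, and each predicted corner is then fed back as the input of the following step:

torch.manual_seed(21)
encoder = Encoder(n_features=2, hidden_dim=2)
decoder = Decoder(n_features=2, hidden_dim=2)

# encode the source sequence and prime the decoder with its final hidden state
hidden_seq = encoder(source_seq)
decoder.init_hidden(hidden_seq)

# the first input is the last known corner of the source sequence
inputs = source_seq[:, -1:]

predictions = []
for _ in range(2):         # generate the two missing corners
    out = decoder(inputs)  # (N, 1, F) predicted coordinates
    predictions.append(out)
    inputs = out           # feed the prediction back as the next input

target_pred = torch.cat(predictions, dim=1)  # (N, 2, F)

Since both models are untrained, these predictions are just as random as the hidden state we saw above; in a trained model, they would approximate the two remaining corners of the square.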
