Let’s take a look at the RNN’s arguments:

• input_size: It is the number of features in each data point of the sequence.
• hidden_size: It is the number of hidden dimensions you want to use.
• bias: Just like any other layer, it includes the bias in the equations.
• nonlinearity: By default, it uses the hyperbolic tangent, but you can change it to ReLU if you want.

The four arguments above are exactly the same as those in the RNN cell, so we can easily create a full-fledged RNN like this:

n_features = 2
hidden_dim = 2

torch.manual_seed(19)
rnn = nn.RNN(input_size=n_features, hidden_size=hidden_dim)
rnn.state_dict()

Output

OrderedDict([('weight_ih_l0', tensor([[ 0.6627, -0.4245],
                                      [ 0.5373,  0.2294]])),
             ('weight_hh_l0', tensor([[-0.4015, -0.5385],
                                      [-0.1956, -0.6835]])),
             ('bias_ih_l0', tensor([0.4954, 0.6533])),
             ('bias_hh_l0', tensor([-0.3565, -0.2904]))])

Since the seed is exactly the same, you’ll notice that the weights and biases have exactly the same values as our former RNN cell. The only difference is in the parameters' names: Now they all have an _l0 suffix to indicate they belong to the first "layer."
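Since both modules were created from the same seed, we can verify this equivalence directly. Here is a minimal sketch, assuming rnn_cell is the nn.RNNCell created earlier (also under torch.manual_seed(19)):

# Sketch: compare the RNN's first-layer parameters to the RNN cell's
# (assumes `rnn_cell` was created earlier with the same seed)
for (cell_name, cell_param), (rnn_name, rnn_param) in zip(
        rnn_cell.state_dict().items(), rnn.state_dict().items()):
    print(cell_name, '->', rnn_name, torch.allclose(cell_param, rnn_param))

Every comparison should print True; for instance, weight_ih should match weight_ih_l0.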

"What do you mean by layer? Isn’t the RNN itself a layer?"

Yes, the RNN itself can be a layer in our model. But it may have its own internal layers! You can configure those with the following four extra arguments:

• num_layers: The RNN we’ve been using so far has one layer (the default value), but if you use more than one, you’ll be creating a stacked RNN, which we’ll see in its own section.
• bidirectional: So far, our RNNs have been handling sequences in the left-to-right direction (the default), but if you set this argument to True, you’ll be creating a bidirectional RNN, which we’ll also see in its own section.
• dropout: This introduces an RNN’s own dropout layer between its internal layers, so it only makes sense if you’re using a stacked RNN.

And I saved the best (actually, the worst) for last:

• batch_first: The documentation says, "if True, then the input and output tensors are provided as (batch, seq, feature)," which makes you think that you only need to set it to True and it will turn everything into your nice and familiar tensors where different batches are concatenated together as its first dimension—and you’d be sorely mistaken.

"Why? What’s wrong with that?"

The problem is, you need to read the documentation very literally: Only the input and output tensors are going to be batch first; the hidden state will never be batch first. This behavior may bring complications you need to be aware of.
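A minimal sketch makes the caveat concrete (the variable names and tensor values here are just for illustration): even with batch_first=True, the returned hidden state keeps the (num_layers, N, H) layout.

import torch
import torch.nn as nn

torch.manual_seed(19)
rnn_batch_first = nn.RNN(input_size=2, hidden_size=2, batch_first=True)

# A batch of three sequences, each with four data points of two features
x = torch.randn(3, 4, 2)             # (N, L, F): batch first, as promised
output, hidden = rnn_batch_first(x)

print(output.shape)   # torch.Size([3, 4, 2]) -> (N, L, H): batch first
print(hidden.shape)   # torch.Size([1, 3, 2]) -> (num_layers, N, H): NOT batch first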

Shapes

Before going through an example, let’s take a look at the expected inputs and outputs of our RNN:

• Inputs:
  ◦ The input tensor containing the sequence you want to run through the RNN:
    ▪ The default shape is sequence-first; that is, (sequence length, batch size, number of features), which we’re abbreviating to (L, N, F).
    ▪ But if you choose batch_first, it will flip the first two dimensions, and then it will expect an (N, L, F) shape, which is what you’re likely getting from a data loader; see the sketch after this list.
    ▪ By the way, the input can also be a packed sequence—we’ll get back to that in a later section.
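To illustrate the default layout, here is a short sketch (the tensors are made up for illustration) that feeds the rnn created above a sequence-first batch, and then shows how you could flip a batch-first tensor coming from a data loader:

# Three sequences, each with four data points of two features,
# in the default sequence-first layout
x = torch.randn(4, 3, 2)                  # (L, N, F)
output, hidden = rnn(x)
print(output.shape)   # torch.Size([4, 3, 2]) -> (L, N, H)
print(hidden.shape)   # torch.Size([1, 3, 2]) -> (num_layers, N, H)

# If a data loader yields batch-first tensors, flipping the first two
# dimensions recovers the layout the default RNN expects
x_batch = torch.randn(3, 4, 2)            # (N, L, F)
output, hidden = rnn(x_batch.permute(1, 0, 2))   # back to (L, N, F)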

