

Considering this, the not “unrolled” (“rolled” doesn’t sound right!) representation is a better characterization of the internal structure of an RNN.

Let’s dive deeper into the internals of an RNN cell and look at it at the neuron level:

Figure 8.7 - RNN cell at neuron level

Since one can choose the number of hidden dimensions, I chose two dimensions, simply because I want to be able to easily visualize the results. Hence, two blue neurons are transforming the hidden state.

The number of red neurons transforming the data point will necessarily be the same as the chosen number of hidden dimensions since both transformed outputs need to be added together. But this doesn’t mean the data points must have the same number of dimensions. Coincidentally, our data points have two coordinates, but even if we had 25 dimensions, these 25 features would still be mapped into two dimensions by the two red neurons.

The only operation left is the activation function, most likely the hyperbolic tangent, which will produce the updated hidden state.

“Why hyperbolic tangent? Isn’t ReLU a better activation function?”

The hyperbolic tangent has a “competitive advantage” here since it maps the feature space to clearly defined boundaries: the interval (-1, 1). This guarantees that, at every step of the sequence, the hidden state is always within these boundaries.

Given that we have only one linear layer with which to transform the hidden state, regardless of which step of the sequence it is being used in, it is definitely convenient to have its values within a predictable range. We’ll get back to this in the "Journey of a Hidden State" section.

Now, let’s see how an RNN cell works in code. We’ll create one using PyTorch’s own nn.RNNCell and disassemble it into its components to manually reproduce all the steps involved in updating the hidden state. To create a cell, we need to tell it the input_size (number of features in our data points) and the hidden_size (the size of the vector representing the hidden state). It is also possible to tell it not to add biases, and to use a ReLU instead of TanH, but we’re sticking to the defaults.
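Just for reference, opting out of those defaults would look something like the line below (shown only as an illustration; we won’t use this variant):

# hypothetical variant: no biases, ReLU instead of TanH
nn.RNNCell(input_size=2, hidden_size=2, bias=False, nonlinearity='relu')

With that out of the way, here is the cell with the default settings: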

n_features = 2
hidden_dim = 2

torch.manual_seed(19)
rnn_cell = nn.RNNCell(input_size=n_features, hidden_size=hidden_dim)
rnn_state = rnn_cell.state_dict()
rnn_state

Output

OrderedDict([('weight_ih', tensor([[ 0.6627, -0.4245],
                                   [ 0.5373,  0.2294]])),
             ('weight_hh', tensor([[-0.4015, -0.5385],
                                   [-0.1956, -0.6835]])),
             ('bias_ih', tensor([0.4954, 0.6533])),
             ('bias_hh', tensor([-0.3565, -0.2904]))])
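A quick shape check (mine, not the book’s) helps connect these tensors to Figure 8.7: weight_ih has shape (hidden_size, input_size) and weight_hh has shape (hidden_size, hidden_size). Both happen to be 2x2 here only because input_size and hidden_size are both two; with 25 input features, weight_ih would be 2x25 while weight_hh would remain 2x2.

rnn_state['weight_ih'].shape, rnn_state['weight_hh'].shape
# (torch.Size([2, 2]), torch.Size([2, 2]))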

The weight_ih and bias_ih (i stands for inputs—the data) tensors correspond to the red neurons in Figure 8.7. The weight_hh and bias_hh (h stands for hidden) tensors, to the blue neurons. We can use these weights to create two linear layers:

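A minimal sketch of that disassembly, under the same imports as before (the variable names linear_input, linear_hidden, and the sample data point below are illustrative choices, not necessarily the ones used in the book):

# two linear layers with the same weights and biases as the RNN cell
linear_input = nn.Linear(n_features, hidden_dim)    # the red neurons
linear_hidden = nn.Linear(hidden_dim, hidden_dim)   # the blue neurons

with torch.no_grad():
    linear_input.weight = nn.Parameter(rnn_state['weight_ih'])
    linear_input.bias = nn.Parameter(rnn_state['bias_ih'])
    linear_hidden.weight = nn.Parameter(rnn_state['weight_hh'])
    linear_hidden.bias = nn.Parameter(rnn_state['bias_hh'])

# one arbitrary data point and an initial hidden state full of zeros
initial_hidden = torch.zeros(1, hidden_dim)
point = torch.tensor([[1.0, 1.0]])

th = linear_hidden(initial_hidden)    # blue neurons transform the hidden state
tx = linear_input(point)              # red neurons transform the data point
updated_hidden = torch.tanh(th + tx)  # add both and apply TanH

# the result should match the cell's own update
print(updated_hidden)
print(rnn_cell(point, initial_hidden))

Both printed tensors should match, confirming that an RNN cell amounts to two linear layers, an addition, and a TanH.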
