Deep Learning with PyTorch Step-by-Step: A Beginner's Guide, by Daniel Voigt Godoy


Model Configuration & Training

Model Configuration

torch.manual_seed(42)
# Layers
enclayer = EncoderLayer(n_heads=3, d_model=6,
                        ff_units=10, dropout=0.1)
declayer = DecoderLayer(n_heads=3, d_model=6,
                        ff_units=10, dropout=0.1)
# Encoder and Decoder
enctransf = EncoderTransf(enclayer, n_layers=2)
dectransf = DecoderTransf(declayer, n_layers=2)
# Transformer
model_transf = EncoderDecoderTransf(enctransf,
                                    dectransf,
                                    input_len=2,
                                    target_len=2,
                                    n_features=2)
loss = nn.MSELoss()
optimizer = torch.optim.Adam(model_transf.parameters(), lr=0.01)

Weight Initialization

# Xavier-initializes only the weight matrices (dim > 1), leaving biases
# and layer normalization parameters at their PyTorch defaults
for p in model_transf.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

Model Training

sbs_seq_transf = StepByStep(model_transf, loss, optimizer)
sbs_seq_transf.set_loaders(train_loader, test_loader)
sbs_seq_transf.train(50)

sbs_seq_transf.losses[-1], sbs_seq_transf.val_losses[-1]

Output

(0.019648547226097435, 0.011462601833045483)
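With training done, it is natural to check what the model actually generates. The snippet below is only a sketch: it assumes StepByStep exposes the predict() method used in earlier chapters, and that the test loader yields the full sequences as features with the last two points as labels.

# grab one mini-batch from the validation / test loader
X_batch, y_batch = next(iter(test_loader))
# in evaluation mode the model uses only the source points (the first
# input_len steps of X_batch) and generates target_len points, one at a time
predicted = sbs_seq_transf.predict(X_batch)
# given target_len=2 and n_features=2, predicted should have shape (batch_size, 2, 2)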

Recap

In this chapter, we've extended the encoder-decoder architecture and transformed it into a Transformer (the last pun of the chapter; I couldn't resist it!). First, we modified the multi-headed attention mechanism to use narrow attention. Then, we introduced layer normalization and the need to change the dimensionality of the inputs using either projections or embeddings. Next, we used our former encoder and decoder as "layers" that could be stacked to form the new Transformer encoder and decoder. That made our model much deeper, thus raising the need for wrapping the internal operations (self-, cross-attention, and feed-forward network, now called "sub-layers") of each "layer" with a combination of layer normalization, dropout, and residual connection. This is what we've covered:

• using narrow attention in the multi-headed attention mechanism
• chunking the projections of the inputs to implement narrow attention
• learning that chunking projections allows different heads to focus on, literally, different dimensions of the inputs
• standardizing individual data points using layer normalization
• using layer normalization to standardize positionally-encoded inputs
• changing the dimensionality of the inputs using projections (embeddings)
• defining an encoder "layer" that uses two "sub-layers": a self-attention mechanism and a feed-forward network
• stacking encoder "layers" to build a Transformer encoder
• wrapping "sub-layer" operations with a combination of layer normalization, dropout, and residual connection (see the sketch after this list)
• learning the difference between norm-last and norm-first "sub-layers"
• understanding that norm-first "sub-layers" allow the inputs to flow unimpeded all the way to the top through the residual connections
• defining a decoder "layer" that uses three "sub-layers": a masked self-attention mechanism, a cross-attention mechanism, and a feed-forward network
• stacking decoder "layers" to build a Transformer decoder
• combining both encoder and decoder into a full-blown, norm-first Transformer architecture
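To make the "wrapping" concrete, here is a minimal, self-contained sketch of a norm-first "sub-layer" wrapper. The class name and interface below are illustrative assumptions, not the implementation developed in the chapter: the input is normalized first, fed to the wrapped operation, passed through dropout, and added back to the original input through the residual connection.

import torch
import torch.nn as nn

class NormFirstSubLayer(nn.Module):
    """Illustrative norm-first wrapper: out = x + dropout(sublayer(norm(x)))."""
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # norm-first: x flows unimpeded through the residual connection,
        # while the "sub-layer" only ever sees normalized inputs
        return x + self.drop(sublayer(self.norm(x)))

# usage: wrapping a feed-forward "sub-layer" with the same dimensions used above
d_model, ff_units = 6, 10
ffn = nn.Sequential(nn.Linear(d_model, ff_units),
                    nn.ReLU(),
                    nn.Linear(ff_units, d_model))
wrapped = NormFirstSubLayer(d_model)
x = torch.randn(1, 2, d_model)   # (batch size, sequence length, d_model)
out = wrapped(x, ffn)            # same shape as x

A norm-last wrapper would instead compute norm(x + dropout(sublayer(x))), placing the normalization on the residual path itself, which is why only the norm-first variant lets the inputs reach the top of the stack unchanged.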

