
Since the entire sequence is consumed in parallel by the Encoder, information about the positions of individual elements is lost. To compensate for this, the input embeddings are augmented with a positional embedding, which is implemented as a sinusoidal function without learned parameters. The positional embedding is added to the input embedding.

The output of the Encoder is a pair of attention vectors K and V. These are sent in parallel to all the transformer blocks in the Decoder. The transformer block on the Decoder is similar to the one on the Encoder, except that it has an additional multi-head attention layer to attend to the attention vectors from the Encoder. This additional multi-head attention layer works similarly to the one in the Encoder and the one below it, except that it combines the Q vector from the layer below it with the K and V vectors from the Encoder state.

Similar to the seq2seq network, the output sequence is generated one token at a time, using the input from the previous time step. As with the input to the Encoder, the input to the Decoder is also augmented with a positional embedding. Unlike the Encoder, the self-attention process in the Decoder is only allowed to attend to tokens at previous time points. This is done by masking out tokens at future time points.

The output of the last transformer block in the Decoder is a sequence of low-dimensional embeddings (512 for the reference implementation [30], as noted earlier). This is passed to the Dense layer, which converts it into a sequence of probability distributions across the target vocabulary, from which we generate the most probable word either greedily or by a more sophisticated technique such as beam search.

This has been a fairly high-level coverage of the transformer architecture. It has achieved state-of-the-art results in some machine translation benchmarks. The BERT embedding, which we talked about in the previous chapter, is the encoder portion of a transformer network trained on sentence pairs in the same language. The BERT network comes in two flavors, both of which are somewhat larger than the reference implementation: BERT-base has 12 encoder layers, a hidden dimension of 768, and 12 attention heads in its multi-head attention layers, while BERT-large has 24 encoder layers, a hidden dimension of 1024, and 16 attention heads.

If you would like to learn more about transformers, The Illustrated Transformer blog post by Alammar [33] provides a very detailed, and very visual, guide to the structure and inner workings of this network. In addition, for those of you who prefer code, the textbook by Zhang et al. [31] describes and builds up a working model of the transformer network using MXNet.
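
To make the sinusoidal positional embedding described above concrete, here is a minimal NumPy sketch of the encoding used in the reference implementation [30]; the function name and arguments are our own, chosen for illustration rather than taken from any particular codebase.

import numpy as np

def positional_encoding(max_len, d_model):
    # angle rate for dimension i is 1 / 10000^(2*(i//2) / d_model)
    positions = np.arange(max_len)[:, np.newaxis]        # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / np.float32(d_model))
    angles[:, 0::2] = np.sin(angles[:, 0::2])            # sine on even dimensions
    angles[:, 1::2] = np.cos(angles[:, 1::2])            # cosine on odd dimensions
    return angles                                        # (max_len, d_model)

# The encoding is simply added to the token embeddings, for example:
# x = token_embeddings + positional_encoding(max_len, 512)[:seq_len]

Because the encoding is a fixed function of position, it adds no trainable parameters, which is what allows it to be computed once and reused for any sequence up to max_len.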
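
The masking of future time points in the Decoder's self-attention can be implemented as a triangular mask applied inside scaled dot-product attention. The following TensorFlow sketch shows one common way to do this; the helper names are illustrative and not taken from the reference implementation.

import tensorflow as tf

def look_ahead_mask(size):
    # 1s mark the future positions each query is not allowed to attend to
    return 1.0 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

def scaled_dot_product_attention(q, k, v, mask=None):
    # attention weights are softmax(Q K^T / sqrt(d_k)); masked positions are
    # pushed to a large negative value so they vanish after the softmax
    scores = tf.matmul(q, k, transpose_b=True)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = scores / tf.math.sqrt(d_k)
    if mask is not None:
        scores += mask * -1e9
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(weights, v), weights

In the Decoder's first multi-head attention layer, q, k, and v all come from the layer below and the look-ahead mask is applied; in the additional attention layer described above, q comes from the layer below while k and v come from the Encoder output, so no look-ahead mask is needed there.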
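
Finally, the greedy generation of the output sequence one token at a time can be sketched as follows. This assumes a hypothetical transformer callable that takes the encoder input and the tokens generated so far and returns logits over the target vocabulary; the names are illustrative only.

import tensorflow as tf

def greedy_decode(transformer, encoder_input, start_id, end_id, max_len=50):
    # start from the start-of-sequence token and extend one token per step
    output = tf.constant([[start_id]], dtype=tf.int64)
    for _ in range(max_len):
        logits = transformer(encoder_input, output)      # (1, cur_len, vocab_size)
        next_id = tf.argmax(logits[:, -1, :], axis=-1)   # most probable next token
        output = tf.concat([output, next_id[:, tf.newaxis]], axis=-1)
        if int(next_id[0]) == end_id:                    # stop at end-of-sequence
            break
    return output

Beam search keeps the top k partial sequences at each step instead of only the single most probable one, which usually improves translation quality at the cost of more forward passes per step.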



Summary

In this chapter, we learned about RNNs, a class of networks that are specialized for dealing with sequences such as natural language, time series, speech, and so on. Just like CNNs exploit the geometry of images, RNNs exploit the sequential structure of their inputs. We learned about the basic RNN cell, how it handles state from previous time steps, and how it suffers from vanishing and exploding gradients because of inherent problems with BPTT. We saw how these problems led to the development of novel RNN cell architectures such as LSTM, GRU, and peephole LSTMs. We also learned about some simple ways to make your RNN more effective, such as making it Bidirectional or Stateful.

We then looked at different RNN topologies, and how each topology is adapted to a particular set of problems. After a lot of theory, we finally saw examples of three of these topologies. We then focused on one of these topologies, called seq2seq, which first gained popularity in the machine translation community, but has since been used in situations where the use case can be adapted to look like a machine translation problem.

From here, we looked at attention, which started off as a way to improve the performance of seq2seq networks, but has since been used very effectively in many situations where we want to compress the representation while keeping data loss to a minimum. We looked at different kinds of attention, and saw an example of using them in a seq2seq network with attention.

Finally, we looked at the transformer network, which is basically an Encoder-Decoder architecture where the recurrent layers have been replaced with attention layers. At the time of writing, transformer networks are considered state of the art, and they are being used in an increasing number of situations.

In the next chapter, you will learn about Autoencoders, another type of Encoder-Decoder architecture that has proven to be useful in semi-supervised or unsupervised settings.

References

1. Jozefowicz, R., Zaremba, W., and Sutskever, I. (2015). An Empirical Exploration of Recurrent Neural Network Architectures. Proceedings of the 32nd International Conference on Machine Learning (ICML).

2. Greff, K., et al. (July 2016). LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems.

3. Bernal, A., Fok, S., and Pidaparthi, R. (December 2012). Financial Markets Time Series Prediction with Recurrent Neural Networks.

