Advanced Convolutional Neural Networks

This model has been deployed in production at Google, and is currently being used to serve Google Assistant queries in real time to millions of users. At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices were available thanks to WaveNet.

Two implementations of WaveNet models for TensorFlow are currently available. One is a TensorFlow implementation of DeepMind's WaveNet (available at https://github.com/ibab/tensorflow-wavenet), and the other is called Magenta NSynth. NSynth (available at https://magenta.tensorflow.org/nsynth) is an evolution of WaveNet recently released by the Google Brain group, which, instead of being causal, aims at seeing the entire context of the input chunk. The neural network is truly complex, as depicted in the following diagram, but for the sake of this introductory discussion it is sufficient to know that the network learns how to reproduce its input by using an approach based on reducing the error during the encoding/decoding phases:

If you are interested in understanding more, I would suggest having a look at the online Colab notebook where you can play with models generated with NSynth (NSynth Colab is available at https://colab.research.google.com/notebooks/magenta/nsynth/nsynth.ipynb).
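
To make the causal versus non-causal distinction mentioned above more concrete, here is a minimal sketch in plain Python. It is not code from WaveNet or NSynth: the `conv1d` helper, the filter weights, and the toy signal are all hypothetical, and dilation and layer stacking are left out for brevity. It only illustrates that a causal filter computes each output from the current and past samples, while a non-causal filter also looks at samples that come later in the chunk.

```python
def conv1d(signal, kernel, causal):
    """Apply a 1-D filter to a signal, returning len(signal) outputs.

    causal=True  : output[t] uses signal[t], signal[t-1], ... (left padding only)
    causal=False : output[t] uses a window centered on t (padding on both sides)
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = 0 if causal else k - 1 - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


# Hypothetical toy data: a short signal and a 3-tap filter.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.3, 0.2]

# Change only the LAST sample and see which outputs are affected.
changed = signal[:-1] + [100.0]

causal_a = conv1d(signal, kernel, causal=True)
causal_b = conv1d(changed, kernel, causal=True)
full_a = conv1d(signal, kernel, causal=False)
full_b = conv1d(changed, kernel, causal=False)

# Causal: outputs before the changed sample are identical.
print(causal_a[:4] == causal_b[:4])
# Non-causal: the output at position 3 already "sees" the changed sample 4.
print(full_a[3] == full_b[3])
```

In the causal case, changing a future sample leaves all earlier outputs untouched, which is what allows a model to generate audio one sample at a time. In the non-causal case, each output depends on context from both sides, which suits an encoder that has the whole input chunk available.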

MuseNet is a very recent and impressive audio generation tool developed by OpenAI. MuseNet uses a sparse transformer to train a 72-layer network with 24 attention heads. Transformers will be discussed in Chapter 9, Autoencoders, but for now all we need to know is that they are deep neural networks that are very good at predicting what comes next in a sequence, whether text, images, or sound.

In Transformers, every output element is connected to every input element, and the weightings between them are dynamically calculated according to a process called attention.
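
As a rough illustration of how such weightings can be computed, the following sketch implements single-head scaled dot-product attention in plain Python. It is an assumption-laden toy, not MuseNet's code: the `softmax` and `attention` helpers and the input vectors are hypothetical, there are no learned projection matrices, and the sparse attention pattern used by MuseNet is not modeled.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the output vectors and the weight each output gives to each input.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One similarity score per input element, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of ALL value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy sequence of three 2-dimensional elements,
# used as queries, keys, and values at once (self-attention).
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(inputs, inputs, inputs)

for row in weights:
    print([round(w, 3) for w in row])
```

Every row of `weights` has one strictly positive entry per input element, so each output is connected to every input. The values depend on the inputs themselves rather than being fixed parameters, which is the sense in which the weightings are calculated dynamically.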
