Advanced Convolutional Neural Networks

The final accuracy is 88.21%, showing that it is possible to successfully use CNNs for textual processing:

Figure 29: Use CNN for text processing

Note that many other kinds of non-image data can also be converted to images and classified using a CNN (see, for instance, https://becominghuman.ai/soundclassification-using-images-68d4770df426).

Audio and music

We have used CNNs for images, videos, and text. Now let's have a look at how variants of CNNs can be used for audio.

So, you might wonder why learning to synthesize audio is so difficult. Well, each second of digital sound we hear is made of 16,000 samples (sometimes 48,000 or more), and building a predictive model that learns to reproduce each sample based on all the previous ones is a very difficult challenge.
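To make the scale of the problem concrete, here is a minimal sketch of sample-by-sample (autoregressive) generation. The names `num_samples`, `generate`, and `toy_predictor` are hypothetical helpers introduced only for illustration, and the toy predictor is a stand-in for what would be a trained neural network in a real system:

```python
from typing import Callable, List, Sequence


def num_samples(duration_seconds: float, sample_rate_hz: int) -> int:
    """Number of audio samples in a clip of the given duration."""
    return round(duration_seconds * sample_rate_hz)


def generate(
    predict_next: Callable[[Sequence[float]], float],
    seed: Sequence[float],
    total: int,
) -> List[float]:
    """Produce `total` samples, each one predicted from all previous ones."""
    samples = list(seed)
    while len(samples) < total:
        # The predictor sees the full history generated so far.
        samples.append(predict_next(samples))
    return samples


def toy_predictor(history: Sequence[float]) -> float:
    """Placeholder model (an assumption): halves the last sample."""
    return 0.5 * history[-1]


# One second of audio at 16,000 samples per second means 16,000 predictions.
print(num_samples(1.0, 16_000))

# A tiny run of the generation loop with the toy predictor.
print(generate(toy_predictor, seed=[1.0], total=5))
```

Even in this toy form, the loop shows why the task is expensive: every new sample requires a prediction conditioned on a history that grows by one element at each step.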

Dilated ConvNets, WaveNet, and NSynth

WaveNet is a deep generative model for producing raw audio waveforms. This breakthrough technology was introduced by Google DeepMind (see https://deepmind.com/blog/wavenet-generative-model-raw-audio/) for teaching computers how to speak. The results are truly impressive, and online you can find examples of synthetic voices where the computer learns how to talk with the voice of celebrities such as Matt Damon. Experiments show that WaveNet improved on the state-of-the-art Text-to-Speech (TTS) systems of the time, reducing the gap with respect to human voices by 50% for both US English and Mandarin Chinese. The metric used for comparison is the Mean Opinion Score (MOS), a subjective listening test. In the MOS tests, after listening to each sound stimulus, the subjects were asked to rate the naturalness of the stimulus on a five-point scale from "Bad" (1) to "Excellent" (5); the MOS is the average of these ratings.
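As a rough illustration of how such a score is computed, the sketch below averages listener ratings on the 1 to 5 scale and expresses an improvement as a fraction of the gap to human speech. The function names and all numbers are invented for the example; they are not results from the WaveNet experiments:

```python
from typing import Sequence


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average of naturalness ratings on the 1 ("Bad") to 5 ("Excellent") scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


def gap_reduction(baseline_mos: float, new_mos: float, human_mos: float) -> float:
    """Fraction of the baseline-to-human gap closed by the new system."""
    return (new_mos - baseline_mos) / (human_mos - baseline_mos)


# Hypothetical ratings from five listeners for one synthetic voice.
print(mean_opinion_score([4, 5, 3, 4, 4]))

# Hypothetical scores: baseline 3.0, new system 4.0, human speech 5.0.
print(gap_reduction(baseline_mos=3.0, new_mos=4.0, human_mos=5.0))
```

With these made-up scores, the new system closes half of the distance between the baseline and human speech, which is the kind of statement a "50% reduction" claim makes.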
