The Math Behind Deep Learning

In this chapter we discuss the math behind deep learning. This topic is quite advanced and not strictly required for practitioners. However, it is recommended reading if you are interested in understanding what is going on under the hood when you play with neural networks. We start with a historical introduction, and then we review the high-school concepts of derivatives and gradients. We also introduce the gradient descent and backpropagation algorithms commonly used to optimize deep learning networks.
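As a quick preview of the gradient descent idea developed later in the chapter, here is a minimal sketch in plain Python; the quadratic function, learning rate, and number of steps are illustrative choices and are not taken from the chapter's examples.

    # Minimal sketch of gradient descent on f(w) = (w - 3)^2.
    # Learning rate and iteration count are illustrative choices.

    def f(w):
        return (w - 3.0) ** 2

    def grad_f(w):
        # Analytical derivative: df/dw = 2 * (w - 3)
        return 2.0 * (w - 3.0)

    w = 0.0              # initial guess
    learning_rate = 0.1
    for step in range(50):
        w = w - learning_rate * grad_f(w)   # move against the gradient

    print(w)  # converges toward the minimum at w = 3

Each step moves the parameter a small amount in the direction opposite to the derivative of the loss, which is the core update that the rest of the chapter develops more formally.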

History

The basics of continuous backpropagation were proposed by Henry J. Kelley [1] in 1960 using dynamic programming. Stuart Dreyfus proposed using the chain rule in 1962 [2]. Paul Werbos was the first to propose using backpropagation for neural networks, in his 1974 PhD thesis [3]. However, it was only in 1986 that backpropagation gained widespread attention, with the work of David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams published in Nature [4]. In 1987, Yann LeCun described the modern version of backpropagation currently used for training neural networks [5].

The basic intuition behind stochastic gradient descent (SGD) was introduced by Robbins and Monro in 1951, in a context different from neural networks [6]. It was only in 2012, 52 years after backpropagation was first introduced, that AlexNet [7] achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge using GPUs. According to The Economist [8], "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." Innovation in this field did not happen overnight. Instead, it was a long journey lasting more than 50 years!
