
Chapter 16

It was clear that neither CPUs nor GPUs were a suitable solution. So, Google

decided that they needed something completely new; something that would allow

a 10x growth in performance with no significant cost increase. That's how TPU

v1 was born! What is impressive is that it took only 15 months from initial design

to production. You can find more details about this story in Jouppi et al., 2014 [3],

which also includes a detailed report on the different inference workloads seen at

Google in 2013:

Figure 1: Different inference workloads seen at Google in 2013 (source [3])

Let's talk a bit about the technical details. TPU v1 is a special device (or an

Application-Specific Integrated Circuit, also known as an ASIC) designed for super-efficient

tensor operations. TPUs follow the philosophy of "less is more." This philosophy

has an important consequence: TPUs do not have any of the graphics components

that GPUs need. Because of this, they are very efficient from an energy

consumption perspective and frequently much faster than GPUs. So far, there

have been three generations of TPUs. Let's review them.

Three generations of TPUs and Edge TPU

As discussed, TPUs are domain-specific processors expressly optimized for

matrix operations. Now, you might remember that the basic operation of a matrix

multiplication is a dot product between a row of one matrix and a column of the

other matrix. For instance, given a matrix multiplication Y = X*W, computing Y[i,0] is:

Y[i,0] = X[i,0]*W[0,0] + X[i,1]*W[1,0] + X[i,2]*W[2,0] + ... + X[i,n]*W[n,0]
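This row-by-column dot product can be checked with a few lines of NumPy; the matrix shapes and the index below are arbitrary examples chosen for illustration, not taken from the chapter:

```python
import numpy as np

# Small example matrices: X is 3x4, W is 4x2, so Y = X*W is 3x2
X = np.arange(12, dtype=np.float32).reshape(3, 4)
W = np.arange(8, dtype=np.float32).reshape(4, 2)

i = 1
# Y[i,0] computed as the explicit sum of element-wise products
# of row i of X and column 0 of W
y_i0 = sum(X[i, k] * W[k, 0] for k in range(X.shape[1]))

# The same entry as produced by the full matrix multiplication
Y = X @ W
assert np.isclose(Y[i, 0], y_i0)
```

A TPU accelerates exactly this kind of computation: many such row-column dot products performed in parallel by dedicated matrix-multiply hardware.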

