09.05.2023 Views

pdfcoffee

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Tensor Processing Unit

On the other hand, Cloud TPU v2 pods released in alpha in 2018 can achieve 11.5

PetaFLOPS; another impressive improvement. As of 2019 both TPU2 and TPU3 are

in production with different prices:

Figure 6: Google announced TPU v2 and v3 Pods in beta at the Google I/O 2019

TPU v3 board has 4 TPU chips, 8 cores, and liquid cooling. Google has adopted

an ultra-high-speed interconnect hardware derived from supercomputer

technology, for connecting thousands of TPUs with very low latency. Each time

a parameter is updated on a single TPU, all the others are informed via a reduceall

algorithm typically adopted for parallel computation. So, you can think about

TPU v3 as one of the fastest supercomputers available today for matrix and tensor

operations with thousands of TPUs inside it.

Edge TPU

In addition to the three generations of TPUs already discussed, in 2018 Google

announced a special generation of TPUs running on the edge. This TPU is

particularly appropriate for Internet of Things (IoT) and for supporting

TensorFlow Lite on mobile and IoT. With this we conclude the introduction to

TPU v1, v2, and v3. In the next section we will briefly discuss performance.

TPU performance

Discussing performance is always difficult because it is important to first define

the metrics that we are going to measure, and the set of workloads that we are

going to use as benchmarks. For instance, Google reported an impressive linear

scaling for TPU v2 used with ResNet-50 [4] (see Figure 7).

[ 578 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!