TPU2 has an MXU (Matrix Multiply Unit), which performs matrix multiplications on a 128x128 systolic array, and a Vector Processing Unit (VPU) for all other tasks such as applying activations. The VPU handles float32 and int32 computations. The MXU, on the other hand, operates in a mixed-precision 16-32 bit floating-point format.
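
The MXU's mixed-precision behavior is exposed to Keras models through a global policy. As a minimal sketch (assuming TensorFlow 2.4 or later), the mixed_bfloat16 policy makes layers compute in bfloat16 while keeping their variables in float32:

import tensorflow as tf

# Compute in bfloat16 (as on the MXU) but keep variables in float32 for stability.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")
layer = tf.keras.layers.Dense(128)
print(layer.compute_dtype)  # bfloat16
print(layer.dtype)          # float32 (the dtype of the layer's variables)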

Each TPU v2 chip has two cores, and up to four chips are mounted on each board. In TPU v2, Google adopted a new floating-point format called bfloat16. The idea is to sacrifice some resolution while remaining well suited to deep learning. This reduction in resolution improves the performance of the v2 TPUs, which are more power efficient than v1: indeed, it can be proven that a smaller mantissa helps reduce the physical silicon area and the multiplier power. Therefore, bfloat16 follows the standard IEEE 754 single-precision floating-point layout, but truncates the mantissa field from 23 bits to just 7 bits. Preserving the exponent bits allows the format to keep the same range as 32-bit single precision, which in turn allows for relatively simple conversion between the two data types.
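
The truncation is easy to observe in practice. As a minimal sketch (assuming TensorFlow is installed), casting a float32 value to bfloat16 keeps the exponent bits but drops mantissa precision:

import tensorflow as tf

x = tf.constant(3.141592653589793, dtype=tf.float32)
b = tf.cast(x, tf.bfloat16)           # exponent preserved, mantissa truncated to 7 bits
print(float(x))                       # 3.1415927410125732
print(float(tf.cast(b, tf.float32)))  # 3.140625: resolution lost, range preserved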

Figure 5: Cloud TPU v3 and Cloud TPU v2

Google offers access to TPU v2 and TPU v3 via Google Compute Engine (GCE) and Google Kubernetes Engine (GKE). Plus, it is possible to use them for free via Colab.
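
As a minimal sketch of getting started in Colab (assuming a notebook with the TPU runtime selected), the standard cluster resolver detects and initializes the TPU, and a TPUStrategy distributes work across its cores:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()  # auto-detects the Colab TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
# Models built under strategy.scope() will run replicated across the TPU cores.
print("Number of TPU cores:", strategy.num_replicas_in_sync)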

Third-generation TPU

The third-generation TPUs (TPU3) were announced in 2018 [4]. TPU3s are 2x faster than TPU2s, and they are grouped in 4x larger pods. In total, this is a performance increase of 8x. Cloud TPU v3 pods can deliver more than 100 PetaFLOPS of computing power.
