Distributed training in TensorFlow 2.x

One very useful addition to TensorFlow 2.x is the ability to train models using distributed GPUs, multiple machines, and TPUs in a very simple way, with very few additional lines of code. tf.distribute.Strategy is the TensorFlow API used in this case, and it supports both the tf.keras and tf.estimator APIs as well as eager execution. You can switch between GPUs, TPUs, and multiple machines by simply changing the strategy instance. Strategies can be synchronous, where all workers train over different slices of the input data in a form of synchronous data-parallel computation, or asynchronous, where updates from the optimizers do not happen in sync. All strategies require that data is loaded in batches via the tf.data.Dataset API.
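As a minimal sketch of how a strategy instance is used together with tf.keras, the following trains a tiny model under tf.distribute.MirroredStrategy; the model architecture, the synthetic data, and the batch size are illustrative assumptions rather than code taken from this chapter:

import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on all visible GPUs of one machine
# and keeps the replicas in sync (it falls back to the CPU if no GPU is found).
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

# The model and the optimizer must be created inside the strategy scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='sgd', loss='mse')

# Synthetic data, loaded in batches via tf.data.Dataset as required.
x = np.random.random((256, 10)).astype('float32')
y = np.random.random((256, 1)).astype('float32')
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model.fit(dataset, epochs=2)

Switching to TPUs or to multiple machines only requires replacing the strategy instance; the rest of the code stays the same.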

Note that the distributed training support is still experimental. A roadmap is given in Figure 4:

Figure 4: Distributed training support for different strategies and APIs

Let's discuss in detail all the different strategies reported in Figure 4.

Multiple GPUs

We discussed how TensorFlow 2.x can utilize multiple GPUs. If we want to have synchronous distributed training on multiple GPUs on one machine, there are two things that we need to do: (1) We need to load the data in a way that it can be distributed across the GPUs, and (2) We need to distribute some of the computation across the GPUs too:

1. In order to load our data in a way that can be distributed across the GPUs, we simply need a tf.data.Dataset (which has already been discussed in the previous paragraphs). If we do not have a tf.data.Dataset but only a normal tensor, then we can easily convert the latter into the former using tf.data.Dataset.from_tensor_slices(). This takes a tensor in memory and returns a source dataset whose elements are slices of the given tensor, as in the short sketch after this step.
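For instance, the following is a minimal sketch of that conversion; the NumPy arrays, their shapes, and the batch size are illustrative assumptions:

import numpy as np
import tensorflow as tf

# Illustrative in-memory tensors: 10 feature rows and 10 labels (assumed shapes).
features = np.arange(20, dtype='float32').reshape(10, 2)
labels = np.arange(10, dtype='float32')

# from_tensor_slices() slices the tensors along their first dimension,
# producing one dataset element per row.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Batching is required before the dataset is consumed by distributed training.
dataset = dataset.batch(5)

for batch_features, batch_labels in dataset:
    print(batch_features.shape, batch_labels.shape)  # (5, 2) (5,)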
