In our toy example we use NumPy for generating training data x and labels y, and we transform it into tf.data.Dataset with tf.data.Dataset.from_tensor_slices(). Then we apply a shuffle to avoid bias in training across GPUs and then generate SIZE_BATCHES batches:

import tensorflow as tf
import numpy as np
from tensorflow import keras

N_TRAIN_EXAMPLES = 1024*1024
N_FEATURES = 10
SIZE_BATCHES = 256

# 10 random floats in the half-open interval [0.0, 1.0).
x = np.random.random((N_TRAIN_EXAMPLES, N_FEATURES))
y = np.random.randint(2, size=(N_TRAIN_EXAMPLES, 1))
x = tf.dtypes.cast(x, tf.float32)
print(x)

dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=N_TRAIN_EXAMPLES).batch(SIZE_BATCHES)

2. In order to distribute some computations to GPUs, we instantiate a distribution = tf.distribute.MirroredStrategy() object, which supports synchronous distributed training on multiple GPUs on one machine. Then, we move the creation and compilation of the Keras model inside the strategy.scope(). Note that each variable in the model is mirrored across all the replicas. Let's see it in our toy example:

# this is the distribution strategy
distribution = tf.distribute.MirroredStrategy()

# this piece of code is distributed to multiple GPUs
with distribution.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(16, activation='relu',
                                    input_shape=(N_FEATURES,)))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    optimizer = tf.keras.optimizers.SGD(0.2)
    model.compile(loss='binary_crossentropy', optimizer=optimizer)
    model.summary()

# Optimize in the usual way but in reality you are using GPUs.
model.fit(dataset, epochs=5, steps_per_epoch=10)
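If you are curious about how many replicas MirroredStrategy created on your machine, you can ask the strategy object directly. The following is a minimal sketch: num_replicas_in_sync is a standard attribute of tf.distribute.Strategy, while scaling the global batch size by the number of replicas is a common convention rather than something the toy example above requires:

# A minimal sketch: inspect the MirroredStrategy and scale the batch size.
import tensorflow as tf

distribution = tf.distribute.MirroredStrategy()

# One replica per visible GPU (or a single replica if only the CPU is visible).
print("Number of replicas:", distribution.num_replicas_in_sync)

# Common convention (an assumption here, not a requirement of the example):
# make the global batch size a multiple of the number of replicas so that
# each replica receives batches of equal size.
BATCH_SIZE_PER_REPLICA = 128
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * distribution.num_replicas_in_sync

With two GPUs this prints 2, which is why a global batch of 256 gives each GPU 128 examples per step, as discussed next.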

Note that each batch of the given input is divided equally among the multiple GPUs. For instance, if using MirroredStrategy() with two GPUs, each batch of size 256 will be divided among the two GPUs, with each of them receiving 128 input examples for each step. In addition, note that each GPU will optimize on the received batches and the TensorFlow backend will combine all these independent optimizations on our behalf. If you want to know more, you can have a look at the notebook online (https://colab.research.google.com/drive/1mf-PK0a20CkObnT0hCl9VPEje1szhHat#scrollTo=wYar3A0vBVtZ) where I explain how to use GPUs in Colab with a Keras model built for MNIST classification. The notebook is available in the GitHub repository.

In short, using multiple GPUs is very easy and requires minimal changes to the tf.keras code used for a single server.

MultiWorkerMirroredStrategy

This strategy implements synchronous distributed training across multiple workers, each one with potentially multiple GPUs. As of September 2019 the strategy works only with Estimators and it has experimental support for tf.keras. This strategy should be used if you are aiming at scaling beyond a single machine with high performance. Data must be loaded with tf.data.Dataset and shared across workers so that each worker can read a unique subset. A minimal configuration sketch is given at the end of this section.

TPUStrategy

This strategy implements synchronous distributed training on TPUs. TPUs are Google's specialized ASIC chips designed to significantly accelerate machine learning workloads, often more efficiently than GPUs. We will talk more about TPUs in Chapter 16, Tensor Processing Unit. A short connection sketch also follows at the end of this section. According to this public information (https://github.com/tensorflow/tensorflow/issues/24412):

"the gist is that we intend to announce support for TPUStrategy alongside Tensorflow 2.1. Tensorflow 2.0 will work under limited use-cases but has many improvements (bug fixes, performance improvements) that we're including in Tensorflow 2.1, so we don't consider it ready yet."

ParameterServerStrategy

This strategy implements either multi-GPU synchronous local training or asynchronous multi-machine training. For local training on one machine, the variables of the model are placed on the CPU and operations are replicated across all local GPUs. A sketch of a possible cluster configuration closes this section.
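To give a flavor of MultiWorkerMirroredStrategy, here is a minimal, hedged sketch. The worker addresses (localhost:12345 and localhost:23456) are placeholders, the TF_CONFIG environment variable is the standard way to describe the cluster, and the experimental namespace is the one the strategy lived in around TensorFlow 2.0. Each worker runs the same script with its own task index:

# Hedged sketch of a two-worker setup; addresses are placeholders.
import json
import os
import tensorflow as tf

# TF_CONFIG must be set before the strategy is created; the second worker
# uses 'index': 1.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'worker': ['localhost:12345', 'localhost:23456']},
    'task': {'type': 'worker', 'index': 0}
})

# Around TensorFlow 2.0 the strategy lives in the experimental namespace.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# The model is the same toy model as before; only the scope changes.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1, activation='sigmoid')])
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.SGD(0.2))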

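As a small preview of Chapter 16, the typical pattern for connecting to a TPU in Colab looks like the following sketch. It assumes the TPU address is provided by the environment (as Colab does) and uses the experimental namespace as exposed around TensorFlow 2.1, consistent with the quote above; treat it as an illustration rather than a recipe:

# Hedged sketch: connecting to a Colab-provided TPU around TensorFlow 2.1.
import tensorflow as tf

# The resolver discovers the TPU; in Colab no explicit address is needed.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Around TensorFlow 2.1 the strategy lives in the experimental namespace.
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# As with the GPU strategies, only the scope around model creation changes.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1, activation='sigmoid')])
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.SGD(0.2))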

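Finally, to make the parameter server layout concrete, here is a hedged sketch of what a cluster definition could look like. All addresses are placeholders, and the no-argument constructor is the one exposed around TensorFlow 2.0, when the strategy was primarily intended for Estimators; later releases changed the API to take a cluster resolver:

# Hedged sketch of a parameter server cluster; addresses are placeholders.
import json
import os
import tensorflow as tf

# One parameter server holds the variables; two workers run the computation.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'worker': ['localhost:12345', 'localhost:23456'],
        'ps': ['localhost:34567']
    },
    'task': {'type': 'worker', 'index': 0}  # role of this particular process
})

# Constructor as exposed around TensorFlow 2.0 (primarily for Estimators).
strategy = tf.distribute.experimental.ParameterServerStrategy()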