Next we load the MNIST dataset. Since we are doing dimension reduction using PCA, we do not need a test dataset or even labels; however, we load the labels so that after reduction we can verify how well PCA performed. PCA should place similar datapoints close together, so if the clusters formed after reduction line up with our labels, it indicates that our PCA is working:

import tensorflow as tf
import numpy as np

((x_train, y_train), (_, _)) = tf.keras.datasets.mnist.load_data()

Before we do PCA we should preprocess the data. We first normalize it so that all values lie between 0 and 1, then reshape each image from a 28 × 28 matrix into a 784-dimensional vector, and finally center it by subtracting the mean:

x_train = x_train / 255.                                 # scale pixel values to [0, 1]
x_train = x_train.astype(np.float32)
x_train = np.reshape(x_train, (x_train.shape[0], 784))   # flatten each 28 x 28 image to a 784-dim vector
mean = x_train.mean(axis = 1)                            # per-image mean
x_train = x_train - mean[:,None]                         # center each image by subtracting its mean
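As a quick sanity check, not part of the original listing, we can confirm that the reshaping and per-row centering behaved as expected:

print(x_train.shape)         # expected: (60000, 784)
print(x_train.mean(axis=1))  # per-row means, expected to be approximately zero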

Now that our data is in the right format, we make use of TensorFlow's powerful linear algebra (linalg) module to calculate the SVD of our training dataset. TensorFlow provides the function svd(), defined in tf.linalg, to perform this task. We then use the diag function to convert the sigma array (s, a list of singular values) into a diagonal matrix:

s, u, v = tf.linalg.svd(x_train)   # s: singular values, u: left singular vectors, v: right singular vectors
s = tf.linalg.diag(s)              # turn the vector of singular values into a diagonal matrix

This provides us with a diagonal matrix s of size 784 × 784, a left singular matrix u of size 60000 × 784, and a right singular matrix v of size 784 × 784. This is because the argument full_matrices of the function svd() is set to False by default. As a result, it does not generate the full U matrix (in this case, of size 60000 × 60000); instead, if the input X is of size m × n, it generates a U of size m × p, where p = min(m, n).
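If you want to verify these shapes yourself, a small optional check along the following lines, using the variables defined above, will do:

print(u.shape)   # (60000, 784)
print(s.shape)   # (784, 784) after tf.linalg.diag
print(v.shape)   # (784, 784)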

The reduced-dimension data can now be generated by multiplying the respective slices of u and s: since X = U S V^T and the columns of V are orthonormal, projecting X onto the first k right singular vectors gives X V_k = U_k S_k, that is, the first k columns of U times the top-left k × k block of S. Here we reduce our data from 784 to 3 dimensions; we could choose any dimension less than 784, but we pick 3 so that it is easier to visualize later. We make use of tf.Tensor.__getitem__ to slice our matrices in the Pythonic way:

k = 3
pca = tf.matmul(u[:,0:k], s[0:k,0:k])   # project onto the top k singular directions
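To follow up on the idea from the start of this section, that points belonging to the same digit should end up close together, one way to inspect the three retained dimensions is a 3D scatter plot colored by the labels we kept. The matplotlib snippet below is an illustrative sketch rather than part of the listing above:

import matplotlib.pyplot as plt

pca_np = pca.numpy()                       # convert the projected data to a NumPy array
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(pca_np[:, 0], pca_np[:, 1], pca_np[:, 2],
                     c=y_train, cmap='tab10', s=1)
fig.colorbar(scatter, ax=ax, label='digit label')
plt.show()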

