

Advanced Convolutional Neural Networks

If you want to start playing with VQA, the first step is to get an appropriate training dataset such as the VQA dataset, the CLEVR dataset (https://cs.stanford.edu/people/jcjohns/clevr/), or the FigureQA dataset (https://datasets.maluuba.com/FigureQA); alternatively, you can participate in a Kaggle VQA challenge (https://www.kaggle.com/c/visual-question-answering). Then you can build a model that combines a CNN and an RNN (discussed in Chapter 9, Autoencoders) and start experimenting. For instance, the CNN can be something like the following code fragment, which takes a 224×224 image with three channels as input and produces a feature vector for the image:

import tensorflow as tf
from tensorflow.keras import layers, models

# IMAGE
#
# Define CNN for visual processing
cnn_model = models.Sequential()
cnn_model.add(layers.Conv2D(64, (3, 3), activation='relu',
                            padding='same', input_shape=(224, 224, 3)))
cnn_model.add(layers.Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(layers.MaxPooling2D(2, 2))
cnn_model.add(layers.Conv2D(128, (3, 3), activation='relu',
                            padding='same'))
cnn_model.add(layers.Conv2D(128, (3, 3), activation='relu'))
cnn_model.add(layers.MaxPooling2D(2, 2))
cnn_model.add(layers.Conv2D(256, (3, 3), activation='relu',
                            padding='same'))
cnn_model.add(layers.Conv2D(256, (3, 3), activation='relu'))
cnn_model.add(layers.Conv2D(256, (3, 3), activation='relu'))
cnn_model.add(layers.MaxPooling2D(2, 2))
cnn_model.add(layers.Flatten())
cnn_model.summary()

# define the visual_model with proper input
image_input = layers.Input(shape=(224, 224, 3))
visual_model = cnn_model(image_input)
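To get a feel for how large the image feature vector is, the layer stack above can be traced by hand. The following is only a minimal sketch in plain Python, not part of the original listing: the helper names (conv_out, pool_out, feature_vector_length) are hypothetical, and it assumes stride-1 3×3 convolutions, Keras's default 'valid' padding where none is given, and 2×2 pooling with stride 2.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    # 'same' keeps the size; 'valid' trims kernel - 1 pixels per dimension.
    return size if padding == "same" else size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after 2x2 max pooling with stride 2 (floor division)."""
    return size // pool


# One entry per block: (number of filters, padding of each Conv2D in order).
# Each block is followed by a single MaxPooling2D, as in the listing above.
blocks = [
    (64, ["same", "valid"]),
    (128, ["same", "valid"]),
    (256, ["same", "valid", "valid"]),
]


def feature_vector_length(size=224):
    """Length of the flattened output for a size x size input image."""
    channels = 3
    for filters, paddings in blocks:
        for padding in paddings:
            size = conv_out(size, padding=padding)
        size = pool_out(size)
        channels = filters
    return size * size * channels


print(feature_vector_length())  # 25 * 25 * 256 = 160000
```

Under these assumptions the spatial size goes 224 → 222 → 111 → 109 → 54 → 50 → 25, so Flatten yields a vector of 160,000 values, which should match the last line of cnn_model.summary().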

Text can be encoded with an RNN; for now, think of it as a black box that takes a text fragment (the question) as input and produces a feature vector for the text:

# TEXT
#
# define the RNN model for text processing
question_input = layers.Input(shape=(100,), dtype='int32')
embedding = layers.Embedding(input_dim=10000, output_dim=256,
                             input_length=100)(question_input)
encoded_question = layers.LSTM(256)(embedding)
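The text branch expects each question as a sequence of 100 integer token ids below 10,000. The source does not say how questions are tokenized, so the following is only an illustrative sketch in plain Python: the whitespace tokenizer, the reserved ids PAD_ID and UNK_ID, the padding at the end of the sequence, and the helper names build_vocab and encode are all assumptions made for this example.

```python
MAX_LEN = 100        # matches shape=(100,) of question_input
VOCAB_SIZE = 10000   # matches input_dim of the Embedding layer
PAD_ID, UNK_ID = 0, 1  # hypothetical reserved ids for padding / unknown words


def build_vocab(questions):
    """Assign an id to each distinct lowercase token, after the reserved ids."""
    vocab = {}
    for question in questions:
        for token in question.lower().split():
            if token not in vocab and len(vocab) + 2 < VOCAB_SIZE:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(question, vocab):
    """Turn a question into exactly MAX_LEN ids (truncate, then pad at the end)."""
    ids = [vocab.get(token, UNK_ID) for token in question.lower().split()]
    ids = ids[:MAX_LEN]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))


vocab = build_vocab(["what color is the cube", "how many spheres are there"])
encoded = encode("what color is the sphere", vocab)
print(len(encoded), encoded[:6])  # 100 [2, 3, 4, 5, 1, 0]
```

Here "sphere" is not in the toy vocabulary, so it maps to UNK_ID, and the remaining positions are filled with PAD_ID. A batch of such lists, converted to an int32 array, has the shape that question_input declares.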
