09.05.2023 Views

pdfcoffee

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 14

On the CIFAR-10 dataset, this method, starting from scratch, designed a novel

network architecture that rivals the best human-invented architecture in terms of

test set accuracy. The CIFAR-10 model achieves a test error rate of 3.65, which is

0.09 percent better and 1.05x faster than the previous state-of-the-art model that

used a similar architectural scheme. On the Penn Treebank dataset, the model can

compose a novel recurrent cell that outperforms the widely used an LSTM cell (see

Chapter 9, Autoencoders), and other state-of-the-art baselines. The cell achieves a test

set perplexity of 62.4 on the Penn Treebank, which is 3.6 better than the previous

state-of-the-art model.

The key outcome of the paper is shown in Figure 2. A controller network based

on RNNs produces a sample architecture A with probability p. This candidate

architecture A is trained by a child network to get a candidate accuracy R.

Then a gradient of p is computed and scaled by R to update the controller. This

reinforcement learning operation is computed in a cycle a number of times. The

process of generating an architecture stops if the number of layers exceeds a

certain value. The details of how a RL-based policy gradient method is used by the

controller RNN to generate better architectures are in [1]. Here we emphasize the

fact that NAS uses a meta-modeling algorithm based on Q-learning with εε − gggggggggggg

exploration strategy and with experience replay (see Chapter 11, Reinforcement

Learning) to explore the model search space:

Figure 2: NAS with Recurrent Neural Networks

Since the original paper in late 2016, a Cambrian explosion of model generation

techniques has been observed. Initially, the goal was to generate the entire model in

one single step. Later, a cell-based approach has been proposed where the generation

is divided into two macro-steps: first a cell structure is automatically built and then a

predefined number of discovered cells are stacked together to generate an entire endto-end

architecture [2].

[ 495 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!