

Figure 10.7.8: The total rewards received per episode using the A2C method

In the experiments conducted, we used the same learning rate, 1e-3, for optimizing both the log probability (policy) and value networks. The discount factor is set to 0.99, except for A2C, which is easier to train with a 0.95 discount factor.
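As a minimal sketch of these hyperparameter choices (assuming tf.keras and Adam optimizers; the helper name build_optimizers and its algorithm argument are hypothetical, not from the book's code):

from tensorflow.keras.optimizers import Adam

def build_optimizers(algorithm="a2c"):
    # Hypothetical helper: the same 1e-3 learning rate is used for both
    # the log probability (policy) and value network optimizers
    logp_optimizer = Adam(learning_rate=1e-3)
    value_optimizer = Adam(learning_rate=1e-3)
    # Discount factor: 0.99 by default, 0.95 for A2C (easier to train)
    gamma = 0.95 if algorithm == "a2c" else 0.99
    return logp_optimizer, value_optimizer, gamma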

The reader is encouraged to run the trained network by executing:

$ python3 policygradient-car-10.1.1.py \
    --encoder_weights=encoder_weights.h5 --actor_weights=actor_weights.h5

The following table shows the other modes of running policygradient-car-10.1.1.py. The weights file (that is, *.h5) can be replaced by your own pre-trained weights file. Please consult the code to see the other available options:

Purpose                                      Run
Train REINFORCE from scratch                 python3 policygradient-car-10.1.1.py --encoder_weights=encoder_weights.h5
Train REINFORCE with baseline from scratch   python3 policygradient-car-10.1.1.py --encoder_weights=encoder_weights.h5 -b
Train Actor-Critic from scratch              python3 policygradient-car-10.1.1.py --encoder_weights=encoder_weights.h5 -a
Train A2C from scratch                       python3 policygradient-car-10.1.1.py --encoder_weights=encoder_weights.h5 -c
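For orientation, here is a minimal sketch of how such command-line switches could be parsed with argparse; the long option names, help strings, and default behavior are assumptions inferred from the table above, not lifted verbatim from policygradient-car-10.1.1.py:

import argparse

parser = argparse.ArgumentParser(description="Policy gradient methods")
parser.add_argument("--encoder_weights",
                    help="load pre-trained encoder weights (*.h5)")
parser.add_argument("--actor_weights",
                    help="load pre-trained actor weights (*.h5) and run the trained agent")
parser.add_argument("-b", "--baseline", action="store_true",
                    help="train REINFORCE with baseline from scratch")
parser.add_argument("-a", "--actor-critic", action="store_true",
                    help="train Actor-Critic from scratch")
parser.add_argument("-c", "--a2c", action="store_true",
                    help="train A2C from scratch")
args = parser.parse_args()
# With none of -b/-a/-c given, plain REINFORCE is trained from scratch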

