Advanced Deep Learning with Keras


Chapter 10

Performance evaluation of policy gradient methods

The four policy gradient methods were evaluated by training the agent for 1,000 episodes. We define one training session as 1,000 episodes of training. The first performance metric is the cumulative number of times the car reached the flag in 1,000 episodes. Figures 10.7.1 to 10.7.4 show five training sessions per method.

By this metric, A2C reached the flag the greatest number of times, followed by REINFORCE with baseline, Actor-Critic, and REINFORCE. The use of a baseline or a critic accelerates learning. Note that these are training sessions in which the agent continuously improved its performance. There were cases in the experiments where the agent's performance did not improve with time.

The second performance metric is based on the requirement that MountainCarContinuous-v0 is considered solved if the total reward per episode is at least 90.0. From the five training sessions per method, we selected the one training session with the highest total reward for the last 100 episodes (episodes 900 to 999). Figures 10.7.5 to 10.7.8 show the results of the four policy gradient methods. REINFORCE with baseline is the only method that was able to consistently achieve a total reward of about 90 after 1,000 episodes of training. A2C has the second-best performance but could not consistently reach at least 90 for the total rewards.

Figure 10.7.1: The number of times the mountain car reached the flag using the REINFORCE method
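The two metrics described above can be sketched in a few lines of Python. This is only an illustration, not the book's evaluation code: the function names, the per-episode input lists, and the choice to rank sessions by the summed reward of the last 100 episodes are all assumptions. Only the 90.0 threshold and the 100-episode window come from the text.

```python
from typing import List, Sequence

SOLVED_REWARD = 90.0  # "solved" threshold quoted in the text


def cumulative_flag_count(reached_flag: Sequence[bool]) -> List[int]:
    """Metric 1: running count of episodes in which the car reached the flag."""
    counts, total = [], 0
    for reached in reached_flag:
        total += int(reached)
        counts.append(total)
    return counts


def last_window_total(rewards: Sequence[float], window: int = 100) -> float:
    """Sum of per-episode total rewards over the final `window` episodes."""
    return float(sum(rewards[-window:]))


def best_session(sessions: Sequence[Sequence[float]], window: int = 100) -> int:
    """Index of the session with the highest summed reward in its last window.

    Ranking by the sum is an assumption; the text only says "highest total reward".
    """
    return max(range(len(sessions)),
               key=lambda i: last_window_total(sessions[i], window))


def fraction_solved(rewards: Sequence[float], window: int = 100,
                    threshold: float = SOLVED_REWARD) -> float:
    """Metric 2 helper: share of the last `window` episodes at or above threshold."""
    tail = rewards[-window:]
    if not tail:
        return 0.0
    return sum(r >= threshold for r in tail) / len(tail)
```

With logs from five sessions of one method, `best_session` would pick the session to plot, and `fraction_solved` gives a rough sense of how consistently that session reached the threshold in episodes 900 to 999.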

Policy Gradient Methods

Figure 10.7.2: The number of times the mountain car reached the flag using the REINFORCE with baseline method

Figure 10.7.3: The number of times the mountain car reached the flag using the Actor-Critic method