Q-Learning

Q-Learning Algorithm

For each state-action pair (s, a), initialize the table entry Q(s, a) to zero.
Observe the current state s.
Do forever:
--- Select an action a and execute it
--- Receive immediate reward r
--- Observe the new state s'
--- Update the table entry for Q(s, a) as follows: $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$
--- s ← s'
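
The update step of the algorithm above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: it assumes the usual discounted form of the update with a discount factor `gamma`, and the function name `q_update`, the dictionary-based table, and the state and action labels in the usage lines are all hypothetical.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, next_actions, gamma):
    """Apply one deterministic Q-learning update to the table q.

    q maps (state, action) pairs to values; missing entries count as zero.
    next_actions lists the actions available in s_next (empty if terminal).
    """
    # Best value reachable from the new state; 0.0 if it has no actions.
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] = r + gamma * best_next
    return q[(s, a)]


# Hypothetical usage: every table entry starts at zero.
q = defaultdict(float)
q[("s2", "right")] = 10.0
new_value = q_update(q, "s1", "right", 1.0, "s2", ["left", "right"], gamma=0.5)
```

Here `new_value` is the reward 1.0 plus 0.5 times the best value available in the new state.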
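Example Problem

States: s1, s2, s3, s4, s5, and the end state s6.
Actions (aij moves from state si to state sj):
--- from s1: a12, a14
--- from s2: a21, a23, a25
--- from s3: a32, a36
--- from s4: a41, a45
--- from s5: a52, a54, a56
γ = 0.5; r = 100 if moving into state s6, 0 otherwise.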
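
To make the example concrete, here is a hedged Python sketch that runs tabular Q-learning on this six-state problem. It reads each action aij as a move from si to sj, which is an interpretation of the diagram. The random start state, the uniformly random action choice, the episode count, and the names `ACTIONS`, `GOAL`, and `train` are assumptions made for illustration.

```python
import random

GAMMA = 0.5
GOAL = "s6"

# Transition table: state -> {action name: next state}.
ACTIONS = {
    "s1": {"a12": "s2", "a14": "s4"},
    "s2": {"a21": "s1", "a23": "s3", "a25": "s5"},
    "s3": {"a32": "s2", "a36": "s6"},
    "s4": {"a41": "s1", "a45": "s5"},
    "s5": {"a52": "s2", "a54": "s4", "a56": "s6"},
}


def train(episodes=1000, seed=0):
    rng = random.Random(seed)
    # Initialize every table entry Q(s, a) to zero.
    q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}
    for _ in range(episodes):
        # Assumption: each game starts in a random non-terminal state.
        s = rng.choice(sorted(ACTIONS))
        while s != GOAL:
            # Assumption: actions are picked uniformly at random.
            a = rng.choice(sorted(ACTIONS[s]))
            s_next = ACTIONS[s][a]
            r = 100.0 if s_next == GOAL else 0.0
            # The end state has no actions, so its best value is 0.
            best_next = max(
                (q[(s_next, b)] for b in ACTIONS.get(s_next, {})), default=0.0
            )
            q[(s, a)] = r + GAMMA * best_next
            s = s_next
    return q


q_table = train()
for (s, a), value in sorted(q_table.items()):
    print(s, a, value)
```

With γ = 0.5, the fixed point of the update gives 100 for the two actions entering s6 (a36 and a56), 50 for actions that lead to a state one step from s6 (such as a23 or a45), 25 for actions two steps away (such as a12), and 12.5 for a21 and a41.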
