
Bayesian Opponent Exploitation in Imperfect-Information Games


Bayesian Opponent Exploitation in
Imperfect-Information Games
Sam Ganzfried
http://www.ganzfriedresearch.com/


Constructing an opponent model
E.g., if the opponent has played Rock 10 times, Paper
7 times, and Scissors 3 times, we can predict he will play
R with prob 10/20, etc.
In imperfect-information games this is more challenging,
but doable to approximate
– e.g., Ganzfried/Sandholm AAMAS 2011
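For the Rock-Paper-Scissors case above, the frequency-count model is a few lines of code. This is an illustrative sketch (the function and variable names are mine, not from the talk):

```python
from collections import Counter

def predict(history):
    """Predict the opponent's mixed strategy from observed action counts."""
    counts = Counter(history)
    total = sum(counts.values())
    return {a: counts[a] / total for a in ("R", "P", "S")}

# Opponent played Rock 10 times, Paper 7 times, Scissors 3 times
history = ["R"] * 10 + ["P"] * 7 + ["S"] * 3
model = predict(history)
print(model["R"])  # 0.5, i.e. 10/20
```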


But is it really valid to assign a single “model”?
What if he isn’t following that exact strategy?
– Maybe he is playing R with prob 0.49!!


Restricted Nash Response
Johanson, Zinkevich, Bowling NIPS 2007


• Suppose the opponent is playing σ_{-i}, where σ_{-i}(s_{-j}) is the
probability that he plays pure strategy s_{-j} in S_{-j}
u_i(σ_i, σ_{-i}) = Σ_{s_{-j}} σ_{-i}(s_{-j}) · u_i(σ_i, s_{-j})
• Now suppose the opponent is playing a probability
distribution f_{-i} over mixed strategies
u_i(σ_i, f_{-i}) = ∫_{σ_{-i}} f_{-i}(σ_{-i}) · u_i(σ_i, σ_{-i})
• Let f*_{-i} denote the expectation of f_{-i}. It selects s_{-j} with prob
∫_{σ_{-i}} σ_{-i}(s_{-j}) · f_{-i}(σ_{-i})


Theorem: u_i(σ_i, f*_{-i}) = u_i(σ_i, f_{-i})
Proof:
u_i(σ_i, f*_{-i}) = Σ_{s_{-j}} u_i(σ_i, s_{-j}) · ∫_{σ_{-i}} σ_{-i}(s_{-j}) · f_{-i}(σ_{-i})
= Σ_{s_{-j}} ∫_{σ_{-i}} u_i(σ_i, s_{-j}) · σ_{-i}(s_{-j}) · f_{-i}(σ_{-i})
= ∫_{σ_{-i}} Σ_{s_{-j}} u_i(σ_i, s_{-j}) · σ_{-i}(s_{-j}) · f_{-i}(σ_{-i})
= ∫_{σ_{-i}} u_i(σ_i, σ_{-i}) · f_{-i}(σ_{-i})
= u_i(σ_i, f_{-i})
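Since u_i is linear in the opponent's strategy, the theorem is easy to check numerically for a finite mixture: playing against the expectation of f_{-i} gives exactly the same utility as averaging utilities over f_{-i}. A small sketch with made-up payoffs (Rock-Paper-Scissors for player i; all numbers here are illustrative):

```python
# Row player's payoff matrix for Rock-Paper-Scissors (order R, P, S)
U = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def u(sigma_i, sigma_opp):
    """Expected utility of mixed strategy sigma_i against mixed sigma_opp."""
    return sum(sigma_i[a] * sigma_opp[b] * U[a][b]
               for a in range(3) for b in range(3))

sigma_i = [0.2, 0.5, 0.3]
# f_{-i}: a finite distribution over two opponent mixed strategies
support = [[0.6, 0.2, 0.2], [0.1, 0.8, 0.1]]
weights = [0.25, 0.75]

# f*_{-i}: the expectation of the distribution f_{-i}
f_star = [sum(w * s[b] for w, s in zip(weights, support)) for b in range(3)]

lhs = u(sigma_i, f_star)                                        # u_i(σ_i, f*_{-i})
rhs = sum(w * u(sigma_i, s) for w, s in zip(weights, support))  # u_i(σ_i, f_{-i})
assert abs(lhs - rhs) < 1e-12
```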


Corollary: u_i(σ_i, p*(σ_{-i}|x)) = u_i(σ_i, p(σ_{-i}|x))
– p(σ_{-i}) denotes the prior (probability distribution over
mixed strategies) and p(σ_{-i}|x) denotes the posterior given
some observations x
– p*(σ_{-i}|x) is the expectation of p(σ_{-i}|x)
• The theorem and corollary apply to normal-form
and extensive-form games (both perfect and imperfect
information) for any number of players (we can let
σ_{-i} be the joint strategy profile of all other agents)


Meta-algorithm for Bayesian
opponent exploitation


Challenges
• #1: Assumes we can compactly represent the prior and
posterior distributions p_t, which have infinite domain


Challenge #2
• Requires a procedure to efficiently compute posterior distributions
given the prior and observations, which will involve having to
update potentially infinitely-many strategies


#3
Requires an efficient procedure to compute the expectation of p_t


#4
Requires that the full posterior distribution from one
round be compactly represented to be used as the prior
distribution in the next round


Can solve #4 by using the following modification:
p_t


Dirichlet distribution
• The pdf of the Dirichlet distribution gives the belief that the
probabilities of K rival events are x_i, given that each event
has been observed α_i − 1 times:
– f(x, α) = [Π_i x_i^{α_i − 1}] / B(α)
• The normalization B(α) is the multivariate beta function
– B(α) = Π_i Γ(α_i) / Γ(Σ_i α_i), where Γ(n) = (n−1)! for positive
integers n is the gamma function
• E[x_i] = α_i / Σ_k α_k
• Assuming multinomial sampling, the posterior
distribution after including new observations is also a
Dirichlet distribution, with parameters updated based on
the new observations.
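Because the posterior is again Dirichlet, the update is just adding the observation counts to the parameters, and the posterior mean follows from E[x_i] = α_i / Σ_k α_k. A minimal sketch (function name and example counts are mine):

```python
def dirichlet_posterior_mean(alpha, counts):
    """Posterior mean of a Dirichlet prior after multinomial observations.

    Conjugacy: posterior parameters are simply alpha_i + counts_i.
    """
    post = [a + c for a, c in zip(alpha, counts)]
    total = sum(post)
    return [a / total for a in post]

# Uniform Dirichlet(1,1,1) prior over R/P/S, then observe 10/7/3 plays
mean = dirichlet_posterior_mean([1, 1, 1], [10, 7, 3])
print(mean)  # [11/23, 8/23, 4/23]
```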


Dirichlet distribution
• Very natural distribution; has been previously used for
modeling in large imperfect-information games
• Dirichlet is the conjugate prior for the multinomial
distribution, and therefore the posterior is also Dirichlet
– Opponent plays in proportion to updated weights
• So there is a simple closed form for the expectation of the posterior
– Alg 1 gives an exact efficient algorithm for computing the Bayesian
Best Response [Fudenberg/Levine ‘98]
– “Fictitious play” [Brown ‘51]
• This applies to normal-form games and extensive-form
games with perfect information
– Zero-sum, general-sum, and any number of players


Imperfect information
• This would also apply to imperfect-information games if
the opponent’s private information were observed after
each round (so we knew exactly which information set
he took the observed action from)
• But not to imperfect-information games where the
opponent’s private information is not (or is only
sometimes) observed.
• An algorithm exists using importance sampling to
approximate the value of the infinite integral [Southey et al., UAI ’05]
– Has been applied successfully to limit Texas hold ‘em
– But it has no guarantees, and does not provide intuition


• If we observed the opponent’s hand after each play,
we could just maintain a counter for each
action/info set and update the appropriate one
• But if we don’t observe his card, we don’t
know which counter to increment


• To simplify the analysis, assume we never see the
opponent’s card after a hand (and also assume
we don’t observe our payoff until the end, so that
we cannot draw inferences about his card).
• This is not realistic, but there are no known exact
algorithms even for this setting
– We suspect the approach can extend to the case of partial
observability


• Let α_Kb − 1 denote the number of “fictitious” times
we have observed the opponent play b with K
according to our prior
• Now assume we observe him take action b, but
don’t observe his card


• Expectation of the posterior for the probability he bets
with J:
• [B(α_Kb+1, α_Kf) B(α_Jb+1, α_Jf) + B(α_Kb, α_Kf) B(α_Jb+2, α_Jf)] / Z
• Z = B(α_Kb+1, α_Kf) B(α_Jb+1, α_Jf) + B(α_Kb, α_Kf) B(α_Jb+2, α_Jf)
+ B(α_Kb, α_Kf+1) B(α_Jb, α_Jf+1) + B(α_Kb, α_Kf) B(α_Jb, α_Jf+2)
• Recall B(α) = Π_i Γ(α_i) / Γ(Σ_i α_i), where Γ(n) = (n−1)! is the
gamma function
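These Beta-function products are straightforward to evaluate directly. The sketch below implements the displayed formula using the two-argument beta B(a, b) = Γ(a)Γ(b)/Γ(a+b); plugging in the example prior from the following slides (α_Kb = 10, α_Kf = 3, α_Jb = 4, α_Jf = 9) reproduces the 0.346 reported there. The function name is mine:

```python
from math import gamma

def beta(a, b):
    """Two-argument beta function B(a, b) = Γ(a)Γ(b) / Γ(a+b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def posterior_bet_prob_J(a_Kb, a_Kf, a_Jb, a_Jf):
    """Expectation of the posterior probability of betting with J
    after one unobserved-card bet, per the slide's formula."""
    num = (beta(a_Kb + 1, a_Kf) * beta(a_Jb + 1, a_Jf)
           + beta(a_Kb, a_Kf) * beta(a_Jb + 2, a_Jf))
    Z = (num
         + beta(a_Kb, a_Kf + 1) * beta(a_Jb, a_Jf + 1)
         + beta(a_Kb, a_Kf) * beta(a_Jb, a_Jf + 2))
    return num / Z

print(round(posterior_bet_prob_J(10, 3, 4, 9), 3))  # 0.346
```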


General solution
• Assume we observe him play b θ_b times and f θ_f times
• Expectation of the posterior probability of
betting with Jack:
• Σ_i Σ_j B(α_Kb+i, α_Kf+j) B(α_Jb+θ_b−i+1, α_Jf+θ_f−j) / Z
• Z = Σ_i Σ_j [B(α_Kb+i, α_Kf+j) B(α_Jb+θ_b−i+1, α_Jf+θ_f−j)
+ B(α_Kb+i, α_Kf+j) B(α_Jb+θ_b−i, α_Jf+θ_f−j+1)]


Example
• Suppose the prior is that the opponent played b with K 10
times, played f with K 3 times, played b with J 4 times, and
played f with J 9 times.
• Now suppose we see him play b at the next iteration
• Previously we thought the probability of betting with a
jack was 4/13 = 0.308
• Now: p(b|O,J) ∝ B(11,3)B(5,9) + B(10,3)B(6,9)
• p(f|O,J) ∝ B(10,4)B(4,10) + B(10,3)B(4,11)
• → normalizing so that p(b|O,J) + p(f|O,J) = 1 gives p(b|O,J) = …


• p(b|O,J) = 0.346
• Previously we thought the probability of betting
with a jack was 4/13 = 0.308
• What if we always observed his card after game
play and observed he had a jack?


– 5/14 = 0.357


• What about the “naïve” approach where we
increment the counter α_Jb by α_Jb/(α_Jb + α_Kb)?


• “Naïve” approach: (4 + 4/14)/14 = 0.306
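The non-Bayesian numbers on these slides are simple count arithmetic; a quick sketch reproducing them (variable names are mine, and the naïve denominator follows the slide's arithmetic):

```python
a_Jb, a_Jf, a_Kb = 4, 9, 10   # prior counts from the example

prior = a_Jb / (a_Jb + a_Jf)                   # 4/13: before the observation
full_obs = (a_Jb + 1) / (a_Jb + a_Jf + 1)      # 5/14: card observed to be J
# "Naïve": increment the J-bet counter fractionally by α_Jb/(α_Jb + α_Kb)
naive = (a_Jb + a_Jb / (a_Jb + a_Kb)) / (a_Jb + a_Jf + 1)

print(round(prior, 3), round(full_obs, 3), round(naive, 3))  # 0.308 0.357 0.306
```

Note that the exact Bayesian answer (0.346) lands between the prior (0.308) and the full-observation update (0.357), while the naïve fractional increment barely moves the estimate.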


Example
• Suppose the prior is that the opponent played b with K 10
times, played f with K 3 times, played b with J 4 times, and
played f with J 9 times.
• Now suppose we see him play b at the next iteration
• Previously we thought the probability of betting with a
king was 10/13 = 0.769
• Now: p(b|O,K) ∝ B(5,9)B(11,3) + B(4,9)B(12,3)
• p(f|O,K) ∝ B(4,10)B(10,4) + B(4,9)B(10,5)
• → normalizing so that p(b|O,K) + p(f|O,K) = 1 gives p(b|O,K) = …


• p(b|O,K) = 0.788
• Previously we thought the probability of betting
with a king was 10/13 = 0.769
• What if we always observed his card after game
play and observed he had a king?


– 11/14 = 0.786
– !!


• What about the “naïve” approach where we
increment the counter α_Kb by α_Kb/(α_Kb + α_Jb)?


• “Naïve” approach: (10 + 10/14)/14 = 0.765


Generalizations
• Generalize the model to n different states dealt according
to an arbitrary distribution π, where the opponent can take m actions
• The solution has O(n^{Tm}) terms
– Polynomial in the number of information states, but
exponential in the other parameters


Uniform prior over polyhedron


Conclusions and directions
• First exact algorithm for Bayesian opponent
exploitation in imperfect-information games
• Observed interesting phenomena in a natural game
• The generalization is polynomial in the number of information
states, but exponential in the other parameters
• Future research and extensions:
– Partial observability
– General game trees with sequential actions
– Any number of agents (the algorithm is not specialized to 2-player zero-sum)
– Other important distributions
