
Real-time Strategy (RTS) Games Numerous studies have been published that evaluate the performance of MCTS on different variants of real-time strategy games. These games are usually modelled on well-known and commercially successful games such as Warcraft, Starcraft or Command & Conquer, but have been simplified to reduce the number of available actions at any moment in time (the branching factor of the decision trees in such games may be unlimited). Initial work made use of Monte Carlo simulations as a replacement for evaluation functions; the simulations were embedded in other algorithms such as minimax, or were used with a 1-ply look-ahead and made use of numerous abstractions to make the search feasible given the time constraints.
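To illustrate this early style of play, the sketch below scores each candidate action by the average outcome of fast playouts, i.e. a 1-ply look-ahead with Monte Carlo simulations standing in for an evaluation function. It is a minimal sketch in Python; the game hooks legal_actions, apply_action and rollout_policy are assumptions for illustration, not any particular RTS engine's API.

def one_ply_monte_carlo(state, legal_actions, apply_action, rollout_policy,
                        playouts_per_action=50):
    # 1-ply look-ahead: score each action by the mean result of cheap playouts
    # instead of a handcrafted evaluation function (sketch only; the hooks
    # passed in are hypothetical).
    best_action, best_value = None, float("-inf")
    for action in legal_actions(state):
        total = 0.0
        for _ in range(playouts_per_action):
            # Simulate the (possibly abstracted) game to the end with a fast
            # policy and record the outcome from the acting player's view.
            total += rollout_policy(apply_action(state, action))
        value = total / playouts_per_action
        if value > best_value:
            best_action, best_value = action, value
    return best_action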

Wargus Balla and Fern [18] apply UCT to an RTS game called Wargus. Here the emphasis is on tactical assault planning, making use of numerous abstractions, most notably the grouping of individual units. The authors conclude that MCTS is a promising approach: despite the lack of domain-specific knowledge, the algorithm outperformed baseline and human players across 12 scenarios.

ORTS Naveed et al. [150] apply UCT and RRTs (4.10.3) to the RTS game engine ORTS. Both algorithms are used to find paths in the game, and the authors conclude that UCT finds solutions with less search effort than RRT, although the RRT player outperforms the UCT player in terms of overall playing strength.

7.7 Nondeterministic Games

Nondeterministic games have hidden information and/or a random element. Hidden information may arise through cards or tiles visible to the player but not the opponent(s). Randomness may arise through the shuffling of a deck of cards or the rolling of dice. Hidden information and randomness generally make game trees much harder to search, greatly increasing both their branching factor and depth.

The most common approach to dealing with this increase in branching factor is to use determinization, which involves sampling over the perfect information game instances that arise when it is assumed that all hidden and random outcomes are known in advance (see Section 4.8.1).
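A minimal sketch of this idea is given below in Python. The hooks sample_determinization and search_root are hypothetical, game-specific assumptions rather than part of any particular MCTS implementation: each sampled determinization fixes all hidden and chance outcomes, a perfect information search is run on it, and root statistics are pooled across samples.

from collections import defaultdict

def determinized_decision(info_state, sample_determinization, search_root,
                          num_determinizations=40):
    # Determinization sketch: search several perfect-information instances
    # consistent with the player's observations, then pool root statistics.
    # search_root(state) is assumed to return {action: (total_value, visits)}
    # from any perfect-information search such as plain UCT.
    totals = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(num_determinizations):
        # Fix every hidden card/tile and every future chance outcome.
        state = sample_determinization(info_state)
        for action, (value, count) in search_root(state).items():
            totals[action] += value
            visits[action] += count
    # Recommend the action with the best average value across determinizations.
    return max(visits, key=lambda a: totals[a] / visits[a])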

Skat is a trick-taking card game with a bidding phase. Schäfer describes the UCT player XSKAT, which uses information sets to handle the nondeterministic aspect of the game, and various optimisations in the default policy for both bidding and playing [194]. XSKAT outperformed flat Monte Carlo players and was competitive with the best artificial Skat players that use traditional search techniques. A discussion of the methods used for opponent modelling is given in [35].

Poker Monte Carlo approaches have also been used for the popular gambling card game Poker [177]. The Poker game tree is too large to compute Nash strategies precisely, so states must be collected into a small number of buckets. Monte Carlo methods such as Monte Carlo Counterfactual Regret (MCCFR) [125] (4.8.8) are then able to find approximate Nash equilibria. These approaches represent the current state of the art in computer Poker.
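MCCFR itself is beyond the scope of a short listing, but the strategy-update rule at its core, regret matching, is compact: each (bucketed) information set plays actions in proportion to their accumulated positive counterfactual regret. The Python sketch below uses illustrative action names and regret values and is not tied to any particular Poker implementation.

def regret_matching(cumulative_regret):
    # Regret matching: play each action with probability proportional to its
    # positive accumulated regret; fall back to uniform if none is positive.
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total <= 0.0:
        return {a: 1.0 / len(cumulative_regret) for a in cumulative_regret}
    return {a: p / total for a, p in positive.items()}

# Illustrative regrets at one bucketed information set (hypothetical values).
print(regret_matching({"fold": -2.0, "call": 3.0, "raise": 1.0}))
# -> {'fold': 0.0, 'call': 0.75, 'raise': 0.25}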

Maîtrepierre et al. [137] use UCB to select strategies, resulting in global play that takes the opponent's strategy into account and exhibits unpredictable behaviour. Van den Broeck et al. apply MCTS methods to multi-player no-limit Texas Hold'em Poker [223], enabling strong exploitative behaviour against weaker rule-based opponents and competitive performance against experienced human opponents.

Ponsen et al. [159] apply UCT to Poker, using a learned opponent model (Section 4.8.9) to bias the choice of determinizations. Modelling the specific opponent by examining games they have played previously results in a large increase in playing strength compared to UCT with no opponent model. Veness et al. [226] describe the application of ρUCT (4.10.6) to Kuhn Poker using their MC-AIXA agent.
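One simple way to realise such biasing is sketched below in Python: candidate deals of the hidden cards are weighted by how probable the opponent model considers the opponent's observed actions under each deal, and one deal is then sampled proportionally. The helper action_probability is an assumption for illustration, not the exact scheme of [159].

import random

def biased_determinization(candidate_deals, opponent_model, observed_actions):
    # Weight each candidate assignment of hidden cards by the likelihood the
    # opponent model gives to the actions actually observed, then sample one
    # deal proportionally (sketch with hypothetical interfaces).
    weights = []
    for deal in candidate_deals:
        likelihood = 1.0
        for action in observed_actions:
            likelihood *= opponent_model.action_probability(action, hidden_cards=deal)
        weights.append(likelihood)
    return random.choices(candidate_deals, weights=weights, k=1)[0]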

Dou Di Zhu is a popular Chinese card game with hidden information. Whitehouse et al. [230] use information sets of states to store rollout statistics, so that simulation results are collected for sets of game states that are indistinguishable from a player's point of view. One surprising conclusion is that overcoming the problems of strategy fusion (by using expectimax rather than a determinization approach) is more beneficial than having a perfect opponent model.
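The essential bookkeeping can be sketched as follows in Python; the observation_key describing what the player to move can see is a hypothetical game-supplied value. Statistics are keyed by the information set rather than by the exact state, so every simulation passing through an indistinguishable state updates the same entry.

from collections import defaultdict

class InfoSetStatistics:
    # Rollout statistics shared across all states in the same information set.
    def __init__(self):
        self.visits = defaultdict(int)
        self.total_reward = defaultdict(float)

    def update(self, observation_key, action, reward):
        # Called once per simulation; indistinguishable states map to the same
        # observation_key and therefore to the same counters.
        self.visits[(observation_key, action)] += 1
        self.total_reward[(observation_key, action)] += reward

    def mean_value(self, observation_key, action):
        key = (observation_key, action)
        return self.total_reward[key] / max(1, self.visits[key])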

Other card games such as Hearts and Spades are also interesting to investigate in this area, although work to date has only applied MCTS to their perfect information versions [207].

Klondike Solitaire is a well-known single-player card game, which can be thought of as a single-player stochastic game: instead of the values of the hidden cards being fixed at the start of the game, they are determined by chance events at the moment the cards are turned over. 33

Bjarnason et al. [20] apply a combination of the determinization technique of hindsight optimisation (HOP) with UCT to Klondike Solitaire (Section 4.8.1). This system achieves a win rate more than twice that estimated for a human player.
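In outline, HOP averages the value of each action over sampled determinizations that are each evaluated with full hindsight, i.e. with all hidden cards revealed; in [20] that evaluation is performed by UCT. The Python sketch below uses hypothetical hooks (sample_determinization, hindsight_value) and is only meant to convey the structure, not the authors' exact system.

from collections import defaultdict

def hop_decision(info_state, legal_actions, sample_determinization,
                 hindsight_value, num_samples=30):
    # Hindsight optimisation sketch: reveal all hidden cards in each sample and
    # score every currently legal action with a perfect-information evaluator
    # (e.g. UCT); recommend the action with the best average score.
    value_sum = defaultdict(float)
    for _ in range(num_samples):
        state = sample_determinization(info_state)  # whole deck order fixed
        for action in legal_actions:
            value_sum[action] += hindsight_value(state, action)
    # Hindsight values are optimistic, since each sample assumes the future is
    # known; averaging over many samples is the HOP approximation.
    return max(value_sum, key=lambda a: value_sum[a] / num_samples)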

Magic: The Gathering is a top-selling two-player

33. This idea can generally be used to transform a single-player game of imperfect information into one of perfect information with stochasticity.
