06.04.2013 Views

A Survey of Monte Carlo Tree Search Methods - Department of ...

A Survey of Monte Carlo Tree Search Methods - Department of ...

A Survey of Monte Carlo Tree Search Methods - Department of ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 34<br />

state-space complexity <strong>of</strong> 10 159 [191].<br />

Schadd et al. describe the Single-Player MCTS (SP-<br />

MCTS) variant (4.4) featuring modified backpropagation,<br />

parameter tuning and meta-search extension, and apply<br />

it to SameGame [191] [190]. Their player achieved<br />

a higher score than any previous AI player (73,998).<br />

Cazenave then applied Nested <strong>Monte</strong> <strong>Carlo</strong> <strong>Search</strong><br />

(4.9.2) to achieve an even higher score <strong>of</strong> 77,934 [42].<br />

Matsumoto et al. later applied SP-MCTS with<br />

domain knowledge to bias move choices during<br />

playouts, for superior performance with little impact<br />

on computational time [140]. Edelkamp et al. [78]<br />

achieved a score <strong>of</strong> 82,604 using enhanced UCT in their<br />

heuristically guided swarm tree search (4.9.5).<br />

Sudoku and Kakuro Sudoku, the popular logic puzzle,<br />

needs no introduction except perhaps to point out that<br />

it is NP-complete for arbitrarily large boards. Cazenave<br />

[42] applied nested <strong>Monte</strong> <strong>Carlo</strong> search (4.9.2) to 16 × 16<br />

Sudoku as the standard 9 × 9 puzzle proved too easy for<br />

comparison purposes and reported solution rates over<br />

300,000 times faster than existing Forward Checking<br />

methods and almost 50 times faster than existing<br />

Iterative Sampling approaches.<br />

Kakuro, also known as Cross Sums, is a similar<br />

logic puzzle in the same class as Sudoku that is also<br />

NP-complete. Cazenave [41] applied nested <strong>Monte</strong><br />

<strong>Carlo</strong> search (4.9.2) to 8 × 8 Kakaru puzzles for solution<br />

rates over 5,000 times faster than existing Forward<br />

Checking and Iterative Sampling approaches.<br />

Wumpus World Asmuth and Littman [9] apply<br />

their Bayesian FSSS (BFS3) technique to the classic 4x4<br />

video game Wumpus World [178]. Their BFS3 player<br />

clearly outperformed a variance-based reward bonus<br />

strategy, approaching Bayes-optimality as the program’s<br />

computational budget was increased.<br />

Mazes, Tigers and Grids Veness et al. [226] describe the<br />

application <strong>of</strong> ρUCT (4.10.6) in their MC-AIXA agent<br />

to a range <strong>of</strong> puzzle games including:<br />

• maze games,<br />

• Tiger games in which the player must select the door<br />

that maximises some reward, and<br />

• a 4 × 4 grid world game in which the player moves<br />

and teleports to maximise their score.<br />

Veness et al. [226] also describe the application <strong>of</strong> ρUCT<br />

to a number <strong>of</strong> nondeterministic games, which are summarised<br />

in a following section.<br />

7.5 General Game Playing<br />

General Game Players (GGPs) are s<strong>of</strong>tware agents intended<br />

to play a range <strong>of</strong> games well rather than any<br />

single game expertly. Such systems move more <strong>of</strong> the<br />

mental work from the human to the machine: while<br />

the programmer may fine-tune a dedicated single-game<br />

agent to a high level <strong>of</strong> performance based on their<br />

knowledge <strong>of</strong> the game, GGPs must find good solutions<br />

to a range <strong>of</strong> previously unseen problems. This is more in<br />

keeping with the original aims <strong>of</strong> AI research to produce<br />

truly intelligent automata that can perform well when<br />

faced with complex real-world problems.<br />

GGP is another arena that MCTS-based agents have<br />

dominated since their introduction several years ago.<br />

The use <strong>of</strong> random simulations to estimate move values<br />

is well suited to this domain, where heuristic knowledge<br />

is not available for each given game.<br />

CADIAPLAYER was the first MCTS-based GGP player,<br />

developed by Hilmar Finnsson for his Masters Thesis<br />

in 2007 [83]. The original incarnation <strong>of</strong> CADIAPLAYER<br />

used a form <strong>of</strong> history heuristic and parallelisation<br />

to improve performance, but otherwise used no<br />

enhancements such as heavy playouts. Finnsson and<br />

Björnsson point out the suitability <strong>of</strong> UCT for GGP<br />

as random simulations implicitly capture, in real-time,<br />

game properties that would be difficult to explicitly<br />

learn and express in a heuristic evaluation function [84].<br />

They demonstrate the clear superiority <strong>of</strong> their UCT<br />

approach over flat MC. CADIAPLAYER went on to win<br />

the 2007 and 2008 AAAI GGP competitions [21].<br />

Finnsson and Björnsson added a number <strong>of</strong><br />

enhancements for CADIAPLAYER, including the Move-<br />

Average Sampling Technique (MAST; 6.1.4), <strong>Tree</strong>-Only<br />

MAST (TO-MAST; 6.1.4), Predicate-Average Sampling<br />

Technique (PAST; 6.1.4) and RAVE (5.3.5), and found<br />

that each improved performance for some games, but<br />

no combination proved generally superior [85]. Shortly<br />

afterwards, they added the Features-to-Action Sampling<br />

Technique (FAST; 6.1.4), in an attempt to identify<br />

common board game features using template matching<br />

[86]. CADIAPLAYER did not win the 2009 or 2010 AAAI<br />

GGP competitions, but a general increase in playing<br />

strength was noted as the program was developed over<br />

these years [87].<br />

ARY is another MCTS-based GGP player, which<br />

uses nested <strong>Monte</strong> <strong>Carlo</strong> search (4.9.2) and transposition<br />

tables (5.2.4 and 6.2.4), in conjunction with UCT, to select<br />

moves [144]. Early development <strong>of</strong> ARY is summarised<br />

in [142] and [143]. ARY came third in the 2007 AAAI<br />

GGP competition [143] and won the 2009 [145] and 2010<br />

competitions to become world champion. Méhat and<br />

Cazenave demonstrate the benefits <strong>of</strong> tree parallelisation<br />

(6.3.3) for GPP, for which playouts can be slow as games<br />

must typically be interpreted [146].<br />

Other GGPs Möller et al. [147] describe their programme<br />

CENTURIO which combines MCTS with Answer Set<br />

Programming (ASP) to play general games. CENTURIO<br />

came fourth in the 2009 AAAI GGP competition.<br />

Sharma et al. [197] describe domain-independent<br />

methods for generating and evolving domain-specific<br />

knowledge using both state and move patterns, to improve<br />

convergence rates for UCT in general games

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!