06.04.2013 Views

A Survey of Monte Carlo Tree Search Methods - Department of ...

A Survey of Monte Carlo Tree Search Methods - Department of ...

A Survey of Monte Carlo Tree Search Methods - Department of ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 26<br />

trees periodically and only on parts <strong>of</strong> the tree 21 [24].<br />

This is better suited to implementation in a messagepassing<br />

setting, where communication between processes<br />

is limited, e.g. when parallelising across clusters <strong>of</strong><br />

machines. Bourki et al. find that slow tree parallelisation<br />

slightly outperforms slow root parallelisation, despite<br />

the increased communication overheads <strong>of</strong> the former.<br />

The idea <strong>of</strong> periodically synchronising statistics between<br />

trees is also explored in [91].<br />

6.3.4 UCT-<strong>Tree</strong>split<br />

Schaefers and Platzner [192] describe an approach they<br />

call UCT-<strong>Tree</strong>split for performing a single MCTS search<br />

efficiently across multiple computer nodes. This allows<br />

an equal distribution <strong>of</strong> both the work and memory<br />

load among all computational nodes within distributed<br />

memory. Graf et al. [98] demonstrate the application<br />

<strong>of</strong> UCT-<strong>Tree</strong>split in their Go program GOMORRA to<br />

achieve high-level play. GOMORRA scales up to 16 nodes<br />

before diminishing returns reduce the benefit <strong>of</strong> splitting<br />

further, which they attribute to the high number <strong>of</strong><br />

simulations being computed in parallel.<br />

6.3.5 Threading and Synchronisation<br />

Cazenave and Jouandeau [48] describe a parallel Master-<br />

Slave algorithm for MCTS, and demonstrate consistent<br />

improvement with increasing parallelisation until 16<br />

slaves are reached. 22 The performance <strong>of</strong> their 9 × 9 Go<br />

program increases from 40.5% with one slave to 70.5%<br />

with 16 slaves against GNU GO 3.6.<br />

Enzenberger and Müller [80] describe an approach<br />

to multi-threaded MCTS that requires no locks, despite<br />

each thread working on the same tree. The results<br />

showed that this approach has much better scaling on<br />

multiple threads than a locked approach.<br />

Segal [195] investigates why the parallelisation <strong>of</strong><br />

MCTS across multiple machines has proven surprisingly<br />

difficult. He finds that there is an upper bound on the<br />

improvements from additional search in single-threaded<br />

scaling for FUEGO, that parallel speedup depends critically<br />

on how much time is given to each player, and that<br />

MCTS can scale nearly perfectly to at least 64 threads<br />

when combined with virtual loss, but without virtual<br />

loss scaling is limited to just eight threads.<br />

6.4 Considerations for Using Enhancements<br />

MCTS works well in some domains but not in others.<br />

The many enhancements described in this section and<br />

the previous one also have different levels <strong>of</strong> applicability<br />

to different domains. This section describes efforts to<br />

understand situations in which MCTS and its enhancements<br />

may or may not work, and what conditions might<br />

cause problems.<br />

21. For example, only on nodes above a certain depth or with more<br />

than a certain number <strong>of</strong> visits.<br />

22. At which point their algorithm is 14 times faster than its sequential<br />

counterpart.<br />

6.4.1 Consistency<br />

Heavily modified MCTS algorithms may lead to incorrect<br />

or undesirable behaviour as computational power<br />

increases. An example <strong>of</strong> this is a game played between<br />

the Go program MOGO and a human pr<strong>of</strong>essional,<br />

in which MOGO incorrectly deduced that it was in a<br />

winning position despite its opponent having a winning<br />

killer move, because that move matched a number <strong>of</strong><br />

very bad patterns so was not searched once [19]. Modifying<br />

MCTS enhancements to be consistent can avoid<br />

such problems without requiring that the entire search<br />

tree eventually be visited.<br />

6.4.2 Parameterisation <strong>of</strong> Game <strong>Tree</strong>s<br />

It has been observed that MCTS is successful for tricktaking<br />

card games, but less so for poker-like card games.<br />

Long et al. [130] define three measurable parameters <strong>of</strong><br />

game trees and show that these parameters support this<br />

view. These parameters could also feasibly be used to<br />

predict the success <strong>of</strong> MCTS on new games.<br />

6.4.3 Comparing Enhancements<br />

One issue with MCTS enhancements is how to measure<br />

their performance consistently. Many enhancements lead<br />

to an increase in computational cost which in turn results<br />

in fewer simulations per second; there is <strong>of</strong>ten a trade<strong>of</strong>f<br />

between using enhancements and performing more<br />

simulations.<br />

Suitable metrics for comparing approaches include:<br />

• Win rate against particular opponents.<br />

• Elo 23 ratings against other opponents.<br />

• Number <strong>of</strong> iterations per second.<br />

• Amount <strong>of</strong> memory used by the algorithm.<br />

Note that the metric chosen may depend on the reason<br />

for using a particular enhancement.<br />

7 APPLICATIONS<br />

Chess has traditionally been the focus <strong>of</strong> most AI games<br />

research and been described as the “drosophila <strong>of</strong> AI” as<br />

it had – until recently – been the standard yardstick for<br />

testing and comparing new algorithms [224]. The success<br />

<strong>of</strong> IBM’s DEEP BLUE against grandmaster Gary Kasparov<br />

has led to a paradigm shift away from computer Chess<br />

and towards computer Go. As a domain in which computers<br />

are not yet at the level <strong>of</strong> top human players, Go<br />

has become the new benchmark for AI in games [123].<br />

The most popular application <strong>of</strong> MCTS methods is to<br />

games and <strong>of</strong> these the most popular application is to<br />

Go; however, MCTS methods have broader use beyond<br />

games. This section summarises the main applications<br />

<strong>of</strong> MCTS methods in the literature, including computer<br />

Go, other games, and non-game domains.<br />

23. A method for calculating relative skill levels between players that<br />

is widely used for Chess and Go, named after Arpad Elo.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!