A Survey of Monte Carlo Tree Search Methods - Department of ...
A Survey of Monte Carlo Tree Search Methods - Department of ...
A Survey of Monte Carlo Tree Search Methods - Department of ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 26<br />
trees periodically and only on parts <strong>of</strong> the tree 21 [24].<br />
This is better suited to implementation in a messagepassing<br />
setting, where communication between processes<br />
is limited, e.g. when parallelising across clusters <strong>of</strong><br />
machines. Bourki et al. find that slow tree parallelisation<br />
slightly outperforms slow root parallelisation, despite<br />
the increased communication overheads <strong>of</strong> the former.<br />
The idea <strong>of</strong> periodically synchronising statistics between<br />
trees is also explored in [91].<br />
6.3.4 UCT-<strong>Tree</strong>split<br />
Schaefers and Platzner [192] describe an approach they<br />
call UCT-<strong>Tree</strong>split for performing a single MCTS search<br />
efficiently across multiple computer nodes. This allows<br />
an equal distribution <strong>of</strong> both the work and memory<br />
load among all computational nodes within distributed<br />
memory. Graf et al. [98] demonstrate the application<br />
<strong>of</strong> UCT-<strong>Tree</strong>split in their Go program GOMORRA to<br />
achieve high-level play. GOMORRA scales up to 16 nodes<br />
before diminishing returns reduce the benefit <strong>of</strong> splitting<br />
further, which they attribute to the high number <strong>of</strong><br />
simulations being computed in parallel.<br />
6.3.5 Threading and Synchronisation<br />
Cazenave and Jouandeau [48] describe a parallel Master-<br />
Slave algorithm for MCTS, and demonstrate consistent<br />
improvement with increasing parallelisation until 16<br />
slaves are reached. 22 The performance <strong>of</strong> their 9 × 9 Go<br />
program increases from 40.5% with one slave to 70.5%<br />
with 16 slaves against GNU GO 3.6.<br />
Enzenberger and Müller [80] describe an approach<br />
to multi-threaded MCTS that requires no locks, despite<br />
each thread working on the same tree. The results<br />
showed that this approach has much better scaling on<br />
multiple threads than a locked approach.<br />
Segal [195] investigates why the parallelisation <strong>of</strong><br />
MCTS across multiple machines has proven surprisingly<br />
difficult. He finds that there is an upper bound on the<br />
improvements from additional search in single-threaded<br />
scaling for FUEGO, that parallel speedup depends critically<br />
on how much time is given to each player, and that<br />
MCTS can scale nearly perfectly to at least 64 threads<br />
when combined with virtual loss, but without virtual<br />
loss scaling is limited to just eight threads.<br />
6.4 Considerations for Using Enhancements<br />
MCTS works well in some domains but not in others.<br />
The many enhancements described in this section and<br />
the previous one also have different levels <strong>of</strong> applicability<br />
to different domains. This section describes efforts to<br />
understand situations in which MCTS and its enhancements<br />
may or may not work, and what conditions might<br />
cause problems.<br />
21. For example, only on nodes above a certain depth or with more<br />
than a certain number <strong>of</strong> visits.<br />
22. At which point their algorithm is 14 times faster than its sequential<br />
counterpart.<br />
6.4.1 Consistency<br />
Heavily modified MCTS algorithms may lead to incorrect<br />
or undesirable behaviour as computational power<br />
increases. An example <strong>of</strong> this is a game played between<br />
the Go program MOGO and a human pr<strong>of</strong>essional,<br />
in which MOGO incorrectly deduced that it was in a<br />
winning position despite its opponent having a winning<br />
killer move, because that move matched a number <strong>of</strong><br />
very bad patterns so was not searched once [19]. Modifying<br />
MCTS enhancements to be consistent can avoid<br />
such problems without requiring that the entire search<br />
tree eventually be visited.<br />
6.4.2 Parameterisation <strong>of</strong> Game <strong>Tree</strong>s<br />
It has been observed that MCTS is successful for tricktaking<br />
card games, but less so for poker-like card games.<br />
Long et al. [130] define three measurable parameters <strong>of</strong><br />
game trees and show that these parameters support this<br />
view. These parameters could also feasibly be used to<br />
predict the success <strong>of</strong> MCTS on new games.<br />
6.4.3 Comparing Enhancements<br />
One issue with MCTS enhancements is how to measure<br />
their performance consistently. Many enhancements lead<br />
to an increase in computational cost which in turn results<br />
in fewer simulations per second; there is <strong>of</strong>ten a trade<strong>of</strong>f<br />
between using enhancements and performing more<br />
simulations.<br />
Suitable metrics for comparing approaches include:<br />
• Win rate against particular opponents.<br />
• Elo 23 ratings against other opponents.<br />
• Number <strong>of</strong> iterations per second.<br />
• Amount <strong>of</strong> memory used by the algorithm.<br />
Note that the metric chosen may depend on the reason<br />
for using a particular enhancement.<br />
7 APPLICATIONS<br />
Chess has traditionally been the focus <strong>of</strong> most AI games<br />
research and been described as the “drosophila <strong>of</strong> AI” as<br />
it had – until recently – been the standard yardstick for<br />
testing and comparing new algorithms [224]. The success<br />
<strong>of</strong> IBM’s DEEP BLUE against grandmaster Gary Kasparov<br />
has led to a paradigm shift away from computer Chess<br />
and towards computer Go. As a domain in which computers<br />
are not yet at the level <strong>of</strong> top human players, Go<br />
has become the new benchmark for AI in games [123].<br />
The most popular application <strong>of</strong> MCTS methods is to<br />
games and <strong>of</strong> these the most popular application is to<br />
Go; however, MCTS methods have broader use beyond<br />
games. This section summarises the main applications<br />
<strong>of</strong> MCTS methods in the literature, including computer<br />
Go, other games, and non-game domains.<br />
23. A method for calculating relative skill levels between players that<br />
is widely used for Chess and Go, named after Arpad Elo.