A Survey of Monte Carlo Tree Search Methods - Department of ...
A Survey of Monte Carlo Tree Search Methods - Department of ...
A Survey of Monte Carlo Tree Search Methods - Department of ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 36<br />
Real-time Strategy (RTS) Games Numerous studies<br />
have been published that evaluate the performance <strong>of</strong><br />
MCTS on different variants <strong>of</strong> real-time strategy games.<br />
These games are usually modelled on well-known<br />
and commercially successful games such as Warcraft,<br />
Starcraft or Command & Conquer, but have been<br />
simplified to reduce the number <strong>of</strong> available actions<br />
at any moment in time (the branching factor <strong>of</strong> the<br />
decision trees in such games may be unlimited).<br />
Initial work made use <strong>of</strong> <strong>Monte</strong> <strong>Carlo</strong> simulations as<br />
a replacement for evaluation functions; the simulations<br />
were embedded in other algorithms such as minimax,<br />
or were used with a 1-ply look-ahead and made use<br />
<strong>of</strong> numerous abstractions to make the search feasible<br />
given the time constraints.<br />
Wargus Balla and Fern [18] apply UCT to a RTS game<br />
called Wargus. Here the emphasis is on tactical assault<br />
planning and making use <strong>of</strong> numerous abstractions,<br />
most notably the grouping <strong>of</strong> individual units. The<br />
authors conclude that MCTS is a promising approach:<br />
despite the lack <strong>of</strong> domain-specific knowledge, the<br />
algorithm outperformed baseline and human players<br />
across 12 scenarios.<br />
ORTS Naveed et al. [150] apply UCT and RRTs<br />
(4.10.3) to the RTS game engine ORTS. Both algorithms<br />
are used to find paths in the game and the authors<br />
conclude that UCT finds solutions with less search<br />
effort than RRT, although the RRT player outperforms<br />
the UCT player in terms <strong>of</strong> overall playing strength.<br />
7.7 Nondeterministic Games<br />
Nondeterministic games have hidden information<br />
and/or a random element. Hidden information may<br />
arise through cards or tiles visible to the player, but<br />
not the opponent(s). Randomness may arise through<br />
the shuffling <strong>of</strong> a deck <strong>of</strong> cards or the rolling <strong>of</strong> dice.<br />
Hidden information and randomness generally make<br />
game trees much harder to search, greatly increasing<br />
both their branching factor and depth.<br />
The most common approach to dealing with this<br />
increase in branching factor is to use determinization,<br />
which involves sampling over the perfect information<br />
game instances that arise when it is assumed that all<br />
hidden and random outcomes are known in advance<br />
(see Section 4.8.1).<br />
Skat is a trick-taking card game with a bidding<br />
phase. Schafer describes the UCT player XSKAT which<br />
uses information sets to handle the nondeterministic<br />
aspect <strong>of</strong> the game, and various optimisations in the<br />
default policy for both bidding and playing [194].<br />
XSKAT outperformed flat <strong>Monte</strong> <strong>Carlo</strong> players and was<br />
competitive with the best artificial Skat players that<br />
use traditional search techniques. A discussion <strong>of</strong> the<br />
methods used for opponent modelling is given in [35].<br />
Poker <strong>Monte</strong> <strong>Carlo</strong> approaches have also been used<br />
for the popular gambling card game Poker [177]. The<br />
poker game tree is too large to compute Nash strategies<br />
precisely, so states must be collected in a small number<br />
<strong>of</strong> buckets. <strong>Monte</strong> <strong>Carlo</strong> methods such as <strong>Monte</strong> <strong>Carlo</strong><br />
Counter Factual Regret (MCCFR) [125] (4.8.8) are<br />
then able to find approximate Nash equilibria. These<br />
approaches represent the current state <strong>of</strong> the art in<br />
computer Poker.<br />
Maîtrepierre et al. [137] use UCB to select strategies,<br />
resulting in global play that takes the opponent’s strategy<br />
into account and results in unpredictable behaviour.<br />
Van den Broeck et al. apply MCTS methods to multiplayer<br />
no-limit Texas Hold’em Poker [223], enabling<br />
strong exploitative behaviour against weaker rule-based<br />
opponents and competitive performance against experienced<br />
human opponents.<br />
Ponsen et al. [159] apply UCT to Poker, using a learned<br />
opponent model (Section 4.8.9) to bias the choice <strong>of</strong><br />
determinizations. Modelling the specific opponent by<br />
examining games they have played previously results<br />
in a large increase in playing strength compared to UCT<br />
with no opponent model. Veness et al. [226] describe<br />
the application <strong>of</strong> ρUCT (4.10.6) to Kuhn Poker using<br />
their MC-AIXA agent.<br />
Dou Di Zhu is a popular Chinese card game with<br />
hidden information. Whitehouse et al. [230] use<br />
information sets <strong>of</strong> states to store rollout statistics, in<br />
order to collect simulation statistics for sets <strong>of</strong> game<br />
states that are indistinguishable from a player’s point <strong>of</strong><br />
view. One surprising conclusion is that overcoming the<br />
problems <strong>of</strong> strategy fusion (by using expectimax rather<br />
than a determinization approach) is more beneficial<br />
than having a perfect opponent model.<br />
Other card games such as Hearts and Spades are also<br />
interesting to investigate in this area, although work to<br />
date has only applied MCTS to their perfect information<br />
versions [207].<br />
Klondike Solitaire is a well known single-player<br />
card game, which can be thought <strong>of</strong> as a single-player<br />
stochastic game: instead <strong>of</strong> the values <strong>of</strong> the hidden<br />
cards being fixed at the start <strong>of</strong> the game, they are<br />
determined by chance events at the moment the cards<br />
are turned over. 33<br />
Bjarnason et al. [20] apply a combination <strong>of</strong> the<br />
determinization technique <strong>of</strong> hindsight optimisation<br />
(HOP) with UCT to Klondike solitaire (Section 4.8.1).<br />
This system achieves a win rate more than twice that<br />
estimated for a human player.<br />
Magic: The Gathering is a top-selling two-player<br />
33. This idea can generally be used to transform a single-player<br />
game <strong>of</strong> imperfect information into one <strong>of</strong> perfect information with<br />
stochasticity.