06.04.2013 Views

A Survey of Monte Carlo Tree Search Methods - Department of ...


IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 38

Travelling Salesman Problem. The Travelling Salesman Problem (TSP) is addressed in [168] using a nested Monte Carlo search algorithm (4.9.2) with time windows. State-of-the-art solutions were reached for instances of up to 29 nodes, although performance on larger problems is less impressive.
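To make the nested search idea concrete, the following is a minimal sketch of nested Monte Carlo search on a plain TSP instance. It is an illustration under stated assumptions, not the method of [168]: time windows are ignored, the cities are random points, and the names `nested_search`, `random_playout` and `tour_length` are hypothetical.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def random_playout(partial, n, dist, rng):
    """Level 0: complete the partial tour with the remaining cities in random order."""
    rest = [c for c in range(n) if c not in partial]
    rng.shuffle(rest)
    tour = partial + rest
    return tour_length(tour, dist), tour

def nested_search(partial, n, dist, level, rng):
    """Level k: score each candidate next city with a level k-1 search,
    then follow the best complete tour found so far."""
    if level == 0 or len(partial) == n:
        return random_playout(partial, n, dist, rng)
    partial = list(partial)
    best_len, best_tour = math.inf, None
    while len(partial) < n:
        for city in range(n):
            if city in partial:
                continue
            length, tour = nested_search(partial + [city], n, dist, level - 1, rng)
            if length < best_len:
                best_len, best_tour = length, tour
        # Play the next move of the memorised best tour.
        partial.append(best_tour[len(partial)])
    return best_len, best_tour

# Hypothetical instance: 8 random points in the unit square (assumption).
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(8)]
dist = [[math.dist(p, q) for q in points] for p in points]
best_len, best_tour = nested_search([0], len(points), dist, level=2, rng=rng)
print(best_tour, round(best_len, 3))
```

Raising `level` makes each decision more informed at a rapidly growing cost, which is one plausible reason such methods struggle as the number of nodes increases.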

The Canadian Traveller Problem (CTP) is a variation of the TSP in which some of the edges may be blocked with a given probability. Bnaya et al. [22] propose several new policies and demonstrate the application of UCT to the CTP, achieving near-optimal results for some graphs.

Sailing Domain. Kocsis and Szepesvári [119] apply UCT to the sailing domain, a stochastic shortest path (SSP) problem in which a sailboat searches for the shortest path between two points under fluctuating wind conditions. UCT was found to scale better with increasing problem size than the other techniques tried, including asynchronous real-time dynamic programming (ARTDP) and PG-ID, an online-sampling algorithm for Markov decision processes.
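For reference, the child-selection rule at the heart of UCT can be sketched as below. This is a generic UCB1 illustration, not code from [119]: it assumes rewards scaled to [0, 1] (a cost-minimising domain such as sailing would need its costs mapped to rewards first), and the function names and the default exploration constant are assumptions.

```python
import math

def ucb1(mean_reward, child_visits, parent_visits, c=1 / math.sqrt(2)):
    """Average reward plus exploration bonus 2 * c * sqrt(2 * ln(n) / n_j)."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    bonus = 2 * c * math.sqrt(2 * math.log(parent_visits) / child_visits)
    return mean_reward + bonus

def select_child(stats):
    """stats holds one (total_reward, visits) pair per child.
    Returns the index of the child with the highest UCB1 value."""
    parent_visits = sum(visits for _, visits in stats)

    def value(j):
        total, visits = stats[j]
        mean = total / visits if visits else 0.0
        return ucb1(mean, visits, parent_visits)

    return max(range(len(stats)), key=value)

# A rarely visited child can win on its exploration bonus alone.
print(select_child([(5.0, 10), (0.4, 1)]))
```

The bonus shrinks as a child accumulates visits, so the rule gradually shifts from exploring untested actions to exploiting the best-looking one.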

Physics Simulations. Mansley et al. apply the Hierarchical Optimistic Optimisation applied to Trees (HOOT) algorithm (5.1.4) to a number of physics problems [138]. These include the Double Integrator, Inverted Pendulum and Bicycle problems, for which they demonstrate the general superiority of HOOT over plain UCT.

Function Approximation. Coquelin and Munos [68] compare their BAST approach (4.2) with flat UCB for the approximation of Lipschitz functions, and observe that BAST outperforms flat UCB and is less dependent on the size of the search tree. BAST returns a good value quickly, and improves towards the optimal value as the computational budget is increased.

Rimmel et al. [166] apply the MCTS-based Threshold Ascent for Graphs (TAG) method (4.10.2) to the problem of automatic performance tuning using DFT and FFT linear transforms in adaptive libraries. They demonstrate superior performance of TAG over standard optimisation methods.

7.8.2 Constraint Satisfaction

This section lists applications of MCTS methods to constraint satisfaction problems.

Constraint Problems. Satomi et al. [187] proposed a real-time algorithm based on UCT to solve quantified constraint satisfaction problems (QCSP) (footnote 36). Plain UCT did not solve their problems more efficiently than random selection, so Satomi et al. added a constraint propagation technique that allows the tree to focus on the most favourable parts of the search space. This combined algorithm outperforms state-of-the-art α-β search algorithms on large-scale problems [187].

36. A QCSP is a constraint satisfaction problem in which some variables are universally quantified.
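The role of universal quantification in a QCSP can be illustrated with a small brute-force evaluator. This is only a sketch of the problem definition, not the UCT-based solver of [187]; the function `qcsp_holds` and its tuple-based input format are hypothetical.

```python
def qcsp_holds(prefix, constraint, assignment=None):
    """Decide a quantified constraint problem by exhaustive search.

    prefix: ordered list of (quantifier, variable, domain) triples, where
            quantifier is 'forall' or 'exists'.
    constraint: function mapping a complete assignment dict to True/False.
    """
    assignment = assignment or {}
    if not prefix:
        return constraint(assignment)
    (quantifier, var, domain), rest = prefix[0], prefix[1:]
    results = (qcsp_holds(rest, constraint, {**assignment, var: value})
               for value in domain)
    # A universal variable needs every value to work; an existential one needs only one.
    return all(results) if quantifier == "forall" else any(results)

def differ(a):
    return a["x"] != a["y"]

# For every x there is a different y: holds.
print(qcsp_holds([("forall", "x", [0, 1]), ("exists", "y", [0, 1])], differ))
# There is a single y that differs from every x: does not hold.
print(qcsp_holds([("exists", "y", [0, 1]), ("forall", "x", [0, 1])], differ))
```

The alternation of `all` and `any` mirrors a two-player game tree, which is why α-β search is a natural baseline for these problems.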

Previti et al. [160] investigate UCT approaches to the satisfiability of conjunctive normal form (CNF) problems. They find that their UCTSAT class of algorithms does not perform well if the domain being modelled has no underlying structure, but can perform very well if the information gathered on one iteration can be successfully applied on successive visits to the same node.
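As a rough illustration of how a CNF instance might be scored inside a Monte Carlo playout, the sketch below completes a partial assignment at random and counts satisfied clauses. The reward definition (fraction of satisfied clauses), the signed-integer literal encoding and the function names are assumptions made for illustration; they are not taken from [160].

```python
import random

def satisfied_clauses(clauses, assignment):
    """Count clauses with at least one true literal.
    Literal k > 0 means variable k is True; k < 0 means variable |k| is False."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def playout_reward(clauses, partial, num_vars, rng):
    """Fill unassigned variables at random; return the fraction of satisfied clauses."""
    full = dict(partial)
    for var in range(1, num_vars + 1):
        full.setdefault(var, rng.random() < 0.5)
    return satisfied_clauses(clauses, full) / len(clauses)

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
rng = random.Random(0)
print(satisfied_clauses(clauses, {1: True, 2: True, 3: False}))  # 3
print(playout_reward(clauses, {1: True}, num_vars=3, rng=rng))
```

A tree search over variable assignments could back up such rewards to the nodes it visits, so that information from one iteration informs later visits to the same node.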

Mathematical Expressions. Cazenave [43] applied his nested Monte Carlo search method (4.9.2) to the generation of expression trees for the solution of mathematical problems. He achieved better results than existing methods for the prime generating polynomials problem (footnote 37) and for a finite algebra problem called the A2 primal algebra, for which a particular discriminator term must be found.

7.8.3 Scheduling Problems

Planning is also a domain in which Monte Carlo tree-based techniques are often utilised, as described below.

Benchmarks. Nakhost and Müller apply their Monte Carlo Random Walk (MRW) planner (4.10.7) to all of the supported domains from the 4th International Planning Competition (IPC-4) [149]. MRW shows promising results compared to the other planners tested, including FF, Marvin, YASHP and SG-Plan.

Pellier et al. [158] combined UCT with heuristic search in their Mean-based Heuristic Search for anytime Planning (MHSP) method (4.10.8) to produce an anytime planner that provides partial plans before building a solution. The algorithm was tested on several classical benchmarks (Blocks World, Towers of Hanoi, Ferry and Gripper problems) and compared to some major planning algorithms (A*, IPP, SatPlan, SG Plan-5 and FDP). MHSP performed almost as well as classical algorithms on the problems tried, with strengths and weaknesses that vary by domain. For example, MHSP is better than A* on the Ferry and Gripper problems but worse on Blocks World and the Towers of Hanoi.

Printer Scheduling. Matsumoto et al. [140] applied Single Player Monte Carlo Tree Search (4.4) to the game Bubble Breaker (7.4). Based on the good results obtained in this study, where the heuristics employed improved the quality of the solutions, the technique is proposed for a re-entrant scheduling problem, namely managing the printing process of an automobile parts supplier.

Rock-Sample Problem. Silver et al. [204] apply MCTS and UCT to the rock-sample problem (which simulates a Mars explorer robot that has to analyse and collect

37. Finding a polynomial that generates as many different primes in a row as possible.
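The objective described in footnote 37 can be made concrete with a short scoring function. The sketch below measures how many consecutive inputs n = 0, 1, 2, ... yield distinct primes, using Euler's classical polynomial n² + n + 41 as the example; the function names are hypothetical, and this is only the evaluation step, not the expression-tree search of [43].

```python
def is_prime(n):
    """Trial-division primality test, adequate for small values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_run(poly, limit=1000):
    """Number of consecutive n = 0, 1, 2, ... for which poly(n) is a new prime."""
    seen = set()
    for n in range(limit):
        value = poly(n)
        if not is_prime(value) or value in seen:
            return n
        seen.add(value)
    return limit

def euler(n):
    return n * n + n + 41

print(prime_run(euler))  # 40, since euler(40) = 1681 = 41 * 41
```

A search method would use a score of this kind as the reward for each candidate expression tree.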
