Monte-Carlo Tree Search and Billiards

European Conference on Machine Learning 2009

Active Learning is a Game: Monte-Carlo Tree Search and Billiards

P. Rolet, ML&O Team, Paris-Sud University
Joint work with M. Sebag and O. Teytaud


Why do we bother?
- Machine Learning tackles a wide range of problems
- For some of them, data is expensive: label = $$
- A supervised learning problem


A major Machine Learning goal
- Reduce sample complexity while keeping generalization error low
- Motivating application: numerical engineering
  => Learn simplified models with only ~100 examples


That's what Active Learning does!
- Example: learning a threshold on a line
- PASSIVE: with uniformly drawn labels, the error shrinks only linearly in the number of queries
- ACTIVE: binary search halves the version space at every query, an exponential improvement (Freund et al. 97; Dasgupta 04, 05, ...)
- This can be generalized (cf. Billiards later on)
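To make the gap concrete, here is a minimal sketch (not from the slides; the target threshold, helper names, and error estimates are illustrative) comparing passive labeling with active binary search for learning a threshold on [0, 1]:

```python
import random

THETA = 0.6180  # hypothetical target threshold

def oracle(x):
    """Target concept h*: label is 1 iff x >= THETA."""
    return int(x >= THETA)

def passive(n_queries):
    """Passive learning: label n uniformly drawn points, return the midpoint
    between the largest negative and the smallest positive point seen."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = random.random()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2

def active(n_queries):
    """Active learning: each query bisects the current version space,
    so the remaining interval shrinks like 2**-n instead of ~1/n."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

random.seed(0)
print(abs(passive(15) - THETA))  # on the order of 1/15
print(abs(active(15) - THETA))   # on the order of 2**-15
```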


How AL typically works
- Find a way to measure the "information" brought by instances
- Greedily choose the most informative instances
- Examples: version-space split, Query-by-Committee (classification) (Seung et al. 97)
- => Good, but not optimal
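A hedged sketch of that greedy selection rule (the names and the stand-in committee are illustrative; genuine QbC draws its committee from the version space, which is exactly what the billiard slides address later):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in committee: random unit-norm linear separators.  In genuine QbC
# these would be hypotheses sampled from the current version space.
committee = [v / np.linalg.norm(v) for v in rng.normal(size=(7, 4))]

def vote_split(x):
    """Disagreement of the committee on instance x: maximal (0.5) when the
    votes split evenly, i.e. when x roughly bisects the version space."""
    votes = [int(x @ w >= 0) for w in committee]
    p = sum(votes) / len(votes)
    return min(p, 1 - p)

candidates = rng.normal(size=(100, 4))
query = candidates[int(np.argmax([vote_split(x) for x in candidates]))]
print(query)  # the greedily chosen "most informative" instance
```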


A different perspective
- Active learning can be seen as a game between a Learner (with learning strategy S) and the Target Concept h* (a.k.a. the Oracle):
  the Learner plays x1, the Oracle answers h*(x1); the Learner plays x2, the Oracle answers h*(x2); ...
- After T rounds (T = finite horizon), the Learner holds a T-size training set S_T(h*) = {(x1, h*(x1)), ..., (xT, h*(xT))}
- Score of the game: generalization error
- This is a Reinforcement Learning problem
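A minimal sketch of that episode structure (the function names `strategy` and `fit` and the 1-D instantiation are hypothetical, just to make the loop concrete):

```python
import numpy as np

def play_episode(strategy, fit, h_star, T, test_xs):
    """One episode of the AL game: the learner plays x_t, the oracle answers
    h*(x_t); after the horizon T the score is the generalization error of
    the hypothesis fitted on the resulting training set S_T(h*)."""
    S = []
    for _ in range(T):
        x = strategy(S)            # learner's move
        S.append((x, h_star(x)))   # oracle's answer
    h_hat = fit(S)
    return float(np.mean([h_hat(x) != h_star(x) for x in test_xs]))

# Tiny 1-D instantiation: threshold concept, uniform (passive) strategy.
rng = np.random.default_rng(0)
h_star = lambda x: int(x >= 0.37)
strategy = lambda S: rng.random()

def fit(S):
    neg = [x for x, y in S if y == 0]
    pos = [x for x, y in S if y == 1]
    theta = ((max(neg) if neg else 0.0) + (min(pos) if pos else 1.0)) / 2
    return lambda x: int(x >= theta)

print(play_episode(strategy, fit, h_star, T=15, test_xs=np.linspace(0, 1, 1001)))
```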


Train the Active Learning player
- Inspiration from computer Go (Coulom 06; Chaslot et al. 06; Gelly & Silver 07)
- Explore the game tree: MCTS
- Values of moves estimated by Monte-Carlo simulations
- For AL: train against surrogate hypotheses


Monte-Carlo & UCT for games
- Simulation-based planning with multi-armed bandits
- Assess values of child moves: asymmetric tree growth
- Spend more simulations on the moves with better value: UCT
- => the Baal algorithm (Bandit-based Active Learning)
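A generic MCTS skeleton of the kind the slide describes (the `moves`/`step`/`rollout` interface is an assumption, not the paper's API; in Baal the moves are candidate instances and the rollouts are played against surrogate hypotheses):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}              # move -> Node
        self.visits, self.value = 0, 0.0

def uct(child, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to tree search (Kocsis & Szepesvari 06)."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root, moves, step, rollout, n_sims):
    """moves(s): legal moves; step(s, m): next state; rollout(s): Monte-Carlo
    playout returning the final reward.  The tree grows asymmetrically:
    moves with better estimated value are simulated more often."""
    for _ in range(n_sims):
        node = root
        # 1. selection: descend through fully expanded nodes with UCT
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: uct(ch, node.visits))
        # 2. expansion: try one untried move, if any remain
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(step(node.state, m), parent=node)
            node = node.children[m]
        # 3. simulation and 4. backpropagation
        reward = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)
```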


UCT: exploration vs. exploitation
- UCB: balance exploration and exploitation (Auer 03)
- UCT = UCB for trees (Kocsis & Szepesvari 06)
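In standard notation, the UCB rule the slide refers to picks the move maximizing an optimistic value estimate (a sketch; the exploration constant varies across papers):

\[
\text{select } i \in \arg\max_i \left( \hat{\mu}_i + C \sqrt{\frac{\ln n}{n_i}} \right)
\]

where $\hat{\mu}_i$ is the empirical mean reward of move $i$, $n_i$ its visit count, $n$ the number of visits to the parent, and $C$ an exploration constant ($C = \sqrt{2}$ for UCB1). UCT applies this rule recursively at every node of the game tree.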


The Baal Algorithm


The Baal Algorithm
- The value function over states converges to the optimal value => optimal strategy (proof based on the Markov Decision Process model)
- BUT:
  - infinite action space
  - how to draw surrogate hypotheses?
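One way to write the underlying MDP (a sketch; the paper's exact formulation may differ): the state is the labeled set $s_t = \{(x_1, h^*(x_1)), \ldots, (x_t, h^*(x_t))\}$, an action is the next query $x$, and the only reward arrives at the horizon $T$. The convergence claim is that the Monte-Carlo value estimates approach the Bellman-optimal value

\[
V^*(s_t) = \max_{x} \; \mathbb{E}_{h \mid s_t} \left[ V^*\!\left(s_t \cup \{(x, h(x))\}\right) \right],
\qquad
V^*(s_T) = -\,\mathrm{Err}\!\left(A(s_T)\right),
\]

where $A(s_T)$ is the hypothesis learned from the final training set and the expectation runs over hypotheses consistent with the labels seen so far (the surrogate draws).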


Baal: infinite action space
- UCB is for finite action spaces; here, the action space is R^D
- Progressive widening: add instances as the number of simulations grows, # instances ~ (# visits)^(1/4) (Coulom 07)
  ➢ in a random order
  ➢ in an educated order
- Allows coupling with existing AL criteria such as VS split
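A minimal sketch of that widening schedule (the `node` fields and `next_candidate` helper are hypothetical):

```python
import math

def allowed_instances(visits, exponent=0.25):
    """Progressive widening (Coulom 07): a node visited n times may consider
    about n**(1/4) candidate instances, so the branching factor grows
    slowly with the simulation budget."""
    return max(1, math.floor(visits ** exponent))

# During selection, a new instance (drawn at random or ranked by an AL
# criterion such as VS split) is added whenever the budget allows:
#   if len(node.children) < allowed_instances(node.visits):
#       node.add_instance(next_candidate())
```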


Baal: drawing surrogate hypotheses
- Billiard algorithms (Rujan 97; Comets et al. 09; ...)
  - constraints = labeled instances
  - point = hypothesis
  - domain = version space
- Sound: provably converges to a uniform draw
- Scalable w.r.t. dimension and # constraints
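A minimal billiard-walk sketch on a linearly constrained domain (the polytope representation, step length, and tolerances are assumptions for illustration; the talk's billiards run over the version space of hypotheses, as in Rujan 97):

```python
import numpy as np

def billiard_walk(A, b, w0, path_len=1.0, n_samples=100, seed=0):
    """Sample points from the domain {w : A w <= b} (one row per
    labeled-instance constraint) by following straight trajectories that
    reflect off the facets; such walks converge to the uniform
    distribution on the domain."""
    rng = np.random.default_rng(seed)
    A, b, w = np.asarray(A, float), np.asarray(b, float), np.asarray(w0, float)
    samples = []
    for _ in range(n_samples):
        d = rng.normal(size=w.size)
        d /= np.linalg.norm(d)                  # random initial direction
        remaining = path_len
        while remaining > 1e-12:
            num, den = b - A @ w, A @ d
            t = np.full(len(b), np.inf)
            ahead = den > 1e-12                 # facets ahead of the point
            t[ahead] = np.maximum(num[ahead], 0.0) / den[ahead]
            i = int(np.argmin(t))
            if t[i] >= remaining:               # path ends inside the domain
                w = w + remaining * d
                break
            w = w + t[i] * d                    # move to facet i and reflect
            n = A[i] / np.linalg.norm(A[i])
            d = d - 2.0 * (d @ n) * n
            remaining -= t[i]
        samples.append(w.copy())
    return np.array(samples)

# Example: uniform draws from the box [-1, 1]^2.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
print(billiard_walk(A, [1, 1, 1, 1], w0=[0.1, -0.2], n_samples=5))
```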


Some results
- Setting: linear separators of R^D; dimension D = 4, 8; # queries: 15, 20
- [Plots: x-axis log(# simulations), y-axis generalization error; Baal vs. passive learning and almost-optimal AL (QbC-based); panels D=4, #=15 and D=8, #=20]


Some results - 2
- Combining with AL criteria (inspired from QbC): best of both worlds!
- [Plots: panels D=4, #=15 and D=8, #=20, against almost-optimal AL (QbC-based)]


To sum up ...
- A new approach to Active Learning: AL as a game
- An approximation of the optimal strategy (provably)
- An anytime algorithm
- Perspectives:
  - kernelized Baal
  - numerical engineering application


Thanks for listening
