Monte-Carlo Tree Search and Billiards

European Conference on Machine Learning 2009

Active Learning is a Game: Monte-Carlo Tree Search and Billiards

P. Rolet, ML&O Team, Paris-Sud University
Joint work with M. Sebag and O. Teytaud


Why do we bother?
- Machine Learning tackles a wide range of problems
- For some of them, data is expensive: label = $$
- A supervised learning problem


A major Machine Learning goal
- Reduce sample complexity while keeping generalization error low
- Motivating application: numerical engineering
  => Learn simplified models with only ~100 examples


That's what Active Learning does!
- Example: learning a threshold on a line
- PASSIVE: with uniformly drawn labels, the error shrinks only linearly in the number of queries
- ACTIVE: binary search halves the version space at every query, an exponential improvement (Freund et al. 97; Dasgupta 04, 05, ...)
- This can be generalized (cf. Billiards later on)
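To make the gap concrete, here is a minimal sketch (not from the slides; the target threshold, helper names, and error estimates are illustrative) comparing passive labeling with active binary search for learning a threshold on [0, 1]:

```python
import random

THETA = 0.6180  # hypothetical target threshold

def oracle(x):
    """Target concept h*: label is 1 iff x >= THETA."""
    return int(x >= THETA)

def passive(n_queries):
    """Passive learning: label n uniformly drawn points, return the midpoint
    between the largest negative and the smallest positive point seen."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = random.random()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2

def active(n_queries):
    """Active learning: each query bisects the current version space,
    so the remaining interval shrinks like 2**-n instead of ~1/n."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

random.seed(0)
print(abs(passive(15) - THETA))  # on the order of 1/15
print(abs(active(15) - THETA))   # on the order of 2**-15
```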


How AL typically works
- Find a way to measure the "information" brought by instances
- Greedily choose the most informative instances
- Examples: version-space split, Query-by-Committee (classification) (Seung et al. 97)
- => Good, but not optimal
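A hedged sketch of that greedy selection rule (the names and the stand-in committee are illustrative; genuine QbC draws its committee from the version space, which is exactly what the billiard slides address later):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in committee: random unit-norm linear separators.  In genuine QbC
# these would be hypotheses sampled from the current version space.
committee = [v / np.linalg.norm(v) for v in rng.normal(size=(7, 4))]

def vote_split(x):
    """Disagreement of the committee on instance x: maximal (0.5) when the
    votes split evenly, i.e. when x roughly bisects the version space."""
    votes = [int(x @ w >= 0) for w in committee]
    p = sum(votes) / len(votes)
    return min(p, 1 - p)

candidates = rng.normal(size=(100, 4))
query = candidates[int(np.argmax([vote_split(x) for x in candidates]))]
print(query)  # the greedily chosen "most informative" instance
```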


A different perspective
- Active learning can be seen as a game between a Learner (with learning strategy S) and the Target Concept h* (a.k.a. the Oracle):
  the Learner plays x1, the Oracle answers h*(x1); the Learner plays x2, the Oracle answers h*(x2); ...
- After T rounds (T = finite horizon), the Learner holds a T-size training set S_T(h*) = {(x1, h*(x1)), ..., (xT, h*(xT))}
- Score of the game: generalization error
- This is a Reinforcement Learning problem
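A minimal sketch of that episode structure (the function names `strategy` and `fit` and the 1-D instantiation are hypothetical, just to make the loop concrete):

```python
import numpy as np

def play_episode(strategy, fit, h_star, T, test_xs):
    """One episode of the AL game: the learner plays x_t, the oracle answers
    h*(x_t); after the horizon T the score is the generalization error of
    the hypothesis fitted on the resulting training set S_T(h*)."""
    S = []
    for _ in range(T):
        x = strategy(S)            # learner's move
        S.append((x, h_star(x)))   # oracle's answer
    h_hat = fit(S)
    return float(np.mean([h_hat(x) != h_star(x) for x in test_xs]))

# Tiny 1-D instantiation: threshold concept, uniform (passive) strategy.
rng = np.random.default_rng(0)
h_star = lambda x: int(x >= 0.37)
strategy = lambda S: rng.random()

def fit(S):
    neg = [x for x, y in S if y == 0]
    pos = [x for x, y in S if y == 1]
    theta = ((max(neg) if neg else 0.0) + (min(pos) if pos else 1.0)) / 2
    return lambda x: int(x >= theta)

print(play_episode(strategy, fit, h_star, T=15, test_xs=np.linspace(0, 1, 1001)))
```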


Train the Active Learning player
- Inspiration from computer Go (Coulom 06; Chaslot et al. 06; Gelly & Silver 07)
- Explore the game tree: MCTS
- Values of moves estimated by Monte-Carlo simulations
- For AL: train against surrogate hypotheses


Monte-Carlo & UCT for games
- Simulation-based planning with multi-armed bandits
- Assess values of child moves: asymmetric tree growth
- Spend more simulations on the moves with better value: UCT
- => the Baal algorithm (Bandit-based Active Learning)
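A generic MCTS skeleton of the kind the slide describes (the `moves`/`step`/`rollout` interface is an assumption, not the paper's API; in Baal the moves are candidate instances and the rollouts are played against surrogate hypotheses):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}              # move -> Node
        self.visits, self.value = 0, 0.0

def uct(child, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to tree search (Kocsis & Szepesvari 06)."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root, moves, step, rollout, n_sims):
    """moves(s): legal moves; step(s, m): next state; rollout(s): Monte-Carlo
    playout returning the final reward.  The tree grows asymmetrically:
    moves with better estimated value are simulated more often."""
    for _ in range(n_sims):
        node = root
        # 1. selection: descend through fully expanded nodes with UCT
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: uct(ch, node.visits))
        # 2. expansion: try one untried move, if any remain
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(step(node.state, m), parent=node)
            node = node.children[m]
        # 3. simulation and 4. backpropagation
        reward = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)
```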


UCT: exploration vs. exploitation
- UCB: balance exploration and exploitation (Auer 03)
- UCT = UCB for trees (Kocsis & Szepesvari 06)
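In standard notation, the UCB rule the slide refers to picks the move maximizing an optimistic value estimate (a sketch; the exploration constant varies across papers):

\[
\text{select } i \in \arg\max_i \left( \hat{\mu}_i + C \sqrt{\frac{\ln n}{n_i}} \right)
\]

where $\hat{\mu}_i$ is the empirical mean reward of move $i$, $n_i$ its visit count, $n$ the number of visits to the parent, and $C$ an exploration constant ($C = \sqrt{2}$ for UCB1). UCT applies this rule recursively at every node of the game tree.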


The Baal Algorithm


The Baal Algorithm
- The value function over states converges to the optimal value => optimal strategy (proof based on the Markov Decision Process model)
- BUT:
  - infinite action space
  - how to draw surrogate hypotheses?
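One way to write the underlying MDP (a sketch; the paper's exact formulation may differ): the state is the labeled set $s_t = \{(x_1, h^*(x_1)), \ldots, (x_t, h^*(x_t))\}$, an action is the next query $x$, and the only reward arrives at the horizon $T$. The convergence claim is that the Monte-Carlo value estimates approach the Bellman-optimal value

\[
V^*(s_t) = \max_{x} \; \mathbb{E}_{h \mid s_t} \left[ V^*\!\left(s_t \cup \{(x, h(x))\}\right) \right],
\qquad
V^*(s_T) = -\,\mathrm{Err}\!\left(A(s_T)\right),
\]

where $A(s_T)$ is the hypothesis learned from the final training set and the expectation runs over hypotheses consistent with the labels seen so far (the surrogate draws).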


Baal: infinite action space
- UCB is for finite action spaces; here, the action space is R^D
- Progressive widening: add instances as the number of simulations grows, # instances ~ (# visits)^(1/4) (Coulom 07)
  ➢ in a random order
  ➢ in an educated order
- Allows coupling with existing AL criteria such as VS split
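A minimal sketch of that widening schedule (the `node` fields and `next_candidate` helper are hypothetical):

```python
import math

def allowed_instances(visits, exponent=0.25):
    """Progressive widening (Coulom 07): a node visited n times may consider
    about n**(1/4) candidate instances, so the branching factor grows
    slowly with the simulation budget."""
    return max(1, math.floor(visits ** exponent))

# During selection, a new instance (drawn at random or ranked by an AL
# criterion such as VS split) is added whenever the budget allows:
#   if len(node.children) < allowed_instances(node.visits):
#       node.add_instance(next_candidate())
```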


Baal: drawing surrogate hypotheses
- Billiard algorithms (Rujan 97; Comets et al. 09; ...)
  - constraints = labeled instances
  - point = hypothesis
  - domain = version space
- Sound: provably converges to a uniform draw
- Scalable w.r.t. dimension and # constraints
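A minimal billiard-walk sketch on a linearly constrained domain (the polytope representation, step length, and tolerances are assumptions for illustration; the talk's billiards run over the version space of hypotheses, as in Rujan 97):

```python
import numpy as np

def billiard_walk(A, b, w0, path_len=1.0, n_samples=100, seed=0):
    """Sample points from the domain {w : A w <= b} (one row per
    labeled-instance constraint) by following straight trajectories that
    reflect off the facets; such walks converge to the uniform
    distribution on the domain."""
    rng = np.random.default_rng(seed)
    A, b, w = np.asarray(A, float), np.asarray(b, float), np.asarray(w0, float)
    samples = []
    for _ in range(n_samples):
        d = rng.normal(size=w.size)
        d /= np.linalg.norm(d)                  # random initial direction
        remaining = path_len
        while remaining > 1e-12:
            num, den = b - A @ w, A @ d
            t = np.full(len(b), np.inf)
            ahead = den > 1e-12                 # facets ahead of the point
            t[ahead] = np.maximum(num[ahead], 0.0) / den[ahead]
            i = int(np.argmin(t))
            if t[i] >= remaining:               # path ends inside the domain
                w = w + remaining * d
                break
            w = w + t[i] * d                    # move to facet i and reflect
            n = A[i] / np.linalg.norm(A[i])
            d = d - 2.0 * (d @ n) * n
            remaining -= t[i]
        samples.append(w.copy())
    return np.array(samples)

# Example: uniform draws from the box [-1, 1]^2.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
print(billiard_walk(A, [1, 1, 1, 1], w0=[0.1, -0.2], n_samples=5))
```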


Some results
- Setting: linear separators of R^D; dimension D = 4, 8; # queries: 15, 20
- [Plots: x-axis log(# simulations), y-axis generalization error; Baal vs. passive learning and almost-optimal AL (QbC-based); panels D=4, #=15 and D=8, #=20]


Some results - 2
- Combining with AL criteria (inspired from QbC): best of both worlds!
- [Plots: panels D=4, #=15 and D=8, #=20, against almost-optimal AL (QbC-based)]


To sum up ...
- A new approach to Active Learning: AL as a game
- An approximation of the optimal strategy (provably)
- An anytime algorithm
- Perspectives:
  - kernelized Baal
  - numerical engineering application


Thanks for listening
