Goodness-of-fit tests for unbinned maximum likelihood - Physics ...

BELLEWhy we are interestedBelle & other exp’ts make increasing use of unbinned fits:e.g. CPV params (A ππ , S ππ ):• e + e − → Υ(4S) → BB• use ∆t ≈ ∆z/βγc; σ ∆z varies• S/B ∼ 1 =⇒ bin byflavour-tagging qualitycontinuum suppression variable• P i in L = ∏ i P i depends onfractions f ππ , f Kπ , f qq(f ππ + f Kπ + f qq = 1)• only 760 eventsbinning is impractical =⇒ UMLPHYSTAT 2003 G.o.f. tests for UML fits Yabsley & Kinoshita

BELLEWhy this presents a problem. . . the L value itself seems to have no powerto discriminate against a bad fitHeinrich’s analytic example:• data: δ(x − x 0 )• fit to 1 α exp(−x/α)• α floats (“compound hypothesis”)• fit finds α = x 0• −2 ln L max = 2N(1 + ln x 0 )• expectation for a true exponential,constant α, is 2N(1 + ln α)• =⇒ δ “looks like” expthe discrimination power ofthe L disappears when itsparameter(s) float(see also Kinoshita (2002))L depends on the aggregateP i of the points,not where they arePHYSTAT 2003 G.o.f. tests for UML fits Yabsley & Kinoshita

BELLEAn analytic study of the energy testHeinrich’s example is simple enough to implement on paperbut not for Aslan & Zech’s weight f ns ; I use R(x, x ′ ) = exp (−β |x − x ′ )|)Leave in the constant omittedby A&Z −→ E(exp) = 0For exp fitted to δ(x −α 1 ), aftersome tedious arithmetic . . .φ = 1 +1α 3 (α + β) − 2 α( exp(−β/α)(β − α))(1 − exp(−(β − α)/α)) + e−1(β + α)= self-repuls n of data + self-repuls n of th y − attract n of data and th yFor β ≫ α, φ → 1 + 1α 3 β − 2e−1αβ> 0PHYSTAT 2003 G.o.f. tests for UML fits Yabsley & Kinoshita

BELLEThe working exampleA linear distribution with a contamination thrown in . . .easy to discriminate harder to discriminate we pretend not to seeagainst fixed slope with floating slope the true distributionThis is the first of a graded set of examples leading up topseudo-B 0 → π 0 π 0 analysis . . .PHYSTAT 2003 G.o.f. tests for UML fits Yabsley & Kinoshita

BELLECorrelation between fitted param. and measurePHYSTAT 2003 G.o.f. tests for UML fits Yabsley & Kinoshita

PHYSTAT2003, September 8, 2003An Unbinned Goodness-of-Fit TestBased on the Random Walk•Test statisticflat distributionnull rejection powerflatteningcompound hyopthesis• multidimensional extension: speculationsKay KinoshitaUniversity of CincinnatiBelle Collaboration

Flat null hypothesisMC - generatedflat ensembleK 1distribution(N=10,100,1000)slope =–0.992±0.0100.0105%slope = –1/(decay constant)slope =slope =–1.008±0.0330.033–1.039±0.0490.049rejection: e.g. from Aslan & Zech hep-ex/0203010alternative f(X)=0.3+1.4X11.7% 82.4% 100%PHYSTAT2003,9/8/034

Null hypothesesFlat null: : alternative hypotheses from Aslan & Zechc 2 , 12 bins(N=100)~0.81~0.85~0.81Non-flat null• Generate according to fixed PDF,• Flatten " same " "• Check: ensemble distributionof K 1 is consistent withPHYSTAT2003,9/8/035

Compound hypotheses: rejection powerHypothesis: PDF = f(X; a)• Toy MC: generate expts for PDF= f(X; a 0 )fit i th expt to f(X; a)->a max,iflatten, null PDF = f(X; a max,i )find K 1 -> ensembleK 1 distributionAlternate hypothesis: PDF=g(X)• Toy MC: generate expts for PDF= g(X)fit j th expt to f(X; a)->a max,jflatten, null PDF = f(X; a max,j )find K 1 -> ensembleK 1 distributionPHYSTAT2003,9/8/036

Compound hypothesis: exampleHypothesis:Exponential,Float decay const(N=100,1000)Alternate:g(X)=2(1-X)cut28%(c 2 :13%)cut99%(c 2 :100%)c 2 : fit w. 20 binsPRELIMINARY: somewhat more powerful than c 2 for low NPHYSTAT2003,9/8/037

Compound hypothesis: data-parameter correlationThe issue:Some "goodness-of-fit" params found to correlatew. Data (fitted param) -> not GOF.a vs K 1 :no obvious correlationPHYSTAT2003,9/8/038

Compound hypothesis: K 1 distributionsome examplesObservations• still resemble exponential decay• "decay constant" varies w fitted function, parameter (notsimple as c 2 /ndf) BUT approx. consistent for N=10->1000!=> consistency over changes, in parameter, N.• very amenable to determination via toy MC9PHYSTAT2003,9/8/03

Treatment of multidimensional dataTo be developed• K 1 in 1-d, no ordering of data -> amenable to multi-d?Look at 2-d• Data on unit circle -> 2-d toroid in 4-d: what iscorresponding vector algebra?• More obvious: projections - extend Fourier formalismk 1 =0 or k 2 =0 corresponds to projection onto f 2 or f 1(k 1 ,k 2 )= (1,1), (1,–1), 1), etc. " other axesDevelop statistic from orthogonal projections?PHYSTAT2003,9/8/0310

Summary• binning-free goodness-of-fit test needed for unbinnedmaximum likelihood fits of Belle data- must accommodate compound hypotheses, multi-d data.• 2 studies:– Aslan/Zech energy test, extends naturally to multi-dexamined behavior under compound hypothesisanalytic, Monte Carlo– K 1 statistic, may extend to multivariateconsistency of distribution over differentparameter values, Ngenerality for HEP is good, power comparable to c 2PHYSTAT2003,9/8/0311

Goodness-of-fit tests for unbinned maximum likelihood - Physics ...

Create successful ePaper yourself

Delete template?

Save as template?