
Statistics for high-dimensional data:
P-values and Stability Selection

Peter Bühlmann and Sara van de Geer
Seminar für Statistik, ETH Zürich
May 2012


P-values for high-dimensional linear models

Y = Xβ^0 + ε

goal: statistical hypothesis testing
H_{0,j}: β^0_j = 0, or H_{0,G}: β^0_j = 0 for all j ∈ G

background: if we could handle the asymptotic distribution of the Lasso β̂(λ) under the null hypothesis
❀ could construct p-values

this is very difficult!
the asymptotic distribution of β̂ has some point mass at zero, ...
Knight and Fu (2000) for p < ∞ and n → ∞;
Montanari (2010, 2012) ... for random design with i.i.d. columns

not practical at the moment


Variable selection

Example: motif regression
for finding HIF1α transcription factor binding sites in DNA sequences
Müller, Meier, PB & Ricci

Y_i ∈ R: univariate response measuring the binding intensity of HIF1α on coarse DNA segment i (from CHIP-chip experiments)

X_i = (X_i^(1), ..., X_i^(p)) ∈ R^p:
X_i^(j) = abundance score of candidate motif j in DNA segment i
(using sequence data and computational biology algorithms, e.g. MDSCAN)


question: relation between the binding intensity Y and the abundance of short candidate motifs?
❀ a linear model is often reasonable
“motif regression” (Conlon, X.S. Liu, Lieb & J.S. Liu, 2003)

Y = Xβ + ε, n = 287, p = 195

goal: variable selection
❀ find the relevant motifs among the p = 195 candidates


Motif regression
for finding HIF1α transcription factor binding sites in DNA sequences

Y_i ∈ R: univariate response measuring binding intensity on coarse DNA segment i (from CHIP-chip experiments)
X_i^(j) = abundance score of candidate motif j in DNA segment i

variable selection in the linear model
Y_i = β_0 + Σ_{j=1}^p β_j X_i^(j) + ε_i, i = 1, ..., n = 287, p = 195

❀ Lasso selects 26 covariates and R² ≈ 50%
i.e. 26 interesting candidate motifs
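This selection step can be reproduced in outline with standard tools. Below is a minimal sketch at the same dimensions (n = 287, p = 195); the data are synthetic stand-ins since the HIF1α data set is not part of these slides, and the sparsity level s0 and coefficient values are illustrative assumptions.

```python
# Lasso variable selection at motif-regression dimensions (sketch).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s0 = 287, 195, 6                    # s0 is an assumed sparsity level
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s0] = 1.0                           # a few "relevant motifs"
y = X @ beta + rng.standard_normal(n)

lasso = LassoCV(cv=10).fit(X, y)          # lambda chosen by cross-validation
S_hat = np.flatnonzero(lasso.coef_ != 0)  # the selected set S_hat
print(f"Lasso selects {S_hat.size} covariates, R^2 = {lasso.score(X, y):.2f}")
```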


motif regression: estimated coefficients β̂(λ̂_CV), original data
[figure: estimated coefficients (0.00-0.20) plotted against the variable index (0-200)]

which variables in Ŝ are false positives?
(p-values would be very useful!)


(Multi) sample splitting
an early but sub-ideal proposal:
◮ select variables on the first half of the sample ❀ Ŝ
◮ compute OLS for the variables in Ŝ on the second half of the sample
❀ p-values P_j based on the Gaussian linear model:
if j ∈ Ŝ: P_j from the t-statistic
if j ∉ Ŝ: P_j = 1 (i.e. if β̂_j = 0)

Bonferroni-“style” corrected p-values:
P_corr,j = min(P_j · |Ŝ|, 1)
❀ (conservative) familywise error control with P_corr,j (j = 1, ..., p)
(Wasserman & Roeder, 2008)
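A hedged sketch of this single-split procedure: Lasso selection on one half, OLS t-test p-values on the other half, then the Bonferroni-style correction by |Ŝ|. The helper name `split_pvalues` and the use of LassoCV for the selection step are my own illustrative choices, not prescribed by the slides.

```python
# Single sample splitting (Wasserman & Roeder, 2008) -- sketch.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

def split_pvalues(X, y, rng):
    n, p = X.shape
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2:]
    # selection on the first half
    S = np.flatnonzero(LassoCV(cv=5).fit(X[i1], y[i1]).coef_ != 0)
    pvals = np.ones(p)                       # P_j = 1 for j not in S_hat
    if 0 < S.size < i2.size - 1:             # enough df for OLS on half 2
        ols = sm.OLS(y[i2], sm.add_constant(X[np.ix_(i2, S)])).fit()
        pvals[S] = ols.pvalues[1:]           # t-test p-values, skip intercept
    # Bonferroni-"style" correction by |S_hat|
    return np.minimum(pvals * max(S.size, 1), 1.0)
```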


this is a “P-value lottery”
motif regression example: p = 195, n = 287
adjusted p-values for the same important variable over different random sample splits
[figure: histogram of the adjusted p-values, spread over the whole range (0, 1)]

❀ improve by aggregating over many sample splits
(which improves “efficiency” as well)


Sample splitting multiple times
run the sample-splitting procedure B times and do a non-trivial aggregation of the p-values

p-values: P_j^(1), ..., P_j^(B)
goal: aggregate P_j^(1), ..., P_j^(B) into a single p-value P_final,j
problem: dependence among P_j^(1), ..., P_j^(B)


define
Q^(j)(γ) = q_γ({P_corr,b^(j) / γ; b = 1, ..., B}),
where q_γ(·) is the empirical γ-quantile function

e.g. γ = 1/2: aggregation with the median
❀ (conservative) familywise error control for any fixed value of γ

what is the best γ? it really matters
❀ can “search” for it and correct with an additional factor


“adaptively” aggregated p-value:
P_final^(j) = (1 − log(γ_min)) · inf_{γ ∈ (γ_min, 1)} Q^(j)(γ),
Q^(j)(γ) = q_γ({P_corr,b^(j) / γ; b = 1, ..., B})

❀ reject H_0^(j): β_j = 0 ⇐⇒ P_final^(j) ≤ α

P_final^(j) roughly equals a raw p-value based on sample size ⌊n/2⌋, multiplied by a factor ≈ (5 − 10) · |Ŝ|
(which is to be compared with p)
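A sketch of this aggregation step, assuming the B×p matrix of split-wise corrected p-values has already been computed (e.g. with the splitting sketch above). Approximating the infimum over γ ∈ (γ_min, 1) by a finite grid is an implementation choice of mine, not part of the slides.

```python
# Adaptive quantile aggregation of p-values over B sample splits (sketch).
import numpy as np

def aggregate_pvalues(P, gamma_min=0.05):
    """P: array of shape (B, p); row b holds the corrected p-values P_corr,b."""
    gammas = np.linspace(gamma_min, 1.0, 20)  # grid stand-in for (gamma_min, 1)
    # Q^(j)(gamma): empirical gamma-quantile of {P_corr,b / gamma}, capped at 1
    Q = np.stack([np.minimum(np.quantile(P / g, g, axis=0), 1.0)
                  for g in gammas])
    # pay the (1 - log(gamma_min)) factor for searching over gamma
    return np.minimum((1 - np.log(gamma_min)) * Q.min(axis=0), 1.0)
```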


for the familywise error rate (FWER) = P[at least one false positive selection]

Theorem
Consider the Gaussian linear model (with fixed design) and assume:
◮ lim_{n→∞} P[Ŝ ⊇ S_0] = 1 (screening property)
◮ |Ŝ| < ⌊n/2⌋ (sparsity property)
Then: strong control for either the familywise error rate or the false discovery rate


motif regression example
p = 195, n = 287
[figure: estimated coefficients (0.00-0.20) against the variable index, with two variables marked]

◦: variable/motif with FWER-adjusted p-value 0.006
◦: p-value clearly larger than 0.05
(this variable corresponds to a known true motif)


discussion: multi sample splitting
◮ assumes P[Ŝ ⊇ S_0] → 1
❀ requires the beta-min condition
such an assumption should be avoided for hypothesis testing (because whether β_j^0 is smallish or sufficiently large is the essence of the question in testing)
◮ necessarily requires design conditions; but this is unavoidable


Stability Selection (Meinshausen & PB, 2010)
which allows one to go way beyond linear models

selection of “features” from the set {1, ..., p}, e.g.:
◮ variable selection in regression or classification
◮ edge selection in a graph
◮ membership in a cluster
◮ ...

selection procedure: Ŝ^λ ⊆ {1, ..., p}, λ a tuning parameter
prime example: the Lasso for selecting variables in a linear model


subsampling:
◮ draw a subsample of size ⌊n/2⌋ without replacement, denoted by I* ⊆ {1, ..., n}, |I*| = ⌊n/2⌋
◮ run the selection algorithm Ŝ^λ(I*) on I*
◮ do these steps many times and compute the relative selection frequencies
Π̂_j^λ = P*(j ∈ Ŝ^λ(I*)), j = 1, ..., p
(P* is w.r.t. subsampling)

one could also use bootstrap sampling with replacement...
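A minimal sketch of this subsampling loop with the Lasso as the selector; the regularization value `lam` and B = 100 subsamples are assumptions for illustration, not values from the slides.

```python
# Relative selection frequencies Pi_hat_j via subsampling (sketch).
import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, lam, B=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        I = rng.choice(n, size=n // 2, replace=False)  # subsample I*
        coef = Lasso(alpha=lam).fit(X[I], y[I]).coef_  # run S_hat^lambda on I*
        counts += coef != 0                            # was j selected?
    return counts / B                                  # Pi_hat_j, j = 1..p
```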


Stability selection
Ŝ_stable = {j; Π̂_j^λ ≥ π_thr}
choice of π_thr ❀ see later


if we consider many regularization parameters: {Ŝ^λ; λ ∈ Λ}
(Λ can be discrete or continuous)
Ŝ_stable = {j; max_{λ∈Λ} Π̂_j^λ ≥ π_thr}
see also Bach (2009) for a related proposal
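Combining the two previous definitions, the grid version is a one-liner on top of the frequency matrix; `freqs_per_lambda` is an assumed precomputed array (one row per λ, e.g. from repeated calls to the subsampling sketch above).

```python
# Stable set over a grid Lambda of regularization values (sketch).
import numpy as np

def stable_set(freqs_per_lambda, pi_thr):
    """freqs_per_lambda: array (len(Lambda), p) of Pi_hat_j, one row per lambda."""
    return np.flatnonzero(freqs_per_lambda.max(axis=0) >= pi_thr)
```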


Choice of the threshold π_thr ∈ (0, 1)?


How to choose the threshold π_thr?

denote by V = |S_0^c ∩ Ŝ_stable| = number of false positives
consider a selection procedure which selects q variables
(e.g. the top 50 variables when running the Lasso over many λ's)

Theorem (Meinshausen & PB, 2010)
main assumption: exchangeability condition
in addition: Ŝ is better than “random guessing”
Then:
E[V] ≤ q² / ((2π_thr − 1) p)
i.e. finite-sample control, even if p ≫ n

❀ choose the threshold π_thr to control e.g. E[V] ≤ 1 or P[V > 0] ≤ E[V] ≤ α
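Solving the bound E[V] ≤ q²/((2π_thr − 1) p) ≤ α for the threshold gives π_thr = (1 + q²/(αp))/2, as in the small worked example below; the function name and the choice q = 10 are illustrative.

```python
# Threshold pi_thr implied by the bound E[V] <= q^2 / ((2*pi_thr - 1) * p).
def threshold_for(q, p, alpha=1.0):
    pi_thr = 0.5 * (1 + q**2 / (alpha * p))
    if not 0.5 < pi_thr < 1:
        raise ValueError("bound not attainable: reduce q or relax alpha")
    return pi_thr

# motif-regression dimensions: p = 195, q = 10 per subsample, E[V] <= 1
print(threshold_for(q=10, p=195))   # ~0.756
```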


note the generality of the Theorem...
◮ it works for any method which is better than “random guessing”
◮ it works not only for regression but also for “any” discrete structure estimation problem (whenever there is an include/exclude decision)
❀ variable selection, graphical modeling, clustering, ...

and hence there must be a fairly strong condition...
Exchangeability condition:
the distribution of {I_{j ∈ Ŝ^λ}; j ∈ S_0^c} is exchangeable
note: this is only a requirement on the noise variables


Many simulations!
[table/figure: simulation scenarios (A)-(G) with varying p (100-1000), n (100-1000), sparsity s (4-50) and snr (0.5, 2), reporting the number of exchangeability violations and the maximal correlation per scenario, together with panels of P(first 0.1s correct) and P(first 0.4s correct)]

even if the exchangeability condition fails:
(conservative) error control for E[V] holds


Motif regression (n = 287, p = 195)
◮ two stable motifs with stability selection (w.r.t. E[V] ≤ 1)
◮ multi sample splitting finds only the one motif which is not biologically validated


Riboflavin data (n = 71, p = 4088): control for E[V] ≤ 1
[figure: stability selection with E[V] ≤ 1]


Graphical modeling using the GLasso
(Rothman, Bickel, Levina & Zhu, 2008; Friedman, Hastie & Tibshirani, 2008)

infer the conditional independence graph using ℓ1-penalization,
i.e. infer the zeroes of Σ^{-1} from X_1, ..., X_n i.i.d. ∼ N_p(0, Σ):
Σ^{-1}_{jk} ≠ 0 ⇔ X^(j) ⊥̸ X^(k) | X^({1,...,p}\{j,k}) ⇔ edge j − k

gene expression data ⇒ zero-pattern of Σ^{-1}?
[figures: estimated graphs for the Graphical Lasso (λ = 0.46, 0.448, 0.436, 0.424) and for Stability Selection]
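A hedged sketch of stability selection for edge selection, using scikit-learn's GraphicalLasso as the ℓ1-penalized estimator of Σ^{-1}; the tolerance 1e-8 for calling an entry nonzero and B = 50 subsamples are my own illustrative choices.

```python
# Edge selection frequencies for the graphical lasso via subsampling (sketch).
import numpy as np
from sklearn.covariance import GraphicalLasso

def edge_frequencies(X, alpha, B=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(B):
        I = rng.choice(n, size=n // 2, replace=False)    # subsample I*
        prec = GraphicalLasso(alpha=alpha).fit(X[I]).precision_
        counts += np.abs(prec) > 1e-8                    # edge j-k present?
    freqs = counts / B                                   # Pi_hat per entry
    np.fill_diagonal(freqs, 0.0)                         # diagonal is no edge
    return freqs
```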


sub-problem of the riboflavin data
p = 160, n = 115
stability selection with E[V] ≤ 5
[figures: estimated graphs when varying the regularization parameter λ in the ℓ1-penalization; Graphical Lasso (λ = 0.46, 0.448, 0.436, 0.424, 0.412, 0.4) vs. Stability Selection]

with stability selection: the choice of the initial λ tuning parameter does not matter much (as proved by our theory); one only needs to fix the finite-sample control


permutation of variables: varying the regularization parameter for the null case
[figures: estimated graphs for the Graphical Lasso (λ = 0.065, 0.063, 0.061, 0.059, 0.057, 0.055) vs. Stability Selection]

with stability selection: the number of false positives is indeed controlled (as proved by our theory)
and here: the exchangeability condition holds


Predicting causal effects from observational data

Causal gene ranking:

     Gene        summary   median   expression   error rate   name
                 rank      effect                (PCER)
 1   AT2G45660    1        0.60     5.07         0.0017       AGL20 (SOC1)
 2   AT4G24010    2        0.61     5.69         0.0021       ATCSLG1
 3   AT1G15520    2        0.58     5.42         0.0017       PDR12
 4   AT3G02920    5        0.58     7.44         0.0024       replication protein-related
 5   AT5G43610    5        0.41     4.98         0.0101       ATSUC6
 6   AT4G00650    7        0.48     5.56         0.0020       FRI
 7   AT1G24070    8        0.57     6.13         0.0026       ATCSLA10
 8   AT1G19940    9        0.53     5.13         0.0019       AtGH9B5
 9   AT3G61170    9        0.51     5.12         0.0034       protein coding
10   AT1G32375   10        0.54     5.21         0.0031       protein coding
11   AT2G15320   10        0.50     5.57         0.0027       protein coding
12   AT2G28120   10        0.49     6.45         0.0026       protein coding
13   AT2G16510   13        0.50     10.7         0.0023       AVAP5
14   AT3G14630   13        0.48     4.87         0.0039       CYP72A9
15   AT1G11800   15        0.51     6.97         0.0028       protein coding
16   AT5G44800   16        0.32     6.55         0.0704       CHR4
17   AT3G50660   17        0.40     7.60         0.0059       DWF4
18   AT5G10140   19        0.30     10.3         0.0064       FLC
19   AT1G24110   20        0.49     4.66         0.0059       peroxidase, putative
20   AT1G27030   20        0.45     10.1         0.0059       unknown protein

• biological validation by gene knockout experiments in progress
❀ see Friday...


The Lasso and its stability path, and why Stability Selection works so well

riboflavin example: n = 71, p = 4099
sparsity s_0 “=” 6 (6 “relevant” genes; all other variables permuted)
[figures: Lasso regularization path vs. Stability selection path]

with stability selection: the 4-6 “true” variables stick out much more clearly from the noise covariates


stability selection cannot be reproduced by simply selecting the right penalty with the Lasso
stability selection provides a fundamentally new solution


Leo Breiman
and providing error control in terms of E[V] (❀ conservative FWER control)


providing error control:
E[V] ≤ q² / ((2π_thr − 1) p)
◮ super-simple!
◮ over-simplistic?


Comparative conclusions

without conditions on the “design matrix” or “covariance structure” (etc.):
one cannot assign strength or uncertainty to a variable or a structural component (e.g. an edge in a graph) in a high-dimensional setting
and these conditions are typically uncheckable...


three “easy to use” methods in comparison:

method                   assumptions                 applicability
multi sample splitting   compatibility condition,    GLMs
                         beta-min condition
stability selection      exchangeability condition   “all”

the fewer assumptions, the more trustworthy the statistical inference!
“yet”, given the necessity of often uncheckable “design conditions”:
confirmatory high-dimensional inference remains challenging
