Ranking and selection in high-dimensional inference

Michael Newton, UW Madison 

SSC 2013, Edmonton 

Introductory Overview Lecture 

In collaboration with Nick Henderson 

Copyright M.A. Newton

Classical ranking/selection theory/methods

• small number of units 

• methods did not catch on

Some motivating examples 

• cytogenetic abnormalities 

• gene expression changes 

• genome-wide association studies 

• chromosome contact studies 

• gene set analysis 

• other...

Cytogenetic abnormalities 

Newton et al. 1994 

[Figure 2 (M. Newton, S-Q. Wu and C. Reznikoff, p. 842). Chromogram: losses and potential losses from all 16 lines. Top half records information on p-arms, and the bottom half q-arms. Chromosomes are arranged horizontally. The total number of arm losses at each chromosome is the number of shaded boxes. The outer box records the potential number of losses. Horizontal hashing indicates tied loss ...]

Excerpt from the same page: ... copies. The cytogenetic data provide no information about LOH when only one or two arms are lost. LOH from arm loss has been documented in a number of tetraploid lines, based on allelic ...

[Figure 4 (M. Newton, S-Q. Wu and C. Reznikoff, p. 848). Lines show the log Bayes factor of at least one suppressor gene as a function of the prior probability q. The Bayes factor, equal to the ratio of posterior odds to prior odds of at least one gene, is the probability of the data given at ... Horizontal axis: prior probability, 0.0 to 0.8.]

Genome-wide association studies (GWAS) 

odds ratios for SNP effects on disease phenotype 

standard error depends on SNP allele frequencies

Comparative Genomics: Breast Cancer Risk Loci 

• Rat QTL analysis identifies Mcs5a risk loci

• Human ortholog reduces BC risk in women

• Immune mechanism via E3-ligase, γδ T-cells

Gould

• PNAS, 2007; BC Research 2011; NAR 2012

• R01 CA123272; UWCCC Pilot [Gould/Trentham Dietz, CC]; U01 ES019466

Chromosome Conformation Capture (4C)

DNA reads contacting MCS5a (Smits et al.)

bins $\{i\}$

bin lengths $\{\eta_i\}$

read counts $\{X_i\}$

$X_i \sim \mathrm{Poisson}(\theta_i \eta_i)$

$H_{0,i}: \theta_i = \theta_0$

p-value small
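The Poisson test above can be sketched as follows. This is a minimal illustration, not the analysis of Smits et al.: it assumes a known background rate `theta0`, a one-sided (upper-tail) p-value, and made-up counts and bin lengths; the function names are hypothetical.

```python
import math

def poisson_upper_tail(x, mu):
    """P(X >= x) for X ~ Poisson(mu), computed from the lower-tail sum."""
    if x <= 0:
        return 1.0
    term = math.exp(-mu)      # P(X = 0)
    lower = term
    for k in range(1, x):     # accumulate P(X = 1), ..., P(X = x - 1)
        term *= mu / k
        lower += term
    return max(0.0, 1.0 - lower)

def bin_pvalues(counts, lengths, theta0):
    """One-sided p-values for H0: theta_i = theta0, one per bin."""
    return [poisson_upper_tail(x, theta0 * eta) for x, eta in zip(counts, lengths)]

# Hypothetical bins: read counts X_i, bin lengths eta_i, assumed rate theta0.
counts = [3, 12, 0, 7]
lengths = [2.0, 2.0, 1.0, 5.0]
pvals = bin_pvalues(counts, lengths, theta0=1.5)
```

Because the null mean is $\theta_0 \eta_i$, two bins with the same count can receive very different p-values when their lengths differ.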

Gene set analysis 

genome-wide gene-level data $\{u_g\}$

functional categories $\{i\}$

$i$ is a set of genes with specific biological property

set statistics $\{X_i\}$, e.g. $X_i = \frac{1}{n_i} \sum_{g \in i} u_g$

set sizes $n_i$

Gene Ontology, KEGG, Reactome, ...

Gene set analysis: the imbalance of power (Newton et al. 2007)

[Figure: two panels plotting $\sqrt{n_i}\,(X_i - \mu)/\sigma$ and $(X_i - \mu)/\sigma$ against set size $n_i$]
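As a small sketch of the set statistics above, the code below averages gene-level scores within each set and also forms the standardized version $\sqrt{n_i}(X_i - \mu)/\sigma$. Taking $\mu$ and $\sigma$ to be the genome-wide mean and standard deviation of the $u_g$ is an assumption made here for illustration, as are the toy gene names and scores.

```python
import math
import statistics

def set_statistics(u, gene_sets):
    """Return {set name: (n_i, X_i, z_i)} with X_i the mean of u_g over the set."""
    values = list(u.values())
    mu = statistics.fmean(values)        # genome-wide mean (assumed reference)
    sigma = statistics.pstdev(values)    # genome-wide sd (assumed reference)
    out = {}
    for name, genes in gene_sets.items():
        n = len(genes)
        x = sum(u[g] for g in genes) / n
        z = math.sqrt(n) * (x - mu) / sigma
        out[name] = (n, x, z)
    return out

# Toy data: four genes and two overlapping sets.
u = {"g1": 0.0, "g2": 1.0, "g3": 2.0, "g4": 3.0}
sets = {"small": ["g4"], "large": ["g3", "g4"]}
stats = set_statistics(u, sets)
```

The factor $\sqrt{n_i}$ is what makes power depend on set size: a given average shift is easier to detect in a large set than in a small one.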

Basic statistical problem 

• you have data $\{X_i\}$ on many separate populations, or "inference units" $\{i\}$

• there are unit-specific, real-valued parameters of interest $\{\theta_i\}$

• there are unit-specific nuisance parameters $\{\eta_i\}$

• you must identify the units having large values of the parameter of interest

Basic statistical problem 

• Special emphasis on “variance of variances” 

• Precision of estimates of unit-specific 

parameters may vary widely among units

Landmark results, high dimensional inference 

• Neyman and Scott (1948). Consistent estimates based on partially consistent observations. Econometrica 16: 1-32

$X_{1,i}, X_{2,i} \sim \mathrm{Normal}(\theta_i, \sigma^2)$

$\hat\sigma^2_n \to \sigma^2/2$

MLE breaks down in high dimensions
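A short simulation makes the Neyman and Scott limit concrete. It is a sketch under assumptions chosen here (uniformly scattered unit means, a fixed seed, 20,000 units); only the pair structure and the common variance come from the slide.

```python
import random

def neyman_scott_mle(n_units, sigma2, seed=0):
    """MLE of sigma^2 from n_units pairs, each pair with its own mean theta_i."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(n_units):
        theta = rng.uniform(-10.0, 10.0)     # arbitrary unit-specific mean
        x1 = rng.gauss(theta, sd)
        x2 = rng.gauss(theta, sd)
        xbar = 0.5 * (x1 + x2)               # MLE of theta_i
        total += (x1 - xbar) ** 2 + (x2 - xbar) ** 2
    return total / (2 * n_units)             # MLE divides by all 2n observations

sigma2_hat = neyman_scott_mle(n_units=20000, sigma2=4.0)
```

With $\sigma^2 = 4$ the estimate settles near 2, not 4: each pair spends one of its two degrees of freedom on its own mean, and adding more units never repairs that.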

Landmark results, high dimensional inference 

• Kiefer and Wolfowitz (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27: 887-906

$X_i \sim p(x \mid \theta_i, \eta)$

$\theta_i \sim F$

$p(x \mid \eta) = \int p(x \mid \theta, \eta)\, dF(\theta)$

$\hat F_n \to F$

$\hat\eta_n \to \eta$

MLE ok if you treat the high-dimensional parameter as a random variable

The Stein effect 

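Experiment produces data to estimate a single parameter $\theta$

The natural estimator is $\hat\theta = \bar X$

Minimizes risk: $R_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2\right]$


Experiment produces data to estimate two parameters $\theta = (\theta_1, \theta_2)$

The natural estimator is $\hat\theta = (\bar X_1, \bar X_2)$

Minimizes risk: $R_{\hat\theta}(\theta) = E\left[(\hat\theta_1 - \theta_1)^2 + (\hat\theta_2 - \theta_2)^2\right]$


Experiment produces data to estimate $n > 2$ parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$

The natural estimator is $\hat\theta = (\bar X_1, \bar X_2, \ldots, \bar X_n)$ ...

Does not minimize risk!


There are other estimators $\tilde\theta$ with smaller risk $R_{\tilde\theta}(\theta)$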
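One estimator with smaller risk is the James-Stein estimator, which the slide does not name; it is used here only as a familiar illustration. The sketch assumes unit-variance normal observations, shrinkage toward zero, and a hypothetical parameter vector of ten ones.

```python
import random

def stein_risks(theta, reps=2000, seed=1):
    """Monte Carlo total squared-error risk of X and of James-Stein, X ~ N(theta, I)."""
    rng = random.Random(seed)
    n = len(theta)
    loss_mle = loss_js = 0.0
    for _ in range(reps):
        x = [rng.gauss(t, 1.0) for t in theta]
        shrink = 1.0 - (n - 2) / sum(v * v for v in x)   # James-Stein factor
        loss_mle += sum((v - t) ** 2 for v, t in zip(x, theta))
        loss_js += sum((shrink * v - t) ** 2 for v, t in zip(x, theta))
    return loss_mle / reps, loss_js / reps

risk_mle, risk_js = stein_risks([1.0] * 10)
```

The natural estimator has total risk close to $n = 10$ here, while the shrunken version comes in lower even though the ten coordinates are estimated from independent data.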

Bayes, Empirical Bayes, and their 

uneasy relationship! 

• Bayes estimates always win on frequency-theory risk (they're admissible!)

• EB: pretend you're doing a Bayesian analysis using a common prior on all units; work out the Bayes estimate; plug in an estimate of the hyperparameters

Efron & Morris; Brown; Carlin & Louis, et al.
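The empirical Bayes recipe in the second bullet can be sketched for a normal-normal model. The model, the moment estimator of the prior variance, and the toy data are all assumptions made for illustration; the cited authors develop many refinements.

```python
import statistics

def eb_posterior_means(x, s2):
    """Plug-in posterior means for X_i ~ N(theta_i, s2_i), theta_i ~ N(m, tau2)."""
    m = statistics.fmean(x)                       # estimate of the prior mean
    # Marginally Var(X_i) = tau2 + s2_i, so subtract the average known variance.
    tau2 = max(0.0, statistics.variance(x) - statistics.fmean(s2))
    return [m + tau2 / (tau2 + v) * (xi - m) for xi, v in zip(x, s2)]

# Toy data: five units with known sampling variances (all positive).
x = [-2.0, -1.0, 0.0, 1.0, 4.5]
s2 = [0.5, 0.5, 0.5, 0.5, 2.0]
theta_hat = eb_posterior_means(x, s2)
```

Each estimate lies between the overall mean and the unit's own observation, and units with larger sampling variance are pulled further toward the mean.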

Hypothesis testing

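Null hypothesis: $H_0: \theta = \theta_0$

p-value = tail area

[Figure: the p-value as the tail area beyond the observed $x$, with $\theta_0$ marked on the axis]

Berger & Sellke (1987). Testing a point null hypothesis: the irreconcilability of p values and evidence. JASA, 82, 112-122.

Experiment produces data $x$ to test the value of a single parameter $\theta$

[Figure: prior distribution for $\theta$, with labels 1/2 and 1/2 and $\theta_0$ marked]

[Table: Berger and Sellke, Testing a Point Null Hypothesis, Table 1, for a Jeffreys-type prior; column headings p, t, 1, 5, 10, 20, 50, 100, 1,000]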

Testing in high dimensions

Benjamini and Hochberg (1995). Controlling the false 

discovery rate: a practical and powerful approach to 

multiple testing. JRSSB 57: 289-300. 

[Figure, built up over several slides: p-values on the interval from 0 to 1 with a critical value marked; the panels label true non-discoveries, false non-discoveries, true discoveries and false discoveries]

[Figure: sorted p-values against rank/G with alpha = 0.4; the cutoff is marked at rank g*/G and p-value p_(g*)]
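The Benjamini and Hochberg step-up rule pictured above can be written in a few lines. This is a plain sketch of the standard procedure; the function name and the example p-values are illustrative.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_star = rank                 # keep the largest rank that passes
    return sorted(order[:k_star])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
discoveries = benjamini_hochberg(pvals, alpha=0.05)
```

Note the step-up character: every p-value up to the largest passing rank is rejected, even if some smaller p-value sits above its own line.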

The mixture model perspective 

Storey, 2003. The positive false discovery rate: a 

Bayesian interpretation of the q-value. Annals of 

Statistics, 31: 2013-2035. 

Null $H_{0,i} \sim \mathrm{Bernoulli}(\pi_0)$

$\mathrm{FDR}(c) = P(H_{0,i} \mid p_i \le c) = \dfrac{c\,\pi_0}{F(c)} \le \dfrac{c}{F(c)}$

For FDR $= \alpha$ solve $F(x^*) = x^*/\alpha$

Discoveries $L = \{i : p_i \le x^*\}$

[Figure, repeated over several slides. BH panel: sorted p-values against rank/G, alpha = 0.4, cutoff at g*/G and p_(g*). Mixture panel: EDF of the p-values (rank/G against p-value) with a line of slope 1/alpha meeting the EDF at x*, F(x*).]

Storey, JD and Tibshirani, R (2003). Statistical significance for genomewide studies. PNAS, 100, 9440-9445.

$q(p) = P(H_0 \mid \mathrm{pval} \le p)$

[Figure: gene-level DE, n = 31099. One panel shows the density of p-values with pi = 0.37; the other shows q-value against p-value with FDR = 0.05 marked.]

Microarray example; 2 similar forms of head & neck cancer

Notes 

1. p-value and q-value give same ordering 

2. q is not necessarily greater than p
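Both notes can be checked with a small q-value calculation. The sketch below uses the usual step-up formula $q_{(k)} = \min_{j \ge k} \pi_0\, m\, p_{(j)} / j$ with the null proportion `pi0` supplied by the user; estimating `pi0` from the data is not shown, and the value 0.5 is hypothetical.

```python
def q_values(pvalues, pi0=1.0):
    """Step-up q-values: q_(k) = min over j >= k of pi0 * m * p_(j) / j."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q_full = q_values(pvals)                  # pi0 = 1
q_half = q_values(pvals, pi0=0.5)         # hypothetical pi0 estimate
```

The q-values are non-decreasing in the p-values, so the ordering of units is unchanged, and with `pi0` below 1 the largest q-value falls below its p-value.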

The mixture model perspective; 

why start with p-values? 

$f(x) = \pi f_0(x) + (1 - \pi) f_1(x)$

1. $x$ is all data on inference unit; $f_0$ and $f_1$ are modeled. Newton et al. 2001; Kendziorski et al. 2003 (microarrays); et al.

2. $x$ is univariate test statistic. Efron et al. (local FDR)

Mixture gives $e(x) = P(H_0 \mid x)$

$L = \{\text{units } i : e(x_i) \le k\}$

Duality

$E(\#\text{errors on } L \mid \text{data}) = \sum_i e_i\, 1[e_i \le k]$

$k$ controls conditional FDR
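The duality can be seen directly in code. Given posterior null probabilities $e_i$ (made-up numbers here), thresholding at $k$ gives a list whose expected error fraction is the average of the selected $e_i$, which cannot exceed $k$.

```python
def select_by_null_probability(e, k):
    """Return the list L = {i : e_i <= k} and its conditional FDR."""
    chosen = [i for i, ei in enumerate(e) if ei <= k]
    expected_errors = sum(e[i] for i in chosen)   # E(# nulls on L | data)
    cfdr = expected_errors / len(chosen) if chosen else 0.0
    return chosen, cfdr

# Hypothetical posterior null probabilities for six units.
e = [0.01, 0.30, 0.04, 0.90, 0.12, 0.02]
chosen, cfdr = select_by_null_probability(e, k=0.15)
```

In this toy case the bound is loose: the threshold is 0.15 but the list's conditional FDR is under 0.05.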

Notes 

1. If q-value and locFDR are based on the 

same test statistic, main difference is 

sided-ness [masked by q, not locFDR] 

2. Substantial differences occur owing to 

form of the ranking variable (e.g., the 

underlying test statistic)

Everything in moderation 

Smyth (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. SAGMB, 3, Article 3. [limma] [gene expression]

Linear model per gene

Let $v_{gj}$ be the $j$th diagonal element of $C^T V_g C$. The distributional assumptions about the data can be summarized by

$\hat\beta_{gj} \mid \beta_{gj}, \sigma_g^2 \sim N(\beta_{gj},\, v_{gj}\sigma_g^2)$

and

$s_g^2 \mid \sigma_g^2 \sim \dfrac{\sigma_g^2}{d_g}\,\chi^2_{d_g}$

where $d_g$ is the residual degrees of freedom for the linear model for gene $g$. Under these assumptions the ordinary t-statistic

$t_{gj} = \dfrac{\hat\beta_{gj}}{s_g\sqrt{v_{gj}}}$

follows an approximate t-distribution on $d_g$ degrees of freedom.

Everything in moderation

3 Hierarchical Model

Given the large number of gene-wise linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure whereby the same model is fitted to each gene. The key is to describe how the unknown coefficients $\beta_{gj}$ and unknown variances $\sigma_g^2$ vary across genes.

Prior information is assumed on $\sigma_g^2$ equivalent to a prior estimator $s_0^2$ with $d_0$ degrees of freedom, i.e.,

$\dfrac{1}{\sigma_g^2} \sim \dfrac{1}{d_0 s_0^2}\,\chi^2_{d_0}$.

Conjugate prior

Then $p_j$ is the expected proportion of truly differentially expressed genes. For those which are nonzero, prior information on the coefficient is assumed equivalent to a prior observation equal to zero with unscaled variance $v_{0j}$, i.e.,

$\beta_{gj} \mid \sigma_g^2,\ \beta_{gj} \ne 0 \sim N(0,\, v_{0j}\sigma_g^2)$.

This describes the expected distribution of log-fold changes for genes which are differentially expressed.

Shrinkage estimate

Under the above hierarchical model, the posterior mean of $\sigma_g^{-2}$ given $s_g^2$ is $\tilde s_g^{-2}$ with

$\tilde s_g^2 = \dfrac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}$.

The posterior values shrink the observed variances towards the prior values with the degree of shrinkage depending on the relative sizes of the observed and prior degrees of freedom.

Moderated test statistic

Define the moderated t-statistic by

$\tilde t_{gj} = \dfrac{\hat\beta_{gj}}{\tilde s_g\sqrt{v_{gj}}}$.

This statistic represents a hybrid classical/Bayes approach in which the posterior variance has been substituted into the classical t-statistic in place of the usual sample variance. The moderated t reduces to the ordinary t-statistic if $d_0 = 0$ and at the opposite end of the spectrum is proportional to the coefficient $\hat\beta_{gj}$ if $d_0 = \infty$.

Modified null (predictive)

$\tilde t_{gj}$ and $s_g^2$ are independent with

$s_g^2 \sim s_0^2\, F_{d_g, d_0}$

and

$\tilde t_{gj} \mid \beta_{gj} = 0 \sim t_{d_0 + d_g}$.
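The shrinkage estimate and moderated t-statistic are easy to compute once the hyperparameters are in hand. The sketch below treats `d0` and `s02` as known inputs (limma estimates them from all genes, which is not shown), and the function name and numbers are illustrative.

```python
import math

def moderated_t(beta_hat, s2, v, d, d0, s02):
    """Moderated t: the shrunken variance replaces s2 in the ordinary t-statistic."""
    s2_tilde = (d0 * s02 + d * s2) / (d0 + d)     # shrinkage estimate of variance
    return beta_hat / math.sqrt(s2_tilde * v), s2_tilde

# One gene with a large sample variance, shrunk toward a prior value of 1.
t_mod, s2_tilde = moderated_t(beta_hat=1.2, s2=3.0, v=0.5, d=4, d0=4, s02=1.0)
# With d0 = 0 the prior carries no weight and the ordinary t-statistic returns.
t_ord, _ = moderated_t(beta_hat=1.2, s2=3.0, v=0.5, d=4, d0=0, s02=1.0)
```

Here the sample variance 3 is pulled to 2, so the moderated statistic is larger than the ordinary one; a gene with an unusually small sample variance would move the other way.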

Don’t summarize too soon 

Leek, JT and Storey JD (2007). Capturing heterogeneity in 

gene expression studies by surrogate variable analysis. PLoS 

Genetics 3(9): e161. 

effects of unmeasured 

and unmodeled factors

Beyond hypothesis testing 

• empirical Bayes procedures 

• procedures designed against global criteria

Empirical Bayes 

"local" posterior (unit $i$)

$p(\theta_i \mid x_i, \eta_i) \propto f(\theta_i)\, p(x_i \mid \theta_i, \eta_i)$

$p(x_i \mid \theta_i, \eta_i)$:

• simple form

• known nuisance params

$f, F$:

• estimated using all units

Estimating the “mixing” or “prior” distribution 

Nonparametric MLE

Lindsay (1995).

$\hat F = \arg\max_F \prod_i \int p(x_i \mid \theta, \eta_i)\, dF(\theta)$

Note:

$\hat F(A) = \mathrm{mean}_i\, P(\theta_i \in A \mid x_i, \eta_i)$

Or other options...

Gene set enrichment (influenza RNAi example) 

$n_i$ = size of set $i$

$\theta_i \sim F$

$x_i \sim \mathrm{Binomial}(n_i, \theta_i)$
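For this binomial model, a nonparametric MLE of $F$ can be approximated by putting weights on a fixed grid and running EM. This is a sketch, not the method used in the talk: the grid, the iteration count and the function names are assumptions, and the four data points are the $x_i/n_i$ labels that appear in the figure for this example, read as counts over set sizes.

```python
import math

def binom_pmf(x, n, theta):
    return math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)

def npmle_grid(x, n, grid, iters=200):
    """EM for mixing weights on a fixed grid; returns (weights, log-likelihoods)."""
    K = len(grid)
    w = [1.0 / K] * K
    lik = [[binom_pmf(xi, ni, t) for t in grid] for xi, ni in zip(x, n)]
    history = []
    for _ in range(iters):
        new_w = [0.0] * K
        loglik = 0.0
        for row in lik:
            joint = [wk * l for wk, l in zip(w, row)]
            marg = sum(joint)                    # marginal p(x_i) under current F
            loglik += math.log(marg)
            for k in range(K):
                new_w[k] += joint[k] / marg      # P(theta_i = grid[k] | x_i)
        w = [v / len(x) for v in new_w]          # average of the local posteriors
        history.append(loglik)                   # log-likelihood before this update
    return w, history

x = [0, 1, 23, 9]
n = [260, 25, 729, 149]
grid = [(k + 0.5) / 200 for k in range(40)]      # theta in (0, 0.2), assumed grid
weights, history = npmle_grid(x, n, grid)
```

The weight update is the self-consistency property in the note on the nonparametric MLE: the estimated mixing distribution is the average of the unit-level posteriors.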

Gene set enrichment 

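[Figure: estimated mixing distribution $\hat F$ and local posterior densities $p(\theta_i \mid x_i, \eta_i)$ for sets labeled 0/260, 1/25, 23/729 and 9/149; horizontal axis $\theta$ from 0.00 to 0.08, with 0.0092 marked; vertical axis density]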

Posterior means: 

$\hat\theta_i = E(\theta_i \mid \text{data})$

1. “local” Bayes estimate under squared error loss 

2. trades off bias and variance 

3. widely used to prioritize units by non-testers!

Posterior expected ranks 

Laird and Louis (1989). Empirical Bayes ranking methods. JES, 

14, 29-46. 

$\mathrm{rank}(\theta_i) = \sum_j 1(\theta_i \le \theta_j)$

$\widehat{\mathrm{rank}}(\theta_i) = \sum_j P(\theta_i \le \theta_j \mid \text{data})$

(ranks counted from the top)
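When posterior draws of all the $\theta_i$ are available, the posterior expected rank is just the average of the ranks computed draw by draw. The sketch assumes such draws exist (for example from MCMC); the two draws below are made up.

```python
def posterior_expected_ranks(draws):
    """draws[s][i] is posterior draw s of theta_i; rank 1 is the top unit."""
    n = len(draws[0])
    ranks = [0.0] * n
    for sample in draws:
        for i in range(n):
            # rank(theta_i) = number of units j with theta_i <= theta_j
            ranks[i] += sum(1 for j in range(n) if sample[i] <= sample[j])
    return [r / len(draws) for r in ranks]

draws = [
    [0.9, 0.5, 0.1],
    [0.4, 0.6, 0.2],
]
expected = posterior_expected_ranks(draws)
```

Units 0 and 1 swap places between the two draws, so each gets expected rank 1.5, while unit 2 is last in both.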

Low posterior quantiles

$P\left(\theta_i \le \hat\theta_i \mid x_i\right) = 0.05$

[Figure: posterior densities against $\theta$, from 0.00 to 0.08]

Every listing of top-ranking sets has a conditional FDR statement:

$E\Big[\, \sum_i 1[\hat\theta_i \ge k]\; 1[\theta_i \ldots] \,\Big/\, \sum_i \ldots \Big]$

4 replicate flu studies agree much more at set level than gene level

[Figure: agreement among studies (0.0 to 0.5) against number of top sets (0 to 250); curves labeled low 5%-ile, posterior mean, y/n, z, gene list]

Other approaches: 

Lin, R, Louis, TA, Paddock, SM, and Ridgeway, G (2006). Loss 

function based ranking in two-stage hierarchical models. 

Bayesian Analysis 1, 915-946. 

• Bayes estimates under various loss functions

• squared error loss; classification loss; distance-based loss; weighted loss...

Options 

• MLE 

• p-value (q-value) [against no-effect H0] 

• Pr(H0|data) 

[parametric/NP/SP/locFDR] 

• moderated t (p-value) 

• SVA (p-value) 

• local posterior mean 

• local posterior quantile 

• other Bayesian: loss-based 

• other testing: p-values against other H0 

• other estimates: ...

$T = \sum_{i=1}^{n} [X_i > 25]$    16    20.90 (0.65)    8.03 (0.48)

$T = \sum_{i=1}^{n} [X_i > 40]$    12    13.10 (0.66)    1.10 (0.22)

$T = \sum_{i=1}^{n} [X_i > 60]$    11    11.17 (0.61)    0.13 (0.06)

The method matters 

Table 2: Flu Data: summary information for several chosen gene sets. Here, $m_i$ represents the number of genes within a given GO term, and $X_i$ represents the number of genes belonging to both the given GO term and the 614 genes of interest. The rankings according to several different methods are also provided. For example, the category with GO id GO:0022627 was ranked number one by both the posterior mean and posterior rank methods.

ID          GO Term                                       X_i  m_i  p.mean  p.rank  p.val   mle
GO:0022627  cytosolic small ribosomal subunit              18   36       1       1      3   251
GO:0004672  protein kinase activity                        76  594     119      65      2  1074
GO:0030126  COPI vesicle coat                               7   11       3       3     52   159
GO:0030529  ribonucleoprotein complex                      69  504     101      54      1  1040
GO:0000059  protein import into nucleus, docking            2    2      62     271    178     1
GO:0005852  eukaryotic translation initiation factor ...    8   14       4       2     40   164
GO:0048200  Golgi transport vesicle coating                 6   10       7       8     75   161
GO:0000912  assembly of actomyosin apparatus...             1    1     325     770    411     2
GO:0048205  COPI coating of Golgi vesicle                   6   10       8       9     76   162
GO:0033179  proton-transporting V-type ATPase,...           5    6       5      10     69   147

A framework for comparing methods 

and generating better ones

Decouple local parameters with 

variance-stabilizing transformation 

$X_i \sim \mathrm{Normal}(\theta_i, \sigma_i^2)$; $X_i$ is the local MLE

$\theta_i \perp \sigma_i^2$ in the system

$\theta_i \sim F$

$\sigma_i^2 \sim G$

all known, for now

Thresholding functions and ranking variables 

“Precision-guided” selection rules 

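$\{t_\alpha(\sigma^2)\}$

unit $i$ on top $\alpha$ list $L_\alpha$ if $X_i \ge t_\alpha(\sigma_i^2)$

$P\left(X \ge t_\alpha(\sigma^2)\right) = \alpha$ (marginal)

Ranking variables: $R_i$

$R$ constant for all $X, \sigma^2$ s.t. $X = t_\alpha(\sigma^2)$

monotone in $\alpha$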

p-value rule 

$H_{0,i}: \theta_i = 0$

$R_i = 1 - \mathrm{pval}_i = \Phi(X_i/\sigma_i)$

$t_\alpha(\sigma^2) = c_\alpha\,\sigma$

$\sigma^2 \sim \mathrm{Gamma}(a, b)$

[Figure: P-value family of threshold functions. Scatter of X (0 to 6) against σ² (0 to 5) with threshold curves for list sizes 0.01, 0.05 and 0.1]

Posterior mean rule 

$F = \mathrm{Normal}(0, 1)$

$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$

$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$

$\sigma^2 \sim \mathrm{Gamma}(a, b)$

[Figure: two panels, P-value family of threshold functions and Posterior mean family of threshold functions. Each is a scatter of X (0 to 6) against σ² (0 to 5) with threshold curves for list sizes 0.01, 0.05 and 0.1]

Limma rule 

$R_i = \dfrac{X_i}{\sqrt{w\,\sigma_i^2 + (1 - w)}}$

[Figure: threshold curves, x (0 to 6) against σ² (0 to 5)]

$\sigma^2 \sim \mathrm{Exp}(1)$
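The ranking variables from the last few slides can be put side by side in a short sketch. It assumes $F = \mathrm{Normal}(0, 1)$ as on the posterior mean slide; the weight `w = 0.5`, the function names and the four example units are hypothetical.

```python
import math
from statistics import NormalDist

def ranking_variables(x, sigma2, w=0.5):
    """Ranking variables for one unit under F = Normal(0, 1); larger is better."""
    return {
        "mle": x,
        "pval": NormalDist().cdf(x / math.sqrt(sigma2)),   # 1 - p-value
        "post.mean": x / (sigma2 + 1.0),
        "limma": x / math.sqrt(w * sigma2 + (1.0 - w)),
    }

def top_list(units, method, alpha, w=0.5):
    """Indices of the top alpha fraction of (x, sigma2) units under a method."""
    scores = [ranking_variables(x, s2, w)[method] for x, s2 in units]
    size = max(1, round(alpha * len(units)))
    return sorted(range(len(units)), key=lambda i: -scores[i])[:size]

# Four hypothetical units, each given as (X_i, sigma_i^2).
units = [(2.0, 0.1), (3.0, 4.0), (1.5, 0.01), (2.5, 1.0)]
```

With these units, `top_list(units, "mle", 0.25)` picks the noisy unit with the largest observation, `"pval"` picks the most precise unit, and `"post.mean"` picks a unit in between.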

Threshold Curves for a list size of 0.05. 

[Figure: t(σ²) (0 to 4) against σ² (0.0 to 1.5); methods: mle, p-value, post mean, post rank]

$\sigma^2 \sim \mathrm{Gamma}(a, b)$

$E(\sigma^2) = CV(\sigma^2) = 1$

Performance Criteria 

Notes: 

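we know $\theta_\alpha$ such that $P(\theta_i \ge \theta_\alpha) = \alpha$

each ranking procedure corresponds to rules: $X_i \ge t_\alpha(\sigma_i^2)$


Accuracy: $\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; 1(\theta_i \ge \theta_\alpha)$


Content: $\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; \theta_i$


Stability: $\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; 1[X_i^* \ge t_\alpha(\sigma_i^2)]$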

Performance Criteria: take limit on # units 

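Accuracy: $\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; 1(\theta_i \ge \theta_\alpha) \;\to\; P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$

integrates $\theta$, $\sigma^2$, and $X$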

Performance Criteria: limiting # units 

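Accuracy: $P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$, $P\left(\theta \ge \theta_\alpha \mid X \ge t_\alpha(\sigma^2)\right)$

Content: $E\left(\theta \mid X \ge t_\alpha(\sigma^2)\right)$

Stability: $P\left(X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right)$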
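The three criteria can be estimated by simulation. The sketch below assumes the standard model with $F = \mathrm{Normal}(0,1)$ and $\sigma^2 \sim \mathrm{Exp}(1)$, uses an empirical $\theta_\alpha$, and forms each top list by ranking (the replicate list is re-ranked from $X^*$ instead of reusing the same threshold, which agrees with the definition only in the large-sample limit). Names, seed and sample size are arbitrary.

```python
import random

def simulate(n=20000, alpha=0.05, seed=2):
    """Monte Carlo accuracy, content and stability of three ranking rules."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]             # theta_i ~ F
    s2 = [rng.gammavariate(1.0, 1.0) for _ in range(n)]         # sigma_i^2 ~ Exp(1)
    x = [rng.gauss(t, v ** 0.5) for t, v in zip(theta, s2)]     # data X_i
    xrep = [rng.gauss(t, v ** 0.5) for t, v in zip(theta, s2)]  # replicate X_i*
    size = int(alpha * n)
    theta_alpha = sorted(theta, reverse=True)[size - 1]         # empirical theta_alpha

    rules = {
        "mle": lambda xi, v: xi,
        "pval": lambda xi, v: xi / v ** 0.5,
        "post.mean": lambda xi, v: xi / (v + 1.0),
    }

    def top(data, rule):
        score = [rule(xi, v) for xi, v in zip(data, s2)]
        return set(sorted(range(n), key=lambda i: -score[i])[:size])

    out = {}
    for name, rule in rules.items():
        chosen, chosen_rep = top(x, rule), top(xrep, rule)
        out[name] = {
            "accuracy": sum(theta[i] >= theta_alpha for i in chosen) / size,
            "content": sum(theta[i] for i in chosen) / size,
            "stability": len(chosen & chosen_rep) / size,
        }
    return out

results = simulate()
```

Printing `results` shows how the same data give quite different lists under the three rules; the posterior mean rule should have the highest content.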

[Figure: panels titled Mean Content with E(σ²) = 15 and Mean Content with E(σ²) = 1/2. Each plots mean content (0.0 to 2.0) against the coefficient of variation of σ² (0.5 to 2.0) for the MLE, pval and post.mean rankings]

Optimal thresholds 

Approach: 

• work in limiting case, normal model 

• assume smooth threshold functions 

• assume smooth density for variances 

• use calculus of variations 

• find thresholds maximizing criteria


Maximize Stability: $P\left(X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right)$

High stability with steep thresholds; 

tries to select units with smallest 

variation regardless of data 

No solution


Maximize Content: $E\left(\theta \mid X \ge t_\alpha(\sigma^2)\right)$

Solution:

$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$

$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$

i.e. ranking by local Bayes estimate has global optimality property
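The link between the ranking variable and the threshold function follows from the usual normal-normal posterior. A short derivation, assuming $F = \mathrm{Normal}(0,1)$ as elsewhere in the talk:

```latex
\begin{align*}
\theta_i \sim \mathrm{Normal}(0,1),\quad
X_i \mid \theta_i \sim \mathrm{Normal}(\theta_i, \sigma_i^2)
&\;\Longrightarrow\;
\theta_i \mid X_i \sim \mathrm{Normal}\!\left(\frac{X_i}{\sigma_i^2+1},\; \frac{\sigma_i^2}{\sigma_i^2+1}\right), \\
R_i = \frac{X_i}{\sigma_i^2+1} \ge c_\alpha
&\;\Longleftrightarrow\;
X_i \ge c_\alpha(\sigma_i^2+1) = t_\alpha(\sigma_i^2).
\end{align*}
```

So thresholding the posterior mean at a constant is the same as thresholding $X_i$ along a line in $\sigma_i^2$.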


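Maximize Accuracy: $P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$

Solution:

$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(1 + \sigma^2)}$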

Thresholds for most accurate and highest content 

[Figure: two panels of threshold curves, x (0 to 6) against σ² (0 to 5)]

$\sigma^2 \sim \mathrm{Exp}(1)$


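Maximize Accuracy: $P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$

Solution:

$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(1 + \sigma^2)}$

Let $U_\alpha(X, \sigma^2) = P\left(\theta \ge \theta_\alpha \mid X, \sigma^2\right)$

$R_i = \min\left\{\alpha : P\left[\,U_\alpha(X, \sigma^2) \ge U_\alpha(X_i, \sigma_i^2) \mid X_i, \sigma_i^2\,\right] \le \alpha\right\}$

$R_i = \alpha$ if $i$ has position $\alpha$ within $P(\theta_j \ge \theta_\alpha \mid X_j, \sigma_j^2)$

smaller is better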
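Under $F = \mathrm{Normal}(0,1)$ the posterior tail probability $U_\alpha$ has a closed form, and the accuracy-optimal threshold is a level set of it. The sketch below assumes that model and reads the solution with a minus sign before $u_\alpha$; here `u_alpha` is a free input, whereas in the talk it is fixed by the list-size constraint.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def tail_probability(x, sigma2, alpha):
    """U_alpha(x, sigma2) = P(theta >= theta_alpha | x, sigma2), F = Normal(0, 1)."""
    theta_alpha = STD.inv_cdf(1.0 - alpha)          # upper-alpha quantile of F
    post_mean = x / (sigma2 + 1.0)
    post_sd = math.sqrt(sigma2 / (sigma2 + 1.0))
    return 1.0 - STD.cdf((theta_alpha - post_mean) / post_sd)

def accuracy_threshold(sigma2, alpha, u_alpha):
    """Value of x at which the posterior tail probability equals 1 - Phi(u_alpha)."""
    theta_alpha = STD.inv_cdf(1.0 - alpha)
    return theta_alpha * (sigma2 + 1.0) - u_alpha * math.sqrt(sigma2 * (1.0 + sigma2))

# Hypothetical values: sigma^2 = 2, alpha = 0.05, u_alpha = 0.3.
t = accuracy_threshold(2.0, 0.05, 0.3)
u_at_threshold = tail_probability(t, 2.0, 0.05)
```

Evaluating `tail_probability` along the threshold curve returns the same value for every $\sigma^2$, which is what it means for the rule to rank units by $U_\alpha$.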

Standard Model: list-conditional variance distributions. 

[Figure: two panels titled Quantiles of list-conditional variances. Vertical axis log(quantile), from −15 to 5; horizontal axis from 0.0 to 2.0. One panel shows methods p.mean and p.value, the other mle and true]

$E(\sigma^2) = 5$


Some observations 

• ranking by p-value is stable, but puts too 

many small-variance units on top list 

• ranking by MLE puts too many high-variance units on the top list

• ranking by posterior mean is a good 

tradeoff, and it maximizes list content 

• more accurate top-list EB methods are 

available


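"Aggregated" versions:

$\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; \theta_i\; 1(\theta_i \ge \theta_\alpha)$

$\dfrac{1}{n\alpha} \sum_i 1[X_i \ge t_\alpha(\sigma_i^2)]\; 1[X_i^* \ge t_\alpha(\sigma_i^2)]\; 1(\theta_i \ge \theta_\alpha)$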

Optimal thresholds for aggregated criteria 

Solutions exhibit variance screening

Figure 7: The threshold curve which maximizes the "aggregate content". This procedure discards all units whose variances exceed around 0.14. Here, the list size is 0.025, and the standard model parameters are $E(\sigma^2) = 5$ and $CV(\sigma^2) = 1$.

[Figure: curve for list size = 0.025, with CV(σ²) = 1 and E(σ²) = 5. t(σ²) (0.0 to 2.5) against σ² (0.00 to 0.15); curves: optimal aggregate, post mean]
