Ranking and selection in high-dimensional inference
Ranking and Selection in High<br />
Dimensional Inference<br />
Michael Newton, UW Madison<br />
SSC 2013, Edmonton<br />
Introductory Overview Lecture<br />
In collaboration with Nick Henderson<br />
Copyright M.A. Newton
Classical ranking/selection theory/methods<br />
• small number of units<br />
• methods did not catch on
Some motivating examples<br />
• cytogenetic abnormalities<br />
• gene expression changes<br />
• genome-wide association studies<br />
• chromosome contact studies<br />
• gene set analysis<br />
• other...
Cytogenetic abnormalities<br />
Newton et al. 1994<br />
842 M. NEWTON, S-Q. WU AND C. REZNIKOFF<br />
Figure 2. Chromogram: losses and potential losses from all 16 lines. Top half records information on p-arms, and the<br />
bottom half q-arms. Chromosomes are arranged horizontally. The total number of arm losses at each chromosome is the<br />
number of shaded boxes. The outer box records the potential number of losses. Horizontal hashing indicates tied loss<br />
copies. The cytogenetic data provide no information about LOH when only one or two arms are<br />
lost. LOH from arm loss has been documented in a number of tetraploid lines, based on allelic
Figure 4. Lines show the log Bayes factor of at least one suppressor gene as a function of the prior probability q. The<br />
Bayes factor, equal to the ratio of posterior odds to prior odds of at least one gene, is the probability of the data given at<br />
Cytogenetic abnormalities Newton et al. 1994<br />
[Plot: log Bayes factor (about -10 to 10) against prior probability (0.0 to 0.8)]
Genome-wide association studies (GWAS)<br />
odds ratios for SNP effects on disease phenotype<br />
standard error depends on SNP allele frequencies
Comparative Genomics: Breast Cancer Risk Loci<br />
• Rat QTL analysis identifies Mcs5a risk loci<br />
• Human ortholog reduces BC risk in women<br />
• Immune mechanism via E3-ligase, γδ T-cells<br />
Gould<br />
• PNAS, 2007; BC Research 2011; NAR 2012<br />
• R01 CA123272; UWCCC Pilot [Gould/Trentham Dietz, CC]<br />
U01 ES019466
Chromosome Conformation Capture (4C)<br />
DNA reads contacting MCS5a<br />
Smits et al.<br />
bins $\{i\}$<br />
bin lengths $\{\eta_i\}$<br />
read counts $\{X_i\}$<br />
$X_i \sim \mathrm{Poisson}(\theta_i \eta_i)$<br />
$H_{0,i}: \theta_i = \theta_0$<br />
p-value small
Gene set analysis<br />
genome-wide gene-level data $\{u_g\}$<br />
functional categories $\{i\}$<br />
$i$ is a set of genes with specific biological property<br />
set statistics $\{X_i\}$<br />
e.g. $X_i = \frac{1}{n_i} \sum_{g \in i} u_g$<br />
set sizes $n_i$<br />
Gene Ontology, KEGG, Reactome, ...
Gene set analysis: the imbalance of power Newton et al. 2007<br />
$\sqrt{n_i}\,(X_i - \mu)/\sigma$<br />
$n_i$<br />
$(X_i - \mu)/\sigma$<br />
$n_i$
Basic statistical problem<br />
• you have data $\{X_i\}$ on many separate populations, or 'inference units' $\{i\}$<br />
• there are unit-specific, real-valued parameters of interest $\{\theta_i\}$<br />
• there are unit-specific nuisance parameters $\{\eta_i\}$<br />
• you must identify the units having large values of the parameter of interest
Basic statistical problem<br />
• Special emphasis on “variance of variances”<br />
• Precision of estimates of unit-specific<br />
parameters may vary widely among units
Landmark results, high dimensional inference<br />
• Neyman and Scott (1948). Consistent estimates based<br />
on partially consistent observations. Econometrica 16:<br />
1-32<br />
$X_{1,i}, X_{2,i} \sim \mathrm{Normal}(\theta_i, \sigma^2)$<br />
$\hat\sigma^2_n \to \sigma^2/2$<br />
MLE breaks down in high dimensions
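The breakdown can be checked numerically. The following is a minimal simulation sketch, assuming two draws per unit, a true variance of 1, and unit means drawn from an arbitrary uniform range; the function name and all settings are illustrative choices, not part of the original slides.

```python
import random

def neyman_scott_variance_mle(n_units, sigma=1.0, seed=0):
    """MLE of sigma^2 when each unit i contributes two draws from Normal(theta_i, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_units):
        theta = rng.uniform(-10.0, 10.0)  # arbitrary unit-specific mean (illustrative choice)
        x1 = rng.gauss(theta, sigma)
        x2 = rng.gauss(theta, sigma)
        xbar = 0.5 * (x1 + x2)            # MLE of theta_i
        total += (x1 - xbar) ** 2 + (x2 - xbar) ** 2
    return total / (2 * n_units)          # MLE of sigma^2 pools all 2n observations

sigma2_hat = neyman_scott_variance_mle(200_000)
print(sigma2_hat)  # settles near sigma^2 / 2 = 0.5 rather than 1.0
```

Adding more units does not help here, because every new unit brings its own new mean parameter along with its two observations.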
Landmark results, high dimensional inference<br />
• Kiefer and Wolfowitz (1956). Consistency of the<br />
maximum likelihood estimates in the presence of<br />
infinitely many nuisance parameters. Annals of<br />
Mathematical Statistics, 27: 887-906<br />
$X_i \sim p(x \mid \theta_i, \eta)$<br />
$\theta_i \sim F$<br />
$p(x \mid \eta) = \int p(x \mid \theta, \eta)\, dF(\theta)$<br />
$\hat F_n \to F$<br />
$\hat\eta_n \to \eta$<br />
MLE ok if you treat high-d param as random variable
The Stein effect<br />
Experiment produces data to<br />
estimate a single parameter $\theta$<br />
The natural estimator is $\hat\theta = \bar X$<br />
Minimizes risk: $R_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2\right]$
The Stein effect<br />
Experiment produces data to<br />
estimate two parameters<br />
$\theta = (\theta_1, \theta_2)$<br />
The natural estimator is<br />
$\hat\theta = (\bar X_1, \bar X_2)$<br />
Minimizes risk:<br />
$R_{\hat\theta}(\theta) = E\left[(\hat\theta_1 - \theta_1)^2 + (\hat\theta_2 - \theta_2)^2\right]$
The Stein effect<br />
Experiment produces data to estimate<br />
n > 2 parameters<br />
$\theta = (\theta_1, \theta_2, \ldots, \theta_n)$<br />
The natural estimator is<br />
$\hat\theta = (\bar X_1, \bar X_2, \ldots, \bar X_n)$ ...<br />
Does not minimize risk!
The Stein effect<br />
There are other estimators with smaller risk...<br />
$R_{\hat\theta}(\theta) > R_{\tilde\theta}(\theta)$
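One well-known estimator with smaller risk is the James-Stein estimator. The sketch below is a hedged illustration: it assumes one unit-variance normal observation per parameter and shrinkage toward zero, and the true parameter vector and replicate count are arbitrary. The slides do not say which estimator they display.

```python
import random

def james_stein(x):
    """Plain James-Stein shrinkage toward 0 for X ~ Normal(theta, I), with len(x) > 2."""
    n = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (n - 2) / norm2
    return [shrink * v for v in x]

def simulated_risks(theta, n_rep=5000, seed=1):
    """Monte Carlo estimate of total squared-error risk for the MLE and James-Stein."""
    rng = random.Random(seed)
    loss_mle = loss_js = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(t, 1.0) for t in theta]   # the natural estimator is x itself
        js = james_stein(x)
        loss_mle += sum((a - t) ** 2 for a, t in zip(x, theta))
        loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return loss_mle / n_rep, loss_js / n_rep

theta = [0.5 * k for k in range(10)]             # hypothetical true parameters, n = 10
risk_mle, risk_js = simulated_risks(theta)
print(risk_mle, risk_js)  # the first is near n = 10; the second is smaller
```

The gain is a statement about total risk over all n coordinates; any single coordinate may be estimated worse.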
Bayes, Empirical Bayes, and their<br />
uneasy relationship!<br />
• Bayes estimates always win on frequency-theory<br />
risk (they’re admissible!)<br />
• EB: pretend you’re doing a Bayesian analysis<br />
using a common prior on all units; work<br />
out the Bayes estimate; plug in an estimate of<br />
the hyperparameters<br />
Efron & Morris; Brown; Carlin & Louis, et al.
Hypothesis testing
Null hypothesis: $H_0: \theta = \theta_0$<br />
p-value = tail area<br />
[Figure: axis marking the observed $x$ and the null value $\theta_0$]<br />
Berger & Sellke (1987). Testing a point null hypothesis: the<br />
irreconcilability of p values and evidence. JASA, 82, 112-122.<br />
Experiment produces data x to test<br />
the value of a single parameter
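Prior distribution for $\theta$<br />
[Figure: prior with mass 1/2 at $\theta_0$ and the remaining 1/2 spread over other values]<br />
Berger and Sellke: Testing a Point Null Hypothesis<br />
Table 1. $\Pr(H_0 \mid x)$ for Jeffreys-Type Prior<br />
Columns: p, t, then n = 1, 5, 10, 20, 50, 100, 1,000<br />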
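The kind of calculation behind such a table can be sketched as follows. This is an illustration under stated assumptions, not a transcription of the table: the data are n normal observations with known variance, the prior puts mass 1/2 on the null value, and the rest is a normal prior centred at the null value with the same variance as one observation. The function name is hypothetical.

```python
import math

def posterior_prob_null(t, n, prior_null=0.5):
    """Pr(H0 | x) for a point null, given z-statistic t from n normal observations.

    Assumes theta ~ Normal(theta0, sigma^2) under the alternative, so the
    marginal of the sample mean under H1 is Normal(theta0, sigma^2 * (1 + 1/n)).
    """
    # Ratio of marginal likelihood under H1 to likelihood under H0
    bayes_factor_alt = (1.0 + n) ** -0.5 * math.exp(t * t / (2.0 * (1.0 + 1.0 / n)))
    prior_odds_alt = (1.0 - prior_null) / prior_null
    return 1.0 / (1.0 + prior_odds_alt * bayes_factor_alt)

for n in (1, 5, 10, 20, 50, 100, 1000):
    # t = 1.96 corresponds to a two-sided p-value of about 0.05
    print(n, round(posterior_prob_null(1.96, n), 2))
```

With t fixed at 1.96 the posterior probability of the null stays well above 0.05 for every n shown, which is the irreconcilability the paper's title refers to.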
Testing in high dimensions
Benjamini and Hochberg (1995). Controlling the false<br />
discovery rate: a practical and powerful approach to<br />
multiple testing. JRSSB 57: 289-300.<br />
[Diagram, built up over five slides: the p-value axis from 0 to 1 is split at a critical value, and units fall into four groups: true discoveries, false discoveries, false non-discoveries, true non-discoveries]
[Figure: ordered p-values against rank/G, with alpha = 0.4; the cutoff p_(g*) is marked at rank g*/G]
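The step-up rule pictured here can be sketched in a few lines: sort the G p-values, find the largest rank g* whose ordered p-value lies on or below the line alpha * rank / G, and reject everything up to that rank. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    G = len(pvalues)
    order = sorted(range(G), key=lambda i: pvalues[i])
    g_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / G:
            g_star = rank              # keep the largest rank under the line
    return sorted(order[:g_star])

# Hypothetical p-values for ten inference units
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))  # [0, 1]
```

Because the rule looks for the largest qualifying rank, a p-value can be rejected even if it sits above the line itself, as long as a larger one falls below it.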
The mixture model perspective<br />
Storey, 2003. The positive false discovery rate: a<br />
Bayesian interpretation of the q-value. Annals of<br />
Statistics, 31: 2013-2035.<br />
Null $H_{0,i} \sim \mathrm{Bernoulli}(\pi_0)$<br />
$\mathrm{FDR}(c) = P(H_{0,i} \mid p_i \le c) = \dfrac{c\,\pi_0}{F(c)} \le \dfrac{c}{F(c)}$<br />
For FDR $= \alpha$ solve $F(x^*) = x^*/\alpha$<br />
Discoveries $L = \{i : p_i \le x^*\}$
[Figure: EDF (rank/G) against p-value, with a line of slope 1/alpha; the crossing point is at $(x^*, F(x^*))$]
[Figures, repeated over several slides: the BH view (ordered p-values against rank/G, alpha = 0.4, cutoff p_(g*) at g*/G) shown together with the mixture view (EDF against p-value, line of slope 1/alpha, crossing at F(x*))]<br />
Storey, JD and Tibshirani, R (2003). Statistical significance<br />
for genomewide studies. PNAS, 100, 9440-9445.
gene-level DE; n=31099<br />
$q(p) = P(H_0 \mid \mathrm{pval} \le p)$<br />
[Figures: density of p-values, annotated pi = 0.37; q-value against p-value, with FDR = 0.05 marked]<br />
Microarray example; 2 similar forms of head & neck cancer
Notes<br />
1. p-value and q-value give same ordering<br />
2. q is not necessarily greater than p
The mixture model perspective;<br />
why start with p-values?<br />
$f(x) = \pi f_0(x) + (1 - \pi) f_1(x)$<br />
1. x is all data on inference unit<br />
$f_0$ and $f_1$ are modeled<br />
Newton et al 2001; Kendziorski<br />
et al. 2003 (microarrays);<br />
et al.<br />
2. x is univariate test statistic<br />
Efron et al. (local FDR)
Mixture gives $e(x) = P(H_0 \mid x)$<br />
$L = \{\text{units } i : e(x_i) \le k\}$<br />
Duality<br />
$E(\#\text{errors on } L \mid \text{data}) = \sum_i e_i\, 1[e_i \le k]$<br />
$k$ controls conditional FDR
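A small sketch of this duality, assuming the posterior null probabilities e_i have already been computed by some mixture fit; the values and the function name below are hypothetical. Every unit on the list has e_i at most k, so the average of e_i over the list, which is the conditional FDR, is also at most k.

```python
def select_by_local_fdr(e, k):
    """Form L = {i : e_i <= k}; return it with its expected error count and conditional FDR."""
    selected = [i for i, ei in enumerate(e) if ei <= k]
    expected_errors = sum(e[i] for i in selected)   # E(#errors on L | data)
    cond_fdr = expected_errors / len(selected) if selected else 0.0
    return selected, expected_errors, cond_fdr

# Hypothetical posterior null probabilities for five units
e = [0.01, 0.03, 0.10, 0.40, 0.90]
selected, errors, fdr = select_by_local_fdr(e, k=0.10)
print(selected, errors, fdr)  # three units selected; the conditional FDR is below k
```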
Notes<br />
1. If q-value and locFDR are based on the<br />
same test statistic, main difference is<br />
sided-ness [masked by q, not locFDR]<br />
2. Substantial differences occur owing to<br />
form of the ranking variable (e.g., the<br />
underlying test statistic)
Everything in moderation<br />
Smyth (2004). Linear models and empirical Bayes methods for<br />
assessing differential expression in microarray experiments.<br />
SAGMB, 3, Article 3. [limma] [gene expression]<br />
Linear model per gene<br />
Let $v_{gj}$ be the jth diagonal element of $C^T V_g C$.<br />
$\hat\beta_{gj} \mid \beta_{gj}, \sigma_g^2 \sim N(\beta_{gj}, v_{gj}\sigma_g^2)$<br />
and<br />
$s_g^2 \mid \sigma_g^2 \sim \dfrac{\sigma_g^2}{d_g}\, \chi^2_{d_g}$<br />
where $d_g$ is the residual degrees of freedom for the linear model for gene $g$. Under these<br />
assumptions the ordinary t-statistic<br />
$t_{gj} = \dfrac{\hat\beta_{gj}}{s_g \sqrt{v_{gj}}}$<br />
follows an approximate t-distribution on $d_g$ degrees of freedom.
Everything in moderation<br />
Smyth (2004). Linear models and empirical Bayes methods for<br />
assessing differential expression in microarray experiments.<br />
SAGMB, 3, Article 3. [limma] [gene expression]<br />
3 Hierarchical Model<br />
Given the large number of gene-wise linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure whereby the same model is fitted to each gene. The key is to describe how the unknown coefficients $\beta_{gj}$ and unknown variances $\sigma_g^2$ vary across genes.<br />
Prior information is assumed on $\sigma_g^2$ equivalent to a prior estimator $s_0^2$ with $d_0$ degrees of freedom, i.e.,<br />
$\dfrac{1}{\sigma_g^2} \sim \dfrac{1}{d_0 s_0^2}\, \chi^2_{d_0}$<br />
Conjugate prior<br />
$p_j$ is the expected proportion of truly differentially expressed genes. For those coefficients which are nonzero, prior information is assumed equivalent to a prior observation equal to zero with unscaled variance $v_{0j}$, i.e.,<br />
$\beta_{gj} \mid \sigma_g^2, \beta_{gj} \ne 0 \sim N(0, v_{0j}\sigma_g^2)$<br />
This describes the expected distribution of log-fold changes for genes which are differentially expressed. In the case of replicated single sample data, the model and prior here are a reparametrization of that proposed by Lönnstedt and Speed (2002). The parametrizations are related through $d_g = f$, $v_g = 1/n$, $d_0 = 2\nu$, $s_0^2 = a/(d_0 v_g)$ and $v_0 = c$, where $n$, $\nu$ and $a$ are as in Lönnstedt and Speed (2002).<br />
Shrinkage estimate<br />
Under the above hierarchical model, the posterior mean of $\sigma_g^{-2}$ given $s_g^2$ is $\tilde s_g^{-2}$ with<br />
$\tilde s_g^2 = \dfrac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}$<br />
The posterior values shrink the observed variances towards the prior values with the degree of shrinkage depending on the relative sizes of the observed and prior degrees of freedom.<br />
Moderated test statistic<br />
$\tilde t_{gj} = \dfrac{\hat\beta_{gj}}{\tilde s_g \sqrt{v_{gj}}}$<br />
This statistic represents a hybrid classical/Bayes approach in which the posterior variance has been substituted into the classical t-statistic in place of the usual sample variance. The moderated t reduces to the ordinary t-statistic if $d_0 = 0$ and at the opposite end of the spectrum is proportional to the coefficient $\hat\beta_{gj}$ if $d_0 = \infty$.<br />
Modified null (predictive)<br />
The null joint distribution of $\tilde t$ and $s^2$ is<br />
$p(\tilde t, s^2 \mid \beta = 0) = \tilde s\, v^{1/2}\, p(\hat\beta, s^2 \mid \beta = 0)$<br />
which after collection of factors yields<br />
$p(\tilde t, s^2 \mid \beta = 0) = \dfrac{(d_0 s_0^2)^{d_0/2}\, d^{d/2}\, s^{2(d/2 - 1)}}{B(d/2, d_0/2)\,(d_0 s_0^2 + d s^2)^{d_0/2 + d/2}} \times \dfrac{(d_0 + d)^{-1/2}}{B(1/2, d_0/2 + d/2)} \left(1 + \dfrac{\tilde t^2}{d_0 + d}\right)^{-(1 + d_0 + d)/2}$<br />
This shows that $\tilde t$ and $s^2$ are independent with<br />
$s^2 \sim s_0^2 F_{d, d_0}$<br />
and<br />
$\tilde t \mid \beta = 0 \sim t_{d_0 + d}$
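The variance shrinkage and the moderated t-statistic can be sketched as follows. This is only an illustration of the two formulas: the prior degrees of freedom d0 and prior variance s0_sq are passed in as given numbers, whereas limma estimates them from all genes (not shown), and the function name and inputs are hypothetical.

```python
import math

def moderated_t(beta_hat, s2, v, d, d0, s0_sq):
    """Moderated t for one gene and coefficient.

    beta_hat: estimated coefficient; s2: residual sample variance on d degrees
    of freedom; v: unscaled variance v_gj; d0, s0_sq: prior degrees of freedom
    and prior variance (assumed known here).
    """
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)   # shrunken (posterior) variance
    t_tilde = beta_hat / math.sqrt(s2_tilde * v)
    return t_tilde, s2_tilde

# Hypothetical gene with an unusually small sample variance
t_mod, s2_post = moderated_t(beta_hat=2.0, s2=0.25, v=0.5, d=4, d0=4, s0_sq=1.0)
t_ord, _ = moderated_t(beta_hat=2.0, s2=0.25, v=0.5, d=4, d0=0, s0_sq=1.0)
print(s2_post, t_mod, t_ord)  # the variance is pulled toward s0_sq, so |t_mod| < |t_ord|
```

Setting d0 to 0 recovers the ordinary t-statistic, which is why the second call serves as the unmoderated comparison.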
Don’t summarize too soon<br />
Leek, JT and Storey JD (2007). Capturing heterogeneity in<br />
gene expression studies by surrogate variable analysis. PLoS<br />
Genetics 3(9): e161.<br />
effects of unmeasured<br />
and unmodeled factors
Beyond hypothesis testing<br />
• empirical Bayes procedures<br />
• procedures designed against global criteria
Empirical Bayes<br />
“local” posterior (unit i)<br />
$p(\theta_i \mid x_i, \eta_i) \propto f(\theta_i)\, p(x_i \mid \theta_i, \eta_i)$<br />
$p(x_i \mid \theta_i, \eta_i)$<br />
• simple form,<br />
• known nuisance params<br />
$f, F$<br />
• estimated using all units
Estimating the “mixing” or “prior” distribution<br />
Nonparametric MLE<br />
Lindsay (1995).<br />
$\hat F = \arg\max_F \prod_i \int p(x_i \mid \theta, \eta_i)\, dF(\theta)$<br />
Note:<br />
$\hat F(A) = \mathrm{mean}_i\, P(\theta_i \in A \mid x_i, \eta_i)$<br />
Or other options...
Gene set enrichment (influenza RNAi example)<br />
$n_i$ = size of set $i$<br />
$\theta_i \sim F$<br />
$x_i \sim \mathrm{Binomial}(n_i, \theta_i)$
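For this binomial model, one simple way to approximate a nonparametric MLE of F is an EM iteration over mixing weights on a fixed grid of candidate values. The sketch below rests on assumptions: the grid, the iteration count, and the use of only the four example sets shown on the next slide as toy data are illustrative choices, and the slides do not say which algorithm was used. Each update sets the weight of a grid point to the average posterior probability of that point across units, which mirrors the self-consistency property of the nonparametric MLE.

```python
import math

def binom_lik(x, n, theta):
    """Binomial(n, theta) probability of observing x."""
    return math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)

def npmle_grid(xs, ns, grid, n_iter=200):
    """EM for mixing weights on a fixed grid; a grid approximation to the NPMLE of F."""
    m = len(grid)
    w = [1.0 / m] * m
    lik = [[binom_lik(x, n, t) for t in grid] for x, n in zip(xs, ns)]
    for _ in range(n_iter):
        new_w = [0.0] * m
        for row in lik:
            denom = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(m):
                new_w[k] += w[k] * row[k] / denom   # posterior weight of grid point k
        w = [v / len(xs) for v in new_w]            # average posterior over units
    return w

def posterior_mean(x, n, grid, w):
    """Local posterior mean E(theta_i | x_i) under the fitted grid prior."""
    post = [wk * binom_lik(x, n, t) for wk, t in zip(w, grid)]
    return sum(p * t for p, t in zip(post, grid)) / sum(post)

grid = [0.005 * k for k in range(1, 21)]            # candidate theta values 0.005, ..., 0.1
xs, ns = [0, 1, 23, 9], [260, 25, 729, 149]         # toy data: four (x_i, n_i) pairs
w = npmle_grid(xs, ns, grid)
for x, n in zip(xs, ns):
    print(x, n, posterior_mean(x, n, grid, w))
```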
Gene set enrichment<br />
[Figure: estimated $\hat F$ and local posterior densities $p(\theta_i \mid x_i, \eta_i)$ for sets with $x_i/n_i$ = 0/260, 1/25, 23/729, 9/149; horizontal axis $\theta$ from 0.00 to 0.08, with 0.0092 marked]
Posterior means:<br />
$\hat\theta_i = E(\theta_i \mid \text{data})$<br />
1. “local” Bayes estimate under squared error loss<br />
2. trades off bias and variance<br />
3. widely used to prioritize units by non-testers!
Posterior expected ranks<br />
Laird and Louis (1989). Empirical Bayes ranking methods. JES,<br />
14, 29-46.<br />
$\mathrm{rank}(\theta_i) = \sum_j 1(\theta_i \le \theta_j)$<br />
$\widehat{\mathrm{rank}}(\theta_i) = \sum_j P(\theta_i \le \theta_j \mid \text{data})$<br />
from the top
Low posterior quantiles<br />
$P\left(\theta_i \le \hat\theta_i \mid x_i\right) = 0.05$<br />
[Figure: posterior densities against $\theta$ from 0.00 to 0.08]
Every listing of top-ranking sets has a<br />
conditional FDR statement:<br />
E<br />
P<br />
i 1[ˆ✓ i<br />
P<br />
k]1[✓ i
4 replicate flu studies agree much more at set level than gene level<br />
[Figure: agreement among studies (0.0 to 0.5) against number of top sets (0 to 250); curves labelled low 5%-ile, posterior mean, y/n, z, gene list]
Other approaches:<br />
Lin, R, Louis, TA, Paddock, SM, and Ridgeway, G (2006). Loss<br />
function based ranking in two-stage hierarchical models.<br />
Bayesian Analysis 1, 915-946.<br />
• Bayes estimates under various loss functions<br />
• squared error loss; classification loss; distance-based<br />
loss; weighted loss...
Options<br />
• MLE<br />
• p-value (q-value) [against no-effect H0]<br />
• Pr(H0|data)<br />
[parametric/NP/SP/locFDR]<br />
• moderated t (p-value)<br />
• SVA (p-value)<br />
• local posterior mean<br />
• local posterior quantile<br />
• other Bayesian: loss-based<br />
• other testing: p-values against other H0<br />
• other estimates: ...
$T = \sum_{i=1}^{n} [X_i > 25]$ 16 20.90 (0.65) 8.03 (0.48)<br />
$T = \sum_{i=1}^{n} [X_i > 40]$ 12 13.10 (0.66) 1.10 (0.22)<br />
$T = \sum_{i=1}^{n} [X_i > 60]$ 11 11.17 (0.61) 0.13 (0.06)<br />
The method matters<br />
Table 2: Flu Data: summary information for several chosen gene sets. Here, $m_i$ represents the number<br />
of genes within a given GO term, and $X_i$ represents the number of genes belonging to both the given<br />
GO term and the 614 genes of interest. The rankings according to several different methods are also<br />
provided. For example, the category with GO id GO:0022627 was ranked number one by both the<br />
posterior mean and posterior rank method<br />
ID | GO Term | $X_i$ | $m_i$ | p.mean | p.rank | p.val | mle<br />
GO:0022627 | cytosolic small ribosomal subunit | 18 | 36 | 1 | 1 | 3 | 251<br />
GO:0004672 | protein kinase activity | 76 | 594 | 119 | 65 | 2 | 1074<br />
GO:0030126 | COPI vesicle coat | 7 | 11 | 3 | 3 | 52 | 159<br />
GO:0030529 | ribonucleoprotein complex | 69 | 504 | 101 | 54 | 1 | 1040<br />
GO:0000059 | protein import into nucleus, docking | 2 | 2 | 62 | 271 | 178 | 1<br />
GO:0005852 | eukaryotic translation initiation factor ... | 8 | 14 | 4 | 2 | 40 | 164<br />
GO:0048200 | Golgi transport vesicle coating | 6 | 10 | 7 | 8 | 75 | 161<br />
GO:0000912 | assembly of actomyosin apparatus... | 1 | 1 | 325 | 770 | 411 | 2<br />
GO:0048205 | COPI coating of Golgi vesicle | 6 | 10 | 8 | 9 | 76 | 162<br />
GO:0033179 | proton-transporting V-type ATPase,... | 5 | 6 | 5 | 10 | 69 | 147
A framework for comparing methods<br />
and generating better ones
Decouple local parameters with<br />
variance-stabilizing transformation<br />
$X_i \sim \mathrm{Normal}(\theta_i, \sigma_i^2)$, $X_i$ is the local MLE<br />
$\theta_i \perp \sigma_i^2$ in the system<br />
$\theta_i \sim F$<br />
$\sigma_i^2 \sim G$<br />
all known, for now
Thresholding functions and ranking variables<br />
“Precision-guided” selection rules<br />
$\{t_\alpha(\sigma^2)\}$<br />
unit $i$ on top $\alpha$ list $L_\alpha$ if $X_i \ge t_\alpha(\sigma_i^2)$<br />
$P\left(X \ge t_\alpha(\sigma^2)\right) = \alpha$ marginal<br />
Ranking variables:<br />
$R_i$<br />
$R$ constant for all $X, \sigma^2$ s.t. $X = t_\alpha(\sigma^2)$<br />
monotone in $\alpha$
p-value rule<br />
$H_{0,i}: \theta_i = 0$<br />
$R_i = 1 - \mathrm{pval}_i = \Phi(X_i/\sigma_i)$<br />
$t_\alpha(\sigma^2) = c_\alpha \sigma$<br />
$\sigma^2 \sim \mathrm{Gamma}(a, b)$<br />
[Figure: P-value family of threshold functions; $X$ against $\sigma^2$ (0 to 5), with curves for list sizes 0.01, 0.05, 0.1]
Posterior mean rule<br />
$F = \mathrm{Normal}(0, 1)$<br />
$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$<br />
$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$<br />
$\sigma^2 \sim \mathrm{Gamma}(a, b)$<br />
[Figures: P-value family and posterior mean family of threshold functions; $X$ against $\sigma^2$ (0 to 5), with curves for list sizes 0.01, 0.05, 0.1]
Limma rule<br />
$R_i = \dfrac{X_i}{\sqrt{w\,\sigma_i^2 + (1 - w)}}$<br />
$\sigma^2 \sim \mathrm{Exp}(1)$<br />
[Figure: threshold curves, $x$ (0 to 6) against $\sigma^2$ (0 to 5)]
Threshold Curves for a list size of 0.05.<br />
[Figure: $t(\sigma^2)$ (0 to 4) against $\sigma^2$ (0.0 to 1.5) for methods mle, p-value, post mean, post rank]<br />
$\sigma^2 \sim \mathrm{Gamma}(a, b)$<br />
$E(\sigma^2) = CV(\sigma^2) = 1$
Performance Criteria<br />
Notes:<br />
we know $\theta_\alpha$ such that: $P(\theta_i \ge \theta_\alpha) = \alpha$<br />
each ranking procedure corresponds to rules: $X_i \ge t_\alpha(\sigma_i^2)$
Performance Criteria<br />
Accuracy:<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha)$
Performance Criteria<br />
Content:<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] \theta_i$
Performance Criteria<br />
Stability:<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1\left[X_i^* \ge t_\alpha(\sigma_i^2)\right]$
Performance Criteria: take limit on # units<br />
Accuracy:<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha) \to P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
integrates $\theta$, $\sigma$, and $X$
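Performance Criteria: limiting # units<br />
Accuracy:<br />
$P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
$P\left(\theta \ge \theta_\alpha \mid X \ge t_\alpha(\sigma^2)\right)$<br />
Content:<br />
$E\left(\theta \mid X \ge t_\alpha(\sigma^2)\right)$<br />
Stability:<br />
$P\left(X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right)$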
[Figure: mean content (0.0 to 2.0) against coefficient of variation (0.5 to 2.0), in four panels with $E(\sigma^2)$ = 15, 5, 1, 1/2; methods MLE, pval, post.mean]
Optimal thresholds<br />
Approach:<br />
• work in limiting case, normal model<br />
• assume smooth threshold functions<br />
• assume smooth density for variances<br />
• use calculus of variations<br />
• find thresholds maximizing criteria
Optimal thresholds<br />
Maximize Stability: $P\left(X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right)$<br />
High stability with steep thresholds;<br />
tries to select units with smallest<br />
variation regardless of data<br />
No solution
Optimal thresholds<br />
Maximize Content: $E\left(\theta \mid X \ge t_\alpha(\sigma^2)\right)$<br />
Solution:<br />
$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$<br />
$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$<br />
i.e. ranking by local Bayes estimate has global<br />
optimality property
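This optimality claim can be checked by simulation. The sketch below assumes the standard model used in these slides with a Normal(0, 1) prior on the parameters of interest and Exp(1) variances; the number of units, list size, seed, and function name are illustrative choices. It ranks the same simulated units three ways and reports the mean true parameter value on each top list.

```python
import math
import random

def mean_content(n_units=50000, alpha=0.05, seed=2):
    """Mean theta on the top-alpha list under three ranking variables."""
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        theta = rng.gauss(0.0, 1.0)                # theta_i ~ Normal(0, 1)
        sigma2 = rng.expovariate(1.0)              # sigma_i^2 ~ Exp(1)
        x = rng.gauss(theta, math.sqrt(sigma2))    # X_i ~ Normal(theta_i, sigma_i^2)
        units.append((theta, sigma2, x))
    rules = {
        "mle": lambda u: u[2],
        "p-value": lambda u: u[2] / math.sqrt(max(u[1], 1e-12)),  # X / sigma
        "post.mean": lambda u: u[2] / (u[1] + 1.0),               # X / (sigma^2 + 1)
    }
    top = int(alpha * n_units)
    content = {}
    for name, score in rules.items():
        ranked = sorted(units, key=score, reverse=True)[:top]
        content[name] = sum(u[0] for u in ranked) / top
    return content

content = mean_content()
print(content)  # post.mean should have the largest mean content
```

Changing the variance distribution (for example, to one with a larger coefficient of variation) is a simple way to explore how far apart the three rules can drift.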
Optimal thresholds<br />
Maximize Accuracy:<br />
$P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
Solution:<br />
$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha \sqrt{\sigma^2(1 + \sigma^2)}$
Thresholds for most accurate and highest content<br />
[Figures: two panels of threshold curves, $x$ (0 to 6) against $\sigma^2$ (0 to 5)]<br />
$\sigma^2 \sim \mathrm{Exp}(1)$
Optimal thresholds<br />
Maximize Accuracy:<br />
$P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
Solution:<br />
$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha \sqrt{\sigma^2(1 + \sigma^2)}$<br />
Let<br />
$U_\alpha(X, \sigma^2) = P\left(\theta \ge \theta_\alpha \mid X, \sigma^2\right)$<br />
$R_i = \min\left[\alpha : P\left(U_\alpha(X, \sigma^2) \ge U_\alpha(X_i, \sigma_i^2) \mid X_i, \sigma_i^2\right) \le \alpha\right]$<br />
$R_i = \alpha$ if $i$ has position $\alpha$ within $P(\theta_j \ge \theta_\alpha \mid X_j, \sigma_j^2)$<br />
smaller better
Standard Model: list-conditional variance distributions.<br />
[Figures: log quantiles of list-conditional variances (-15 to 5) against coefficient of variation (0.0 to 2.0); one panel for methods p.mean and p.value, one for mle and true]<br />
$E(\sigma^2) = 5$
Some observations<br />
• ranking by p-value is stable, but puts too<br />
many small-variance units on top list<br />
• ranking by MLE puts too many high-variance<br />
units on the top list<br />
• ranking by posterior mean is a good<br />
tradeoff, and it maximizes list content<br />
• more accurate top-list EB methods are<br />
available
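Performance Criteria<br />
“Aggregated” versions:<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] \theta_i\, 1(\theta_i \ge \theta_\alpha)$<br />
$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1\left[X_i^* \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha)$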
Optimal thresholds for aggregated criteria<br />
Solutions exhibit variance screening<br />
Figure 7: The threshold curve which maximizes the “aggregate content”. This procedure discards all<br />
units whose variances exceed around 0.14. Here, the list size is 0.025, and the standard model parameters<br />
are $E(\sigma^2) = 5$ and $CV(\sigma^2) = 1$<br />
[Figure: $t(\sigma^2)$ (0.0 to 2.5) against $\sigma^2$ (0.00 to 0.15), for list size 0.025; curves labelled optimal aggregate and post mean]