Ranking and selection in high-dimensional inference


Ranking and Selection in High Dimensional Inference<br />

Michael Newton, UW Madison<br />

SSC 2013, Edmonton<br />

Introductory Overview Lecture<br />

In collaboration with Nick Henderson<br />

Copyright M.A. Newton


Classical ranking/selection theory/methods<br />



• small number of units<br />

• methods did not catch on


Some motivating examples<br />

• cytogenetic abnormalities<br />

• gene expression changes<br />

• genome-wide association studies<br />

• chromosome contact studies<br />

• gene set analysis<br />

• other...


Cytogenetic abnormalities<br />

Newton et al. 1994<br />

Figure 2. Chromogram: losses and potential losses from all 16 lines. Top half records information on p-arms, and the bottom half q-arms. Chromosomes are arranged horizontally. The total number of arm losses at each chromosome is the number of shaded boxes. The outer box records the potential number of losses. Horizontal hashing indicates tied loss<br />
copies. The cytogenetic data provide no information about LOH when only one or two arms are lost. LOH from arm loss has been documented in a number of tetraploid lines, based on allelic


Cytogenetic abnormalities<br />
Newton et al. 1994<br />
Figure 4. Lines show the log Bayes factor of at least one suppressor gene as a function of the prior probability q. The Bayes factor, equal to the ratio of posterior odds to prior odds of at least one gene, is the probability of the data given at<br />
[Figure 4: log Bayes factor versus prior probability (horizontal axis 0.0 to 0.8)]


Genome-wide association studies (GWAS)<br />

odds ratios for SNP effects on disease phenotype<br />

standard error depends on SNP allele frequencies


Comparative Genomics: Breast Cancer Risk Loci<br />

• Rat QTL analysis identifies Mcs5a risk loci<br />
• Human ortholog reduces BC risk in women<br />
• Immune mechanism via E3-ligase, γδ T-cells<br />
Gould<br />
• PNAS, 2007; BC Research 2011; NAR 2012<br />
• R01 CA123272; UWCCC Pilot [Gould/Trentham Dietz, CC]; U01 ES019466


Chromosome Conformation Capture (4C)


Chromosome Conformation Capture (4C)<br />

DNA reads contacting MCS5a<br />

Smits et al.




Chromosome Conformation Capture (4C)<br />

Smits et al.<br />

bins $\{i\}$<br />
bin lengths $\{\eta_i\}$<br />
read counts $\{X_i\}$<br />
$X_i \sim \mathrm{Poisson}(\theta_i \eta_i)$<br />
$H_{0,i}: \theta_i = \theta_0$<br />
p-value small
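
The per-bin test above can be sketched in a few lines. This is a minimal illustration, assuming an upper-tail test (more reads than the background rate predicts) and a known background rate; the values of `theta0`, `bin_lengths`, and `read_counts` are made up for the example and do not come from the 4C data.

```python
import math

def poisson_upper_tail(x, mu):
    """P(X >= x) for X ~ Poisson(mu), computed as 1 minus the lower tail."""
    term = math.exp(-mu)          # P(X = 0)
    lower = 0.0
    for k in range(x):
        lower += term             # add P(X = k)
        term *= mu / (k + 1)      # step to P(X = k + 1)
    return max(0.0, 1.0 - lower)

theta0 = 2.0                      # hypothetical background rate per unit length
bin_lengths = [1.0, 2.0, 0.5]     # eta_i (hypothetical)
read_counts = [2, 15, 1]          # X_i (hypothetical)

# Under H_{0,i}, X_i ~ Poisson(theta0 * eta_i); small p-values flag enriched bins.
pvals = [poisson_upper_tail(x, theta0 * eta)
         for x, eta in zip(read_counts, bin_lengths)]
```

Only the second bin gets a small p-value here. Note that the same count is judged differently depending on the bin length, which is the nuisance parameter in this example.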


Gene set analysis<br />

genome-wide gene-level data $\{u_g\}$<br />
functional categories $\{i\}$<br />
$i$ is a set of genes with specific biological property<br />
set statistics $\{X_i\}$, e.g. $X_i = \frac{1}{n_i} \sum_{g \in i} u_g$<br />
set sizes $n_i$<br />
Gene Ontology, KEGG, Reactome, ...
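
As a small illustration of the set statistic, the sketch below averages gene-level scores within each set. The gene names, scores, and set memberships are invented for the example.

```python
def set_means(u, gene_sets):
    """Return X_i = (1/n_i) * sum of u_g over genes g in set i."""
    return {name: sum(u[g] for g in genes) / len(genes)
            for name, genes in gene_sets.items()}

# Hypothetical gene-level data u_g and two overlapping sets.
u = {"g1": 2.0, "g2": 0.5, "g3": -1.0, "g4": 1.5}
gene_sets = {"setA": ["g1", "g2"], "setB": ["g2", "g3", "g4"]}

X = set_means(u, gene_sets)
```

Because the sets differ in size, the averages differ in precision, which is what drives the imbalance of power across sets.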


Gene set analysis: the imbalance of power<br />
Newton et al. 2007<br />
[Figure: $\sqrt{n_i}\,(X_i - \mu)/\sigma$ versus $n_i$, and $(X_i - \mu)/\sigma$ versus $n_i$]


Basic statistical problem<br />

• you have data $\{X_i\}$ on many separate populations, or 'inference units' $\{i\}$<br />
• there are unit-specific, real-valued parameters of interest $\{\theta_i\}$<br />
• there are unit-specific nuisance parameters $\{\eta_i\}$<br />
• you must identify the units having large values of the parameter of interest


Basic statistical problem<br />

• Special emphasis on “variance of variances”<br />

• Precision of estimates of unit-specific<br />

parameters may vary widely among units


Landmark results, high dimensional inference<br />
• Neyman and Scott (1948). Consistent estimates based on partially consistent observations. Econometrica 16: 1-32<br />
$X_{1,i}, X_{2,i} \sim \mathrm{Normal}(\theta_i, \sigma^2)$<br />
$\hat\sigma^2_n \to \sigma^2/2$<br />
MLE breaks down in high dimensions
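
The breakdown is easy to see in simulation. The sketch below draws two observations per unit and computes the joint MLE of the common variance; the range of the unit means, the seed, and the number of units are arbitrary choices for illustration.

```python
import random

def neyman_scott_mle(n_units, sigma2=1.0, seed=0):
    """MLE of the common variance from pairs X_{1,i}, X_{2,i} ~ Normal(theta_i, sigma2)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(n_units):
        theta_i = rng.uniform(-5.0, 5.0)   # arbitrary unit-specific mean
        x1 = rng.gauss(theta_i, sd)
        x2 = rng.gauss(theta_i, sd)
        m = 0.5 * (x1 + x2)                # MLE of theta_i
        total += (x1 - m) ** 2 + (x2 - m) ** 2
    return total / (2 * n_units)           # MLE divides by the 2n observations

sigma2_hat = neyman_scott_mle(50000)
```

With a true variance of 1 the estimate settles near 0.5 rather than 1: each unit spends one of its two observations on estimating its own mean, and adding more units does not help.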


Landmark results, high dimensional inference<br />
• Kiefer and Wolfowitz (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27: 887-906<br />
$X_i \sim p(x \mid \theta_i, \eta)$<br />
$\theta_i \sim F$<br />
$p(x \mid \eta) = \int p(x \mid \theta, \eta)\, dF(\theta)$<br />
$\hat F_n \to F$<br />
$\hat\eta_n \to \eta$<br />
MLE ok if you treat high-d param as random variable


The Stein effect<br />
Experiment produces data to estimate a single parameter $\theta$<br />
The natural estimator is $\hat\theta = \bar X$<br />
Minimizes risk: $R_{\hat\theta}(\theta) = E\big[(\hat\theta - \theta)^2\big]$


The Stein effect<br />
Experiment produces data to estimate two parameters $\theta = (\theta_1, \theta_2)$<br />
The natural estimator is $\hat\theta = (\bar X_1, \bar X_2)$<br />
Minimizes risk: $R_{\hat\theta}(\theta) = E\big[(\hat\theta_1 - \theta_1)^2 + (\hat\theta_2 - \theta_2)^2\big]$


The Stein effect<br />
Experiment produces data to estimate $n > 2$ parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$<br />
The natural estimator is $\hat\theta = (\bar X_1, \bar X_2, \ldots, \bar X_n)$ ...<br />
Does not minimize risk!


The Stein effect<br />
There are other estimators with smaller risk...<br />
$R_{\tilde\theta}(\theta) < R_{\hat\theta}(\theta)$
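
One such estimator is the James-Stein estimator, which the slides do not name; it is used here only as a familiar example. The sketch assumes one unit-variance normal observation per parameter and shrinks toward zero; the parameter vector, seed, and replicate count are arbitrary.

```python
import random

def james_stein(x):
    """Shrink x toward 0 by the factor 1 - (n - 2)/||x||^2 (unit-variance normal data)."""
    n = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (n - 2) / norm2
    return [shrink * v for v in x]

def simulated_risks(theta, reps=20000, seed=1):
    """Monte Carlo estimates of total squared-error risk for the MLE and James-Stein."""
    rng = random.Random(seed)
    mle_loss = js_loss = 0.0
    for _ in range(reps):
        x = [rng.gauss(t, 1.0) for t in theta]     # the MLE is x itself
        js = james_stein(x)
        mle_loss += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        js_loss += sum((ji - t) ** 2 for ji, t in zip(js, theta))
    return mle_loss / reps, js_loss / reps

theta = [0.5 * k for k in range(10)]               # hypothetical true parameters, n = 10
risk_mle, risk_js = simulated_risks(theta)
```

The simulated risk of the natural estimator is close to n = 10, and the shrinkage estimator comes in below it even though the true parameters are not close to zero.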


Bayes, Empirical Bayes, and their uneasy relationship!<br />
• Bayes estimates always win on frequency-theory risk (they’re admissible!)<br />
• EB: pretend you’re doing a Bayesian analysis using a common prior on all units; work out the Bayes estimate; plug in an estimate of the hyperparameters<br />
Efron & Morris; Brown; Carlin & Louis, et al.


Hypothesis testing


Experiment produces data $x$ to test the value of a single parameter $\theta$<br />
Null hypothesis: $H_0: \theta = 0$<br />
p-value = tail area<br />
[Figure: density with axis marks 0 and $x$]<br />
Berger & Sellke (1987). Testing a point null hypothesis: the irreconcilability of p values and evidence. JASA, 82, 112-122.


Prior distribution for $\theta$<br />
[Figure: prior with labels 1/2, 1/2, and 0]<br />
Berger and Sellke: Testing a Point Null Hypothesis<br />
Table 1 … for Jeffreys-Type Prior<br />
Columns: p, t, 1, 5, 10, 20, 50, 100, 1,000
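
The gap between a p-value and the posterior probability of a point null can be checked numerically. The sketch below is one standard setup, stated here as an assumption rather than taken from the slide: n normal observations with known variance, prior mass 1/2 on the null value, and under the alternative a normal prior centered at the null value with the same variance as a single observation. The function name is illustrative.

```python
import math

def posterior_null_prob(t, n, prior_null=0.5):
    """P(H0 | data) for test statistic t = sqrt(n) * (xbar - theta0) / sigma.

    Assumes the alternative prior theta ~ Normal(theta0, sigma^2).
    """
    # Ratio of marginal likelihoods, alternative over null.
    bf_alt = (1.0 + n) ** -0.5 * math.exp(t * t / (2.0 * (1.0 + 1.0 / n)))
    prior_odds_alt = (1.0 - prior_null) / prior_null
    return 1.0 / (1.0 + prior_odds_alt * bf_alt)

# A two-sided p-value of 0.05 corresponds to t = 1.96.
p_small_n = posterior_null_prob(1.96, n=1)
p_large_n = posterior_null_prob(1.96, n=100)
```

Under these assumptions a result with p = 0.05 leaves the null with posterior probability of roughly 0.35 at n = 1, and more than one half at n = 100.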


Testing in high dimensions


Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB 57: 289-300.<br />
[Figure: p-value axis from 0 to 1 with a critical value; regions labeled true discoveries, false discoveries, false non-discoveries, and true non-discoveries]


[Figure: sorted p-values (0.00 to 0.08) versus rank/G (0.00 to 0.15), with alpha = 0.4; the cutoff p_(g*) is marked at rank g*/G]
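
The Benjamini-Hochberg step-up rule behind this picture can be sketched as follows: sort the G p-values, find the largest rank g* with p_(g*) ≤ alpha · g*/G, and reject everything up to that rank. The p-values in the example are invented.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the (sorted) indices rejected by the BH step-up rule at level alpha."""
    G = len(pvalues)
    order = sorted(range(G), key=lambda i: pvalues[i])
    g_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / G:
            g_star = rank            # keep the largest rank that passes
    return sorted(order[:g_star])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(pvals, alpha=0.05)
```

Here only the two smallest p-values fall under the line alpha · rank/G, so the list of discoveries has two entries.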


The mixture model perspective<br />
Storey, 2003. The positive false discovery rate: a Bayesian interpretation of the q-value. Annals of Statistics, 31: 2013-2035.<br />
Null $H_{0,i} \sim \mathrm{Bernoulli}(\pi_0)$<br />
$\mathrm{FDR}(c) = P(H_{0,i} \mid p_i \le c) = \dfrac{c\,\pi_0}{F(c)} \le \dfrac{c}{F(c)}$<br />
For FDR $= \alpha$ solve $F(x^*) = x^*/\alpha$<br />
Discoveries $L = \{i : p_i \le x^*\}$
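
A plug-in version of this rule replaces F by the empirical distribution function of the p-values and takes the largest observed p-value at which the EDF is still on or above the line x/alpha. This is a sketch under that plug-in assumption, with invented p-values.

```python
def edf_cutoff(pvalues, alpha):
    """Largest observed p-value x with EDF(x) >= x / alpha (0.0 if none qualifies)."""
    G = len(pvalues)
    x_star = 0.0
    for rank, p in enumerate(sorted(pvalues), start=1):
        # rank / G is the EDF at the rank-th smallest p-value (last of any ties).
        if rank / G >= p / alpha:
            x_star = p
    return x_star

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05
x_star = edf_cutoff(pvals, alpha)
discoveries = [i for i, p in enumerate(pvals) if p <= x_star]
fdr_bound = x_star / (len(discoveries) / len(pvals))   # c / F(c) evaluated at x*
```

The condition rank/G ≥ p/alpha is the step-up comparison p ≤ alpha · rank/G read along the other axis, so the list is the same one a rank-based step-up rule would give, and the bound c/F(c) at the cutoff stays below alpha.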


[Figure: two views of the same cutoff. BH panel: sorted p-values versus rank/G with alpha = 0.4, cutoff p_(g*) at g*/G. Mixture panel: EDF of the p-values (rank/G versus p-value) with a line of slope 1/alpha crossing the EDF at F(x*)]<br />
Storey, JD and Tibshirani, R (2003). Statistical significance for genomewide studies. PNAS, 100, 9440-9445.


gene-level DE; n=31099<br />
$q(p) = P(H_0 \mid \mathrm{pval} \le p)$<br />
[Figure, one panel: density of p-values (0.0 to 1.0), with pi = 0.37. Other panel: q-value (0.0 to 0.6) versus p-value (0.0 to 1.0), with FDR = 0.05 marked]<br />
Microarray example; 2 similar forms of head & neck cancer


Notes<br />

1. p-value and q-value give same ordering<br />
2. q is not necessarily greater than p
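
Both notes can be seen in a small plug-in calculation of q-values. The sketch treats the null proportion `pi0` as a supplied input (estimating it is a separate step not shown here) and uses invented p-values.

```python
def q_values(pvalues, pi0):
    """Plug-in q-values: q_(i) = min over ranks j >= i of pi0 * G * p_(j) / j."""
    G = len(pvalues)
    order = sorted(range(G), key=lambda i: pvalues[i])
    q = [0.0] * G
    running_min = 1.0
    for rank in range(G, 0, -1):                 # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * G * pvalues[i] / rank)
        q[i] = running_min
    return q

pvals = [0.01, 0.02, 0.03, 0.04, 0.5]            # hypothetical, already sorted
q = q_values(pvals, pi0=0.4)
```

The q-values never reverse the p-value ordering, although ties can appear. With many small p-values and a small null proportion, the q-value attached to p = 0.04 is 0.02, below the p-value itself.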


The mixture model perspective; why start with p-values?<br />
$f(x) = \pi f_0(x) + (1 - \pi) f_1(x)$<br />
1. $x$ is all data on inference unit; $f_0$ and $f_1$ are modeled<br />
Newton et al 2001; Kendziorski et al. 2003 (microarrays); et al.<br />
2. $x$ is univariate test statistic<br />
Efron et al. (local FDR)


Mixture gives $e(x) = P(H_0 \mid x)$<br />
$L = \{\text{units } i : e(x_i) \le k\}$<br />
Duality<br />
$E(\#\text{errors on } L \mid \text{data}) = \sum_i e_i\, 1[e_i \le k]$<br />
$k$ controls conditional FDR
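
The duality is a one-line computation once the posterior null probabilities are in hand. In the sketch below the values in `e` are invented; in practice they would come from a fitted mixture model.

```python
def local_fdr_list(e, k):
    """Select units with e_i <= k; report expected errors and conditional FDR on the list."""
    selected = [i for i, ei in enumerate(e) if ei <= k]
    expected_errors = sum(e[i] for i in selected)
    cond_fdr = expected_errors / len(selected) if selected else 0.0
    return selected, expected_errors, cond_fdr

e = [0.01, 0.20, 0.03, 0.60, 0.08]     # hypothetical values of P(H0 | x_i)
selected, expected_errors, cond_fdr = local_fdr_list(e, k=0.10)
```

Every selected unit has e_i ≤ k, so the average of the e_i over the list, which is the conditional FDR, cannot exceed k.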


Notes<br />

1. If q-value and locFDR are based on the same test statistic, main difference is sided-ness [masked by q, not locFDR]<br />
2. Substantial differences occur owing to form of the ranking variable (e.g., the underlying test statistic)


Everything in moderation<br />
Smyth (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. SAGMB, 3, Article 3. [limma] [gene expression]<br />
Linear model per gene<br />
Let $v_{gj}$ be the $j$th diagonal element of $C^T V_g C$. The distributional assumptions made in this paper about the data can be summarized by<br />
$\hat\beta_{gj} \mid \beta_{gj}, \sigma_g^2 \sim N(\beta_{gj}, v_{gj}\sigma_g^2)$<br />
and<br />
$s_g^2 \mid \sigma_g^2 \sim \dfrac{\sigma_g^2}{d_g}\chi^2_{d_g}$<br />
where $d_g$ is the residual degrees of freedom for the linear model for gene $g$. Under these assumptions the ordinary t-statistic<br />
$t_{gj} = \dfrac{\hat\beta_{gj}}{s_g\sqrt{v_{gj}}}$<br />
follows an approximate t-distribution on $d_g$ degrees of freedom.


Everything in moderation<br />
Smyth (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. SAGMB, 3, Article 3. [limma] [gene expression]<br />
3 Hierarchical Model<br />
Given the large number of gene-wise linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure whereby the same model is fitted to each gene. This section defines a simple hierarchical model which in effect describes this parallel structure. The key is to describe how the unknown coefficients $\beta_{gj}$ and unknown variances $\sigma_g^2$ vary across genes. This is done by assuming prior distributions for these sets of parameters.<br />
Conjugate prior<br />
Prior information is assumed on $\sigma_g^2$ equivalent to a prior estimator $s_0^2$ with $d_0$ degrees of freedom, i.e.,<br />
$\dfrac{1}{\sigma_g^2} \sim \dfrac{1}{d_0 s_0^2}\chi^2_{d_0}$.<br />
Then $p_j$ is the expected proportion of truly differentially expressed genes. For those which are nonzero, prior information on the coefficient is assumed equivalent to a prior observation equal to zero with unscaled variance $v_{0j}$, i.e.,<br />
$\beta_{gj} \mid \sigma_g^2, \beta_{gj} \ne 0 \sim N(0, v_{0j}\sigma_g^2)$.<br />
This describes the expected distribution of log-fold changes for genes which are differentially expressed. Apart from the mixing proportion $p_j$, the above equations describe a standard conjugate prior for the normal distributional model assumed in the previous section. In the case of replicated single sample data, the model and prior here is a reparametrization of that proposed by Lönnstedt and Speed (2002). The parametrizations are related through $d_g = f$, $v_g = 1/n$, $d_0 = 2\nu$, $s_0^2 = a/(d_0 v_g)$ and $v_0 = c$ where … $n$, $\nu$ and $a$ are as in Lönnstedt and Speed (2002). See also Lönnstedt (2001). For the calculations in this paper the above prior details are sufficient and it is not necessary to fully specify a multivariate prior for the $\beta_g$.<br />
Shrinkage estimate<br />
Under the above hierarchical model, the posterior mean of $\sigma_g^{-2}$ given $s_g^2$ is $\tilde s_g^{-2}$ with<br />
$\tilde s_g^2 = \dfrac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}$.<br />
The posterior values shrink the observed variances towards the prior values with the degree of shrinkage depending on the relative sizes of the observed and prior degrees of freedom.<br />
Moderated test statistic<br />
Define the moderated t-statistic by<br />
$\tilde t_{gj} = \dfrac{\hat\beta_{gj}}{\tilde s_g\sqrt{v_{gj}}}$.<br />
This statistic represents a hybrid classical/Bayes approach in which the posterior variance has been substituted into to the classical t-statistic in place of the usual sample variance. The moderated t reduces to the ordinary t-statistic if $d_0 = 0$ and at the opposite end of the spectrum is proportional to the coefficient $\hat\beta_{gj}$ if $d_0 = \infty$.<br />
Modified null (predictive)<br />
The null joint distribution of $\tilde t$ and $s^2$ is<br />
$p(\tilde t, s^2 \mid \beta = 0) = \tilde s\, v^{1/2}\, p(\hat\beta, s^2 \mid \beta = 0)$<br />
which after collection of factors yields<br />
$p(\tilde t, s^2 \mid \beta = 0) = \dfrac{(d_0 s_0^2)^{d_0/2}\, d^{d/2}\, s^{2(d/2 - 1)}}{B(d/2, d_0/2)\,(d_0 s_0^2 + d s^2)^{d_0/2 + d/2}} \times \dfrac{(d_0 + d)^{-1/2}}{B(1/2, d_0/2 + d/2)} \left(1 + \dfrac{\tilde t^2}{d_0 + d}\right)^{-(1 + d_0 + d)/2}$.<br />
This shows that $\tilde t$ and $s^2$ are independent with<br />
$s^2 \sim s_0^2 F_{d, d_0}$<br />
and<br />
$\tilde t \mid \beta = 0 \sim t_{d_0 + d}$.<br />
The above derivation goes through similarly with $\beta \ne 0$, the only difference being …
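
The shrinkage step and the moderated statistic amount to a few arithmetic operations per gene. The sketch below assumes the hyperparameters `d0` and `s0_sq` are already known (estimating them from all genes is the empirical Bayes step and is not shown); the function name and all numbers are illustrative and this is not the limma implementation.

```python
import math

def moderated_t(beta_hat, s2, v, d, d0, s0_sq):
    """Return (shrunken variance, ordinary t, moderated t) for one gene and coefficient."""
    # Posterior variance: weighted average of prior and sample variances.
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)
    t_ordinary = beta_hat / math.sqrt(s2 * v)
    t_moderated = beta_hat / math.sqrt(s2_tilde * v)
    return s2_tilde, t_ordinary, t_moderated

# A gene whose sample variance is small by chance (hypothetical numbers).
s2_tilde, t_ord, t_mod = moderated_t(beta_hat=1.2, s2=0.04, v=0.5,
                                     d=4, d0=4, s0_sq=0.36)
```

The gene's unusually small sample variance is pulled up toward the prior value, so the moderated statistic is far less extreme than the ordinary one. Setting `d0 = 0` returns the ordinary t-statistic unchanged.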


Don’t summarize too soon<br />

Leek, JT and Storey JD (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics 3(9): e161.<br />
effects of unmeasured and unmodeled factors


Beyond hypothesis testing<br />
• empirical Bayes procedures<br />
• procedures designed against global criteria


Empirical Bayes<br />

“local” posterior (unit $i$)<br />
$p(\theta_i \mid x_i, \eta_i) \propto f(\theta_i)\, p(x_i \mid \theta_i, \eta_i)$<br />
$p(x_i \mid \theta_i, \eta_i)$:<br />
• simple form,<br />
• known nuisance params<br />
$f, F$:<br />
• estimated using all units


Estimat<strong>in</strong>g the “mix<strong>in</strong>g” or “prior” distribution<br />

Nonparametric MLE<br />
Lindsay (1995).<br />
$\hat F = \arg\max_F \prod_i \int p(x_i \mid \theta, \eta_i)\, dF(\theta)$<br />
Note: $\hat F(A) = \mathrm{mean}_i\, P(\theta_i \in A \mid x_i, \eta_i)$<br />
Or other options...


Gene set enrichment (influenza RNAi example)<br />
$n_i$ = size of set $i$<br />
$\theta_i \sim F$<br />
$x_i \sim \mathrm{Binomial}(n_i, \theta_i)$
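
For this binomial model, one simple way to approximate a nonparametric MLE of the mixing distribution is to restrict F to a fixed grid and iterate the self-consistency relation: the mass at each grid point is replaced by the average posterior probability of that point across units. The grid, the iteration count, and the four (x, n) pairs below are illustrative choices, and this grid-based EM is only one of several possible algorithms.

```python
import math

def binom_pmf(x, n, theta):
    return math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)

def npmle_grid(xs, ns, grid, iters=200):
    """EM for the mixing weights on a fixed grid of theta values."""
    K = len(grid)
    w = [1.0 / K] * K
    lik = [[binom_pmf(x, n, t) for t in grid] for x, n in zip(xs, ns)]
    for _ in range(iters):
        new_w = [0.0] * K
        for row in lik:
            denom = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(K):
                new_w[k] += w[k] * row[k] / denom   # posterior mass of grid point k
        w = [v / len(lik) for v in new_w]           # average over units
    return w

def posterior_mean(x, n, grid, w):
    """Local posterior mean E(theta_i | x_i) under the estimated mixing weights."""
    post = [wk * binom_pmf(x, n, t) for wk, t in zip(w, grid)]
    return sum(t * p for t, p in zip(grid, post)) / sum(post)

xs, ns = [0, 1, 23, 9], [260, 25, 729, 149]         # illustrative sets (x_i, n_i)
grid = [0.002 * k for k in range(1, 51)]            # theta from 0.002 to 0.100
w = npmle_grid(xs, ns, grid)
means = [posterior_mean(x, n, grid, w) for x, n in zip(xs, ns)]
```

Each unit's posterior uses its own likelihood but a prior estimated from all units, so small sets such as 1/25 borrow the most from the rest.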


Gene set enrichment<br />

[Figure: estimated mixing distribution $\hat F$ and local posterior densities $p(\theta_i \mid x_i, \eta_i)$ for sets with $x_i/n_i$ = 0/260, 1/25, 23/729, and 9/149; horizontal axis $\theta$ from 0.00 to 0.08, with 0.0092 marked]


Posterior means:<br />

$\hat\theta_i = E(\theta_i \mid \text{data})$<br />
1. “local” Bayes estimate under squared error loss<br />
2. trades off bias and variance<br />

3. widely used to prioritize units by non-testers!


Posterior expected ranks<br />

Laird and Louis (1989). Empirical Bayes ranking methods. JES, 14, 29-46.<br />
$\mathrm{rank}(\theta_i) = \sum_j 1(\theta_i \le \theta_j)$<br />
$\widehat{\mathrm{rank}}(\theta_i) = \sum_j P(\theta_i \le \theta_j \mid \text{data})$<br />
from the top
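
Posterior expected ranks are easy to approximate from posterior draws: rank the units within each draw and average. The sketch below assumes independent Beta posteriors from a uniform prior as a stand-in for whatever posterior the analysis produces; the counts, seed, and number of draws are illustrative.

```python
import random

def posterior_expected_ranks(posterior_draws):
    """posterior_draws[s][i] is draw s of theta_i.

    Returns Monte Carlo estimates of sum_j P(theta_i <= theta_j | data),
    so rank 1 is the top of the list.
    """
    n_draws = len(posterior_draws)
    n_units = len(posterior_draws[0])
    ranks = [0.0] * n_units
    for draw in posterior_draws:
        for i in range(n_units):
            ranks[i] += sum(1 for j in range(n_units) if draw[i] <= draw[j])
    return [r / n_draws for r in ranks]

rng = random.Random(0)
counts = [(0, 260), (1, 25), (23, 729), (9, 149)]   # hypothetical (x_i, n_i)
# Stand-in posterior: Beta(1 + x, 1 + n - x), i.e. a uniform prior on theta_i.
draws = [[rng.betavariate(1 + x, 1 + n - x) for x, n in counts]
         for _ in range(4000)]
ranks = posterior_expected_ranks(draws)
```

Expected ranks are rarely whole numbers. A unit with a wide posterior, such as 1/25, spreads its rank over several positions, while the unit with 0/260 sits firmly at the bottom.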


Low posterior quantiles<br />
$P\left(\theta_i \le \hat\theta_i \mid x_i\right) = 0.05$<br />
[Figure: posterior densities over $\theta$ from 0.00 to 0.08]


Every listing of top-ranking sets has a conditional FDR statement:<br />
$E \left[ \sum_i 1[\hat\theta_i \ge k]\; 1[\theta_i \,\cdots \right.$


4 replicate flu studies agree much more at set level than gene level<br />
[Figure: agreement among studies (0.0 to 0.5) versus number of top sets (0 to 250); curves: low 5%-ile, posterior mean, y/n, z, gene list]


Other approaches:<br />

Lin, R, Louis, TA, Paddock, SM, and Ridgeway, G (2006). Loss function based ranking in two-stage hierarchical models. Bayesian Analysis 1, 915-946.<br />
• Bayes estimates under various loss functions<br />
• squared error loss; classification loss; distance-based loss; weighted loss...


Options<br />

• MLE<br />
• p-value (q-value) [against no-effect H0]<br />
• Pr(H0|data) [parametric/NP/SP/locFDR]<br />
• moderated t (p-value)<br />
• SVA (p-value)<br />
• local posterior mean<br />
• local posterior quantile<br />
• other Bayesian: loss-based<br />
• other testing: p-values against other H0<br />
• other estimates: ...


The method matters<br />
$T = \sum_{i=1}^{n} [X_i > 25]$ | 16 | 20.90 (0.65) | 8.03 (0.48)<br />
$T = \sum_{i=1}^{n} [X_i > 40]$ | 12 | 13.10 (0.66) | 1.10 (0.22)<br />
$T = \sum_{i=1}^{n} [X_i > 60]$ | 11 | 11.17 (0.61) | 0.13 (0.06)<br />
Table 2: Flu Data: summary information for several chosen gene sets. Here, $m_i$ represents the number of genes within a given GO term, and $X_i$ represents the number of genes belonging to both the given GO term and the 614 genes of interest. The rankings according to several different methods are also provided. For example, the category with GO id GO:0022627 was ranked number one by both the posterior mean and posterior rank method<br />
ID | GO Term | $X_i$ | $m_i$ | p.mean | p.rank | p.val | mle<br />
GO:0022627 | cytosolic small ribosomal subunit | 18 | 36 | 1 | 1 | 3 | 251<br />
GO:0004672 | protein kinase activity | 76 | 594 | 119 | 65 | 2 | 1074<br />
GO:0030126 | COPI vesicle coat | 7 | 11 | 3 | 3 | 52 | 159<br />
GO:0030529 | ribonucleoprotein complex | 69 | 504 | 101 | 54 | 1 | 1040<br />
GO:0000059 | protein import into nucleus, docking | 2 | 2 | 62 | 271 | 178 | 1<br />
GO:0005852 | eukaryotic translation initiation factor ... | 8 | 14 | 4 | 2 | 40 | 164<br />
GO:0048200 | Golgi transport vesicle coating | 6 | 10 | 7 | 8 | 75 | 161<br />
GO:0000912 | assembly of actomyosin apparatus... | 1 | 1 | 325 | 770 | 411 | 2<br />
GO:0048205 | COPI coating of Golgi vesicle | 6 | 10 | 8 | 9 | 76 | 162<br />
GO:0033179 | proton-transporting V-type ATPase,... | 5 | 6 | 5 | 10 | 69 | 147


A framework for comparing methods and generating better ones


Decouple local parameters with variance-stabilizing transformation<br />
$X_i \sim \mathrm{Normal}(\theta_i, \sigma_i^2)$; $X_i$ is the local MLE<br />
$\theta_i \perp \sigma_i^2$ in the system<br />
$\theta_i \sim F$<br />
$\sigma_i^2 \sim G$<br />
all known, for now


Thresholding functions and ranking variables<br />
“Precision-guided” selection rules $\{t_\alpha(\sigma^2)\}$<br />
unit $i$ on top $\alpha$ list $L_\alpha$ if $X_i \ge t_\alpha(\sigma_i^2)$<br />
$P\left(X \ge t_\alpha(\sigma^2)\right) = \alpha$ (marginal)<br />
Ranking variables: $R_i$<br />
$R$ constant for all $X, \sigma^2$ s.t. $X = t_\alpha(\sigma^2)$<br />
monotone in $\alpha$


p-value rule<br />
$H_{0,i}: \theta_i = 0$<br />
$R_i = 1 - \mathrm{pval}_i = \Phi(X_i/\sigma_i)$<br />
$t_\alpha(\sigma^2) = c_\alpha \sigma$<br />
[Figure: P-value family of threshold functions; $X$ (0 to 6) versus $\sigma^2$ (0 to 5), with threshold curves for list sizes 0.01, 0.05, and 0.1; $\sigma^2 \sim \mathrm{Gamma}(a, b)$]


Posterior mean rule<br />
$F = \mathrm{Normal}(0, 1)$<br />
$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$<br />
$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$<br />
[Figure: P-value family and posterior mean family of threshold functions, side by side; $X$ (0 to 6) versus $\sigma^2$ (0 to 5), with threshold curves for list sizes 0.01, 0.05, and 0.1; $\sigma^2 \sim \mathrm{Gamma}(a, b)$]


Limma rule<br />
$R_i = \dfrac{X_i}{\sqrt{w\,\sigma_i^2 + (1 - w)}}$<br />
[Figure: threshold curves; $x$ (0 to 6) versus $\sigma^2$ (0 to 5); $\sigma^2 \sim \mathrm{Exp}(1)$]


Threshold Curves for a list size of 0.05.<br />
[Figure: $t(\sigma^2)$ (0 to 4) versus $\sigma^2$ (0.0 to 1.5) for methods mle, p-value, post mean, and post rank; $\sigma^2 \sim \mathrm{Gamma}(a, b)$ with $E(\sigma^2) = CV(\sigma^2) = 1$]


Performance Criteria<br />

Notes:<br />
we know $\theta_\alpha$ such that: $P(\theta_i \ge \theta_\alpha) = \alpha$<br />
each ranking procedure corresponds to rules: $X_i \ge t_\alpha(\sigma_i^2)$


Performance Criteria<br />

Accuracy: $\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha)$


Performance Criteria<br />

Content: $\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] \theta_i$


Performance Criteria<br />

Stability: $\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1\left[X_i^* \ge t_\alpha(\sigma_i^2)\right]$


Performance Criteria: take limit on # units<br />
Accuracy: $\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha) \to P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
integrates $\theta$, $\sigma$, and $X$


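Performance Criteria: limiting # units<br />
Accuracy: $P\left(X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right)$<br />
$P\left(\theta \ge \theta_\alpha \mid X \ge t_\alpha(\sigma^2)\right)$<br />
Content: $E\left(\theta \mid X \ge t_\alpha(\sigma^2)\right)$<br />
Stability: $P\left(X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right)$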
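
Accuracy and content can be estimated for any ranking variable by simulating from the normal model. The sketch below assumes theta ~ Normal(0, 1) and sigma² ~ Exp(1), takes the top alpha fraction of units under each ranking variable, and compares three of the rules discussed in this lecture. The number of units, the seed, and the function names are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def simulate_units(n, seed=0):
    """Draw (theta, sigma2, x) with theta ~ N(0,1), sigma2 ~ Exp(1), x ~ N(theta, sigma2)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        sigma2 = rng.expovariate(1.0)
        x = rng.gauss(theta, math.sqrt(sigma2))
        units.append((theta, sigma2, x))
    return units

# Ranking variables as functions of the data x and the known variance s2.
RANKERS = {
    "mle": lambda x, s2: x,
    "pval": lambda x, s2: x / math.sqrt(s2),
    "post.mean": lambda x, s2: x / (s2 + 1.0),
}

def accuracy_and_content(units, ranker, alpha):
    m = round(alpha * len(units))                       # list size n * alpha
    top = sorted(units, key=lambda u: ranker(u[2], u[1]), reverse=True)[:m]
    theta_alpha = NormalDist().inv_cdf(1.0 - alpha)     # P(theta >= theta_alpha) = alpha
    accuracy = sum(1 for theta, _, _ in top if theta >= theta_alpha) / m
    content = sum(theta for theta, _, _ in top) / m
    return accuracy, content

units = simulate_units(20000)
results = {name: accuracy_and_content(units, r, alpha=0.05)
           for name, r in RANKERS.items()}
```

In this setup the posterior mean ranking gives the list with the largest average theta. The p-value ranking favors units with tiny variances whatever their effect size, and the MLE ranking favors noisy units.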


[Figure: mean content (0.0 to 2.0) versus coefficient of variation of $\sigma^2$ (0.5 to 2.0); four panels with $E(\sigma^2)$ = 15, 5, 1, and 1/2; curves: MLE, pval, post.mean]


Optimal thresholds<br />

Approach:<br />

• work in limiting case, normal model<br />

• assume smooth threshold functions<br />

• assume smooth density for variances<br />

• use calculus of variations<br />

• find thresholds maximizing criteria


Optimal thresholds<br />

Maximize Stability: $P\left[X^* \ge t_\alpha(\sigma^2) \mid X \ge t_\alpha(\sigma^2)\right]$<br />

High stability with steep thresholds; tries to select units with smallest variation regardless of data<br />

No solution


Optimal thresholds<br />

Maximize Content: $E\left[\theta \mid X \ge t_\alpha(\sigma^2)\right]$<br />

Solution:<br />

$t_\alpha(\sigma^2) = c_\alpha(\sigma^2 + 1)$<br />

$R_i = E(\theta_i \mid X_i) = \dfrac{X_i}{\sigma_i^2 + 1}$<br />

i.e. ranking by local Bayes estimate has global optimality property
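A small sketch of the content-optimal ranking, assuming the normal model implied by the posterior-mean formula ($\theta_i \sim N(0, 1)$ and $X_i \mid \theta_i \sim N(\theta_i, \sigma_i^2)$, so that $E(\theta_i \mid X_i) = X_i / (\sigma_i^2 + 1)$). The units and the constant `c` are made-up illustration values, and the function names are hypothetical. The last lines check that selecting on the posterior mean is the same as comparing $X_i$ with a threshold that grows linearly in $\sigma_i^2$.

```python
def posterior_mean(x, sigma2):
    """E(theta | x) = x / (sigma2 + 1) under the assumed normal model."""
    return x / (sigma2 + 1.0)

def content_threshold(sigma2, c):
    """Threshold t(sigma2) = c * (sigma2 + 1) on the scale of x."""
    return c * (sigma2 + 1.0)

# Hypothetical units: name -> (X_i, sigma_i^2)
units = {"a": (3.0, 0.5), "b": (4.0, 3.0), "c": (2.5, 0.1)}

by_post_mean = sorted(units, key=lambda u: posterior_mean(*units[u]), reverse=True)
by_mle = sorted(units, key=lambda u: units[u][0], reverse=True)
print(by_post_mean)  # ['c', 'a', 'b']
print(by_mle)        # ['b', 'a', 'c']

c = 1.5  # illustrative cutoff on the posterior-mean scale
via_rank = {u for u in units if posterior_mean(*units[u]) >= c}
via_threshold = {u for u, (x, s2) in units.items() if x >= content_threshold(s2, c)}
print(via_rank == via_threshold)  # True
```

Unit `b` has the largest $X_i$ but also the largest variance, so shrinkage moves it from the top of the MLE ranking to the bottom of the posterior-mean ranking.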


Optimal thresholds<br />

Maximize Accuracy:<br />

$P\left[X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right]$<br />

Solution:<br />

$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(1 + \sigma^2)}$
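One way to see where a threshold of this shape can come from is to cut the posterior tail probability at a constant level. The derivation below is a sketch under assumptions not spelled out on this slide: $\theta \sim N(0, 1)$, $X \mid \theta, \sigma^2 \sim N(\theta, \sigma^2)$, $\Phi$ the standard normal distribution function, and $u_\alpha$ a constant chosen so that the list has size $\alpha$ (its sign convention here is an assumption).

```latex
\begin{align*}
\theta \mid X, \sigma^2 &\sim N\!\left(\frac{X}{1+\sigma^2},\ \frac{\sigma^2}{1+\sigma^2}\right), \\
P(\theta \ge \theta_\alpha \mid X, \sigma^2)
  &= \Phi\!\left(\frac{X/(1+\sigma^2) - \theta_\alpha}{\sqrt{\sigma^2/(1+\sigma^2)}}\right), \\
P(\theta \ge \theta_\alpha \mid X, \sigma^2) \ge \Phi(-u_\alpha)
  &\iff \frac{X}{1+\sigma^2} - \theta_\alpha \ge -u_\alpha \sqrt{\frac{\sigma^2}{1+\sigma^2}} \\
  &\iff X \ge \theta_\alpha(\sigma^2+1) - u_\alpha\sqrt{\sigma^2(1+\sigma^2)} .
\end{align*}
```

The last line has the same two terms as the solution on the slide: a part linear in $\sigma^2$ scaled by $\theta_\alpha$, and a correction proportional to $\sqrt{\sigma^2(1+\sigma^2)}$.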


Thresholds for most accurate and highest content<br />

[Figure: two panels of threshold curves, each with axes $x$ (0 to 6) and $\sigma^2$ (0 to 5); $\sigma^2 \sim \text{Exp}(1)$]


Optimal thresholds<br />

Maximize Accuracy:<br />

$P\left[X \ge t_\alpha(\sigma^2) \mid \theta \ge \theta_\alpha\right]$<br />

Solution:<br />

$t_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(1 + \sigma^2)}$<br />

Let $U_\alpha(X, \sigma^2) = P\left[\theta \ge \theta_\alpha \mid X, \sigma^2\right]$<br />

$R_i = \min\left\{\alpha : P\left[U_\alpha(X, \sigma^2) \ge U_\alpha(X_i, \sigma_i^2) \mid X_i, \sigma_i^2\right] \le \alpha\right\}$<br />

$R_i = \alpha$ if $i$ has position $\alpha$ within $P(\theta_j \ge \theta_\alpha \mid X_j, \sigma_j^2)$<br />

smaller better
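The ranking variable $R_i$ can be approximated on a grid of list sizes. The sketch below is an empirical, grid-based illustration and not the authors' implementation. It assumes $\theta \sim N(0, 1)$ and $X \mid \theta, \sigma^2 \sim N(\theta, \sigma^2)$, reads $U_\alpha(X, \sigma^2)$ as the posterior probability $P(\theta \ge \theta_\alpha \mid X, \sigma^2)$, and takes "position $\alpha$" to mean membership in the top $\alpha$ fraction of units by $U_\alpha$. The names `tail_prob` and `r_values`, the grid, and the simulated data are all hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tail_prob(x, sigma2, theta_alpha):
    """U_alpha(x, sigma2) = P(theta >= theta_alpha | x, sigma2), assuming theta ~ N(0, 1)."""
    post_mean = x / (1.0 + sigma2)
    post_sd = math.sqrt(sigma2 / (1.0 + sigma2))
    return STD_NORMAL.cdf((post_mean - theta_alpha) / post_sd)

def r_values(xs, sigma2s, grid):
    """Smallest alpha on `grid` at which each unit is in the top alpha fraction by U_alpha."""
    n = len(xs)
    r = [1.0] * n  # units never listed on the grid keep the value 1
    for alpha in sorted(grid, reverse=True):
        theta_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
        u = [tail_prob(x, s2, theta_alpha) for x, s2 in zip(xs, sigma2s)]
        k = max(1, round(alpha * n))
        cutoff = sorted(u, reverse=True)[k - 1]
        for i in range(n):
            if u[i] >= cutoff:
                r[i] = alpha  # alpha is decreasing, so the last write is the minimum
    return r

# Simulated data under the assumed model (illustration only).
rng = random.Random(1)
n = 2000
thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
sigma2s = [rng.expovariate(1.0) for _ in range(n)]
xs = [rng.gauss(t, math.sqrt(s2)) for t, s2 in zip(thetas, sigma2s)]

grid = [j / 100 for j in range(1, 100)]
r = r_values(xs, sigma2s, grid)
top = sorted(range(n), key=lambda i: r[i])[:5]
print([(i, r[i]) for i in top])
```

Units with the smallest values of `r` enter the top list at the smallest list sizes, matching the "smaller better" reading on the slide. A finer grid gives a finer ranking at a higher computing cost.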


Standard Model: list-conditional variance distributions.<br />

[Figure: two panels titled "Quantiles of list-conditional variances"; vertical axis: log(quantile) (−15 to 5); horizontal axis: Coef. of Variation (0 to 2); methods p.mean and p.value in one panel, mle and true in the other; $E(\sigma^2) = 5$]


Thresholds for most accurate and highest content<br />

[Figure: two panels of threshold curves, each with axes $x$ (0 to 6) and $\sigma^2$ (0 to 5); $\sigma^2 \sim \text{Exp}(1)$]


Some observations<br />

• ranking by p-value is stable, but puts too many small-variance units on the top list<br />

• ranking by MLE puts too many high-variance units on the top list<br />

• ranking by posterior mean is a good tradeoff, and it maximizes list content<br />

• more accurate top-list EB methods are available


Performance Criteria<br />

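“Aggregated” versions:<br />

$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] \theta_i \, 1(\theta_i \ge \theta_\alpha)$<br />

$\dfrac{1}{n\alpha} \sum_i 1\left[X_i \ge t_\alpha(\sigma_i^2)\right] 1\left[X_i^* \ge t_\alpha(\sigma_i^2)\right] 1(\theta_i \ge \theta_\alpha)$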


Optimal thresholds for aggregated criteria<br />

Solutions exhibit variance screening<br />

Figure 7: The threshold curve which maximizes the “aggregate content”. This procedure discards all units whose variances exceed around 0.14. Here, the list size is 0.025, and the standard model parameters are $E(\sigma^2) = 5$ and $CV(\sigma^2) = 1$.<br />

[Figure: curve for list size 0.025, with $CV(\sigma^2) = 1$ and $E(\sigma^2) = 5$; vertical axis $t(\sigma^2)$ (0 to 2.5); horizontal axis $\sigma^2$ (0 to 0.15); curves: optimal aggregate, post mean]
