Non-specific filtering - Computational Statistics for Genome Biology

Non-specific filtering and control of false positives

Richard Bourgon
16 June 2009
bourgon@ebi.ac.uk

EBI is an outstation of the European Molecular Biology Laboratory


Outline

• Multiple testing I: overview
  • Genome-wide experiments
  • Why adjustment is necessary
  • Experiment-wide "type I error rates"
  • Uniformity and a mixture of p-values
• Non-specific filtering
  • The ALL data: B-cell subtypes
  • Non-specific filtering to increase rejections
• Multiple testing II: impact of filtering
  • Non-specific filtering: are we controlling error rate?
  • Independence
  • Implications for FWER and FDR


Multiple testing I


Differential expression testing

Acute lymphocytic leukemia (ALL) data:
• Chiaretti et al., Clinical Cancer Research 11:7209, 2005.
• Immunophenotypic analysis of cell surface markers identified…
  • T-cell derivation in 33 samples.
  • B-cell derivation in 95 samples.
• Affymetrix HG-U95Av2 oligonucleotide arrays with ~13,000 probe sets.
• Chiaretti et al. identified 792 differentially expressed genes, with "sufficient levels of expression and variation across groups."

[Figure: clustered expression data for all 128 subjects and a subset of 475 genes showing evidence of differential expression between groups.]


The traditional type I error rate

• The traditional p-value for a single gene:
  • Define a test statistic T based on the expression data.
  • Compute its value, t, for the observed data.
  • Define p = P(T > t | the gene is not differentially expressed).
  • Compare p to an acceptable type I (false positive) error rate.
• Suppose Chiaretti et al. had compared replicate samples, so that no gene was differentially expressed. Comparing p-values to α = 0.05 gives an average of 13,000 × 0.05 = 650 false positives.
• Per-family error rate (PFER): the expected number of false positives. For the real data, PFER ≤ 650. (A quick simulation check follows.)
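
To make the arithmetic concrete, here is a minimal R check (illustrative, not from the original deck): p-values for truly null genes are uniform on [0, 1], so the count falling below the cutoff averages m × α.

m <- 13000; alpha <- 0.05
m * alpha                                     # expected false positives: 650
set.seed(1)
mean(replicate(100, sum(runif(m) < alpha)))   # Monte Carlo check, ~ 650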


Experiment-wide type I error rates

                          Not rejected   Rejected   Total
  True null hypotheses         U            V        m0
  False null hypotheses        T            S        m1
  Total                      m − R          R        m

• Family-wise error rate (FWER): P(V > 0), i.e., the probability of one or more false positives. For large m0, this is very difficult to keep small.
• False discovery rate (FDR): let Q = V/R, or 0 if R is 0. The FDR is E(Q), i.e., the expected fraction of false positives among all discoveries.
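
A small simulation (mine, not the deck's) makes V, R, and Q concrete; the Beta(0.1, 1) draws are simply a stand-in for alternative p-values concentrated near 0.

set.seed(1); alpha <- 0.05; m0 <- 900; m1 <- 100
sim <- replicate(1000, {
  p <- c(runif(m0), rbeta(m1, 0.1, 1))   # true nulls first, then alternatives
  V <- sum(p[1:m0] < alpha)              # false positives among true nulls
  R <- sum(p < alpha)                    # total rejections
  c(V = V, Q = if (R == 0) 0 else V / R)
})
mean(sim["V", ] > 0)                     # FWER: near 1 at the unadjusted cutoff
mean(sim["Q", ])                         # FDR: expected false discovery fraction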


A nice property of continuous random variables

• For a continuous random variable X, P(X = x) is 0 for any x. (Continuous examples: Gaussian, exponential. Discrete examples: binomial, geometric.)
• The distribution function: F(x) = P(X ≤ x).
• The nice property: if we define the random variable U = F(X), then U is uniformly distributed on the unit interval [0, 1].

[Figure: the standard normal distribution function, P(X ≤ x) plotted against x.]


A nice property of CDFs for continuous RVs

> X = rnorm(100000)        # 100,000 draws from the standard normal
> F = pnorm                # its distribution function
> hist(X, breaks = 50)     # bell-shaped
> hist(F(X), breaks = 50)  # flat: F(X) is uniform on [0, 1]

[Figure: histogram of X (Gaussian) beside histogram of F(X) (approximately uniform on [0, 1]).]


A "nice" property?

• To compute a p-value for testing a null hypothesis H0, we typically…
  • Define a test statistic T, and compute its value t for the observed data.
  • Assume we know the distribution of T when H0 is true: F0.
  • Compute p = 1 − F0(t), i.e., define p = P(T > t | H0 is true).
  • Compare p to some α.
• Now define the random variable P = 1 − F0(T). If H0 is true, then…
  • F0(T) is uniformly distributed on [0, 1].
  • By symmetry, P is uniformly distributed on [0, 1] as well.
• Suppose 20% of genes are differentially expressed, so that π0 = m0/m = 0.80 …


Observed p-values: a mixture

[Figure: panel A: F0, the uniform density of true-null p-values; panel B: F, the density of alternative p-values, concentrated near 0; panel C: the mixture density π0·F0 + π1·F (π0 = 0.8). A cutoff α divides the mixture into four regions marking false positives, true positives, false negatives, and true negatives.]
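
The mixture shape is easy to reproduce (an illustrative sketch, not the deck's code): t-test p-values for 80% null and 20% mean-shifted genes.

set.seed(1)
m <- 10000; pi0 <- 0.8; n <- 5
null.p <- replicate(pi0 * m, t.test(rnorm(n), rnorm(n))$p.value)
alt.p  <- replicate((1 - pi0) * m, t.test(rnorm(n), rnorm(n, mean = 2))$p.value)
hist(c(null.p, alt.p), breaks = 50)   # uniform background plus a spike near 0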




Non-specific filtering


Continuing with the Chiaretti et al. ALL data

• 79 subjects with B-cell ALL:
  • 37 with the BCR/ABL mutation ("Philadelphia chromosome")
  • 42 with no observed cytogenetic abnormalities
• We've seen that…
  • Multiple testing correction becomes more extreme for larger m.
  • Removal of true null hypotheses reduces the FDR associated with a particular p-value cutoff.
• Proposal: non-specific filtering
  • von Heydebreck, Huber and Gentleman, Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. Wiley, 2005.
  • McClintick and Edenberg, BMC Bioinformatics 7:49, 2006.


Non-specific filtering

• For a given gene, write the data as ((c_1, Y_1), …, (c_p, Y_p)).
  • First group (c = 1): i = 1, …, p_1.
  • Second group (c = 2): i = p_1 + 1, …, p_1 + p_2.
• Conditions under which we expect little variation in Y:
  1. Genes which are absent in both samples. (Probes will still report noise and cross-hybridization, typically at the same level in both groups.)
  2. Probe sets which do not respond to target.
  3. Genes which are not differentially expressed.
• A "non-specific" filter:
  • Ignores c_1, …, c_p, i.e., U(I)(Y).
  • Helps identify any of these three classes, based on our a priori understanding of array behavior.
• Apply standard testing to genes passing the filter, using some U(II)(c, Y). (A minimal sketch of the two-stage procedure follows.)
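
A minimal sketch of the two-stage procedure on simulated data (the variable names, effect size, and 50% filter fraction are illustrative choices, not the deck's):

set.seed(1)
m <- 2000; n1 <- 8; n2 <- 8
cl <- rep(1:2, c(n1, n2))                        # class labels c_1, ..., c_p
y  <- matrix(rnorm(m * (n1 + n2)), nrow = m)     # genes x samples
y[1:200, cl == 2] <- y[1:200, cl == 2] + 1.5     # 200 truly differential genes

u1   <- apply(y, 1, var)                         # U(I): uses Y only, ignores cl
keep <- u1 > quantile(u1, 0.5)                   # stage one: drop low-variance half
p    <- apply(y[keep, ], 1, function(r)          # U(II): two-sample t-test
  t.test(r[cl == 1], r[cl == 2])$p.value)
sum(p.adjust(p, method = "BH") <= 0.1)           # stage two: BH over filtered genes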


Increased rejection rate

• Stage one non-specific filter statistic: compute the overall variance

    U(I)(Y) = S² = (1/p) Σ_{i=1}^{p} (Y_i − Ȳ)²

  and remove the genes with the smallest values.
• Stage two: standard two-sample t-test for genes passing stage 1. (A simulation sketch follows.)

[Figure: number of rejections R (0–1500) as a function of the cutoff (0–0.30), for panel (a) the Benjamini–Hochberg FDR adjustment and panel (b) the q-value, with legend values 0, 0.1, 0.2, 0.3, 0.4 and 0.5 (fraction of genes filtered out). Heavier filtering yields more rejections at a given cutoff.]
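
A rough reproduction of this comparison under simulation (illustrative; the deck's curves come from the ALL data):

set.seed(2)
m <- 2000; n <- 8
cl <- rep(1:2, each = n)
y  <- matrix(rnorm(m * 2 * n), nrow = m)
y[1:200, cl == 2] <- y[1:200, cl == 2] + 1.5        # genes with real signal

rejections <- function(frac, q = 0.1) {
  s2   <- apply(y, 1, var)
  keep <- s2 >= quantile(s2, frac)                  # filter out the lowest fraction
  p <- apply(y[keep, ], 1, function(r) t.test(r[cl == 1], r[cl == 2])$p.value)
  sum(p.adjust(p, "BH") <= q)                       # rejections at FDR cutoff q
}
sapply(c(0, 0.1, 0.2, 0.3, 0.4, 0.5), rejections)   # counts typically increase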


Increased power?

• An increased detection rate implies increased power only if we are still controlling type I errors at the nominal level.

[Figure: same rejection-count curves as the previous slide.]


Multiple testing II: impact of filtering


Notation ahead!


Conditional control

• Random variable definitions:
  • V: number of false positives. FWER = P(V > 0).
  • R: total number of rejections.
  • Q: V / max{R, 1}. FDR = E(Q). Note that R = 0 implies that V = 0.
  • M: the random set (of size M) of hypotheses passing the stage-one filter.
  • U(I) and U(II): the stage-one and stage-two test statistics.
• Conditional control is sufficient: E(Q) = E(E(Q | M)).
• Because we only reject at stage two, given M, we can assess conditional control of type I error by considering how a procedure performs when applied to the M conditionally distributed statistics

    (U(II)_{I_1}, …, U(II)_{I_M}) | M = {I_1, …, I_M}.

• Thus, FWER or FDR control is achieved if the conditional distributions of the stage-two statistics meet the requirements for the control procedure.


Requirements for FWER and FDR control

• Marginal properties of true-null test statistics:
  • Distributions must be properly specified.
• Joint properties of all test statistics:
  • Adjustment procedures may…
    …work for arbitrary dependence structure (e.g., Bonferroni and Holm).
    …estimate and correct for dependence (e.g., Westfall and Young).
    …require that dependence structure satisfies some assumptions.
• Assumptions about dependence structure:
  • Independence!
  • Subset pivotality.
  • Positive regression dependence.
  • Convergence of the empirical processes based on V(α) and R(α).
  • Etc.


Independence of stage one and stage two test statistics

• For genes for which the null hypothesis is true, U(I) and U(II) are statistically independent in both of the following cases:
• For normally distributed data:
  • Stage one: the overall mean, Ȳ = (1/p) Σ_{i=1}^{p} Y_i, or the overall variance, S² = (1/p) Σ_{i=1}^{p} (Y_i − Ȳ)².
  • Stage two: the standard two-sample t-statistic, or any location- and scale-invariant test statistic.
• Non-parametrically:
  • Stage one: any function of the data which (i) to filter gene g only uses data from gene g, and (ii) doesn't depend on the order of the arguments. S² above, or the IQR, are both candidates.
  • Stage two: the Wilcoxon rank-sum test statistic.
• Both can be extended to the multi-class context: ANOVA and Kruskal–Wallis. (A simulation check of this independence appears below.)
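
The independence claim can be probed empirically; a sketch for the normal case (my code, with ks.test against the uniform checking that conditional null p-values remain valid):

set.seed(3)
m <- 5000; n <- 8
y  <- matrix(rnorm(m * 2 * n), nrow = m)   # every gene a true null
s2 <- apply(y, 1, var)                     # stage-one statistic
p  <- apply(y, 1, function(r)
  t.test(r[1:n], r[-(1:n)], var.equal = TRUE)$p.value)
cor(s2, p)                                 # ~ 0 under normality
ks.test(p[s2 > median(s2)], "punif")       # conditional p-values still uniform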


Independence: Bonferroni and Holm FWER adjustments

• Single-stage Bonferroni correction compares each p_i to α/m.
• The Holm step-down procedure is a more powerful variant. For ordered p-values p_(1) ≤ p_(2) ≤ … ≤ p_(m), reject as long as p_(i) < α / (m − i + 1).
• Independence of U(I) and U(II) implies that the key step in proving that these procedures control FWER still applies:

    P(V > 0) = P( ∪_{i ∈ H0} { U(I)_i > a, U(II)_i > b } )
             ≤ Σ_{i ∈ H0} P( U(I)_i > a, U(II)_i > b )
             = Σ_{i ∈ H0} P( U(I)_i > a ) · P( P_i < α )
             ≤ α · E(M),

  where H0 indexes the true null hypotheses and the factorization uses stage-one/stage-two independence.
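
In practice this just means running Holm over the filtered set. A Monte Carlo check of FWER under the global null (an illustrative sketch; 100 replicates for speed):

set.seed(4)
fwer.event <- function() {
  y <- matrix(rnorm(500 * 12), nrow = 500)    # all true nulls
  s2 <- apply(y, 1, var)
  keep <- s2 > median(s2)                     # stage one
  p <- apply(y[keep, ], 1, function(r)
    t.test(r[1:6], r[-(1:6)], var.equal = TRUE)$p.value)
  any(p.adjust(p, method = "holm") <= 0.05)   # one or more false positives?
}
mean(replicate(100, fwer.event()))            # ~ 0.05 or below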


FWER: Westfall and Young

• Westfall and Young (1993) controls FWER with more power, but depends on the joint distribution of all p-values:

    p~_i = P( min_{1 ≤ j ≤ m} P_j ≤ p_i | H0^C ),

  where p~_i is the adjusted p-value and H0^C is the complete null hypothesis.
• WY93 is valid under subset pivotality. If this holds for the one-stage procedure, it holds for the two-stage non-specific filtering approach as well.
• The distribution of min P_j under H0^C is typically estimated by permutation. If filtering changes the correlation structure, the new structure is used by the permutation procedure!


FDR: Benjamini & Hochberg and Storey adjustments

• What is the FDR associated with use of cutoff α?

    FDR(α) = E[ V(α) / (R(α) ∨ 1) ].

• Naive estimator:
  • V(α) is not observable, but E(V(α)) is m0·α, which is bounded by m·α.
  • E(R(α)) cannot be computed, but R(α) can be used as an estimator:

    FDR-hat(α) = m0·α / #{i : p_i ≤ α} ≤ m·α / #{i : p_i ≤ α}.

• Evaluating at each p_(i) using m or m̂0 gives the BH95 or Storey adjustments, respectively:

    FDR-hat(p_(i)) = m0·p_(i) / #{i : p_i ≤ p_(i)} = m0·p_(i) / i.

  (A verification sketch follows.)

[Figure: mixture p-value density with cutoff α, regions marking false/true positives and negatives, as on the slide "Observed p-values: a mixture".]
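
The equivalence between this estimator and the BH adjusted p-values can be verified directly (my sketch; the Beta draws stand in for alternative p-values):

set.seed(5)
p <- c(runif(800), rbeta(200, 0.2, 1))   # 80% null, 20% alternative-like
m <- length(p)
fdr.hat <- function(alpha) m * alpha / max(sum(p <= alpha), 1)
fdr.hat(0.05)                            # naive estimate at cutoff 0.05
ps <- sort(p)
bh <- pmin(1, rev(cummin(rev(m * ps / seq_len(m)))))   # m p_(i) / i, monotonized
all.equal(bh, p.adjust(ps, method = "BH"))             # TRUE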


FDR: Benjamini & Hochberg and Storey adjustments

• The foregoing motivation for the BH95 and Storey procedures uses E(V(α)) = m0·α.
• Marginal independence of true-null U(I) and U(II) means that this still applies at stage two in expectation. Define M0 to be the random number of true nulls passing stage 1. Then

    E(V(α)) = Σ_{i ∈ H0} P( U(I)_i > a, P_i < α )
            = Σ_{i ∈ H0} P( U(I)_i > a ) · P( P_i < α )
            = α · E(M0).

[Figure: mixture p-value density with cutoff α, as before.]
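
A Monte Carlo check of the identity E(V(α)) = α·E(M0) under the global null (illustrative settings, not the deck's):

set.seed(6)
alpha <- 0.05
sim <- replicate(100, {
  y <- matrix(rnorm(500 * 12), nrow = 500)   # all genes truly null
  s2 <- apply(y, 1, var)
  keep <- s2 > median(s2)                    # M0 = number of nulls passing
  p <- apply(y[keep, ], 1, function(r)
    t.test(r[1:6], r[-(1:6)], var.equal = TRUE)$p.value)
  c(V = sum(p < alpha), M0 = sum(keep))
})
mean(sim["V", ]) / (alpha * mean(sim["M0", ]))   # ~ 1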


Counterexamples


Counterexample #1: normality matters

• Filter on S², test using T.
• For true null hypotheses, data are i.i.d. normal with probability p, but heavily skewed with probability 1 − p. Let the latent X indicate mixture component identity.
• The filter statistic is now predictive for X.
• The conditional distribution of T will be weighted more towards the X = 1 case than the unconditional distribution. (A simulation sketch follows.)

[Figure: histograms of the estimated scale and of T(Y), shown separately for X = 0 and X = 1; the two components differ in both.]
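
A rough simulation of this construction (the skewed component and sample size are my choices, not the deck's exact setup):

set.seed(7)
m <- 4000; n <- 5
X <- rbinom(m, 1, 0.5)                     # latent component indicator
y <- t(sapply(X, function(x) {
  if (x == 0) rnorm(2 * n)                 # normal null component
  else rexp(2 * n, rate = 0.5) - 2         # skewed null, larger variance
}))
s2 <- apply(y, 1, var)                     # filter statistic now predicts X
p  <- apply(y, 1, function(r) t.test(r[1:n], r[-(1:n)])$p.value)
hist(p, breaks = 20)                       # roughly uniform overall
hist(p[s2 > median(s2)], breaks = 20)      # conditional: enriched for skewed genes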


Counterexample #2: the limma t-statistic

• Filter on S², test using the limma moderated t-statistic.
• The moderated t-statistic is not scale invariant, due to the effect of the global variance estimator.
• This is more pronounced for small n — precisely the context in which limma is most useful.
• Filtering on overall mean rather than variance "solves" the problem.

[Figure: "Moderated t and overall variance": the moderated |t| plotted against the overall standard deviation S.]


Counterexample #2: the limma t-statistic

[Figure: histogram of unconditional p-values (fit$p.value[, 2]) beside histogram of conditional p-values after variance filtering (fit$p.value[S > median(S), 2]).]
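
A sketch of how such a figure can be produced with limma under a global null (assumes the limma package is installed; the design and sample sizes are illustrative):

library(limma)
set.seed(8)
m <- 1000; n <- 3                          # small n: shrinkage matters most
y <- matrix(rnorm(m * 2 * n), nrow = m)    # all genes truly null
design <- cbind(Intercept = 1, Group = rep(0:1, each = n))
fit <- eBayes(lmFit(y, design))            # moderated t-statistics
S <- sqrt(apply(y, 1, var))                # overall standard deviation
hist(fit$p.value[, 2])                     # unconditional: ~ uniform
hist(fit$p.value[S > median(S), 2])        # conditional: no longer uniform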


Conclusions

• In actual examples, use of an independent filter leads to (biologically) significant increases in the number of genes identified.
• Some commonly used stage one/stage two test statistic pairs are statistically independent for genes which are not differentially expressed…
• …but others are not! The non-specific criterion is not enough.
• Given this independence, Bonferroni and Holm FWER control is valid in the two-stage procedure. Likewise for FDR-controlling procedures which make no assumptions about independence.
• Correlation structure may change under filtering. Permutation-based Westfall and Young correction accounts for this. FDR control could theoretically suffer, although the practical impact is likely to be small.
• The effect of filtering on correlation can also be checked, and its impact assessed.
