Non-specific filtering - Computational Statistics for Genome Biology
Non-specific filtering and control of<br />
false positives<br />
Richard Bourgon<br />
16 June 2009<br />
bourgon@ebi.ac.uk<br />
EBI is an outstation of the European Molecular Biology Laboratory
Outline<br />
• Multiple testing I: overview<br />
• Genome-wide experiments<br />
• Why adjustment is necessary<br />
• Experiment-wide “type I error rates”<br />
• Uniformity and a mixture of p-values<br />
• Non-specific filtering<br />
• The ALL data: B-cell subtypes<br />
• Non-specific filtering to increase rejections<br />
• Multiple testing II: impact of filtering<br />
• Non-specific filtering: are we controlling error rate?<br />
• Independence<br />
• Implications for FWER and FDR<br />
Slide 2
Multiple testing I<br />
<br />
Slide 3
Differential expression testing<br />
Acute lymphocytic leukemia (ALL) data<br />
• Chiaretti et al., Clinical Cancer Research<br />
11:7209, 2005.<br />
• Immunophenotypic analysis of cell surface<br />
markers identified…<br />
• T-cell derivation in 33 samples.<br />
• B-cell derivation in 95 samples.<br />
• Affymetrix HG-U95Av2 oligonucleotide arrays<br />
with ~ 13,000 probe sets.<br />
• Chiaretti et al. identified 792 differentially<br />
expressed genes, with “sufficient levels of<br />
expression and variation across groups.”<br />
[Figure: clustered expression data for all 128 subjects, and a subset of 475 genes showing evidence of differential expression between groups.]<br />
Slide 4
The traditional type I error rate<br />
• The traditional p-value for a single gene:<br />
• Define a test statistic T based on the expression data.<br />
• Compute its value, t, for the observed data.<br />
• Define p = P(T > t | the gene is not differentially expressed).<br />
• Compare p to an acceptable type I (false positive) error rate, $\alpha$.<br />
• Suppose Chiaretti et al. had compared replicate samples, so that no gene was differentially expressed. Comparing p-values to $\alpha = 0.05$ gives an average of 13,000 × 0.05 = 650 false positives.<br />
• Per-family error rate (PFER): the expected number of false positives. For the real data, PFER ≤ 650, since only true null hypotheses can yield false positives.<br />
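The 650 figure can be checked numerically. The R sketch below assumes, purely for illustration, that null p-values are independent and exactly uniform; the variable names and the number of simulated experiments are ours, not part of the original slides.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
m     <- 13000                   # approximate number of probe sets
alpha <- 0.05                    # per-test type I error rate

pfer_theory <- m * alpha         # expected number of false positives

# Simulate 200 all-null experiments: every p-value is uniform on [0, 1],
# so every rejection is a false positive.
false_pos <- replicate(200, sum(runif(m) < alpha))

pfer_theory                      # 650
mean(false_pos)                  # simulated average, for comparison
```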
Slide 5
Experiment-wide type I error rates<br />
Hypotheses | Not rejected | Rejected | Total<br />
True null | U | V | $m_0$<br />
False null | T | S | $m_1$<br />
Total | m – R | R | m<br />
• Family-wise error rate (FWER): P(V > 0), i.e., the probability of one or more false positives. For large $m_0$, this is very difficult to keep small.<br />
• False discovery rate (FDR): let Q = V/R, or 0 if R is 0. The FDR is E(Q), i.e., the expected fraction of false positives among all discoveries.<br />
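To make the table concrete, here is a small R sketch that fills it in for one simulated experiment. The z-statistics, the effect size of 3, and the 800/200 split are hypothetical choices made only so that the truth is known.

```r
set.seed(2)
m  <- 1000
m0 <- 800                                   # number of true nulls (assumed)
is_null <- rep(c(TRUE, FALSE), c(m0, m - m0))

# Hypothetical z-statistics: mean 0 under the null, mean 3 otherwise.
z <- rnorm(m, mean = ifelse(is_null, 0, 3))
p <- pnorm(z, lower.tail = FALSE)           # one-sided p-values
rejected <- p < 0.05

# The 2 x 2 table of truth against decision (cells U, V, T, S).
tab <- table(truth    = ifelse(is_null, "true null", "false null"),
             decision = ifelse(rejected, "rejected", "not rejected"))
tab

V <- sum(rejected & is_null)                # false positives
R <- sum(rejected)                          # all rejections
Q <- if (R > 0) V / R else 0                # realised false discovery proportion
c(V = V, R = R, Q = Q)
```

FWER and FDR are properties of the distribution of V and Q over repeated experiments; a single run such as this one only shows one realisation.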
Slide 6
A nice property of continuous random variables<br />
• For a continuous random variable X, P(X = x) is 0 for any x.<br />
• Examples. Continuous: Gaussian, exponential. Discrete: binomial, geometric.<br />
• The distribution function: F(x) = P(X ≤ x).<br />
• The nice property: if we define the random variable U = F(X), then U is uniformly distributed on the unit interval [0,1].<br />
[Figure: standard normal distribution function, P(X ≤ x) plotted against x for x from −4 to 4.]<br />
Slide 7
A nice property of CDFs for continuous RVs<br />
> X = rnorm(100000)<br />
> F = pnorm<br />
> hist(X, breaks = 50)<br />
> hist(F(X), breaks = 50)<br />
[Figure: histogram of X and histogram of F(X), each with frequency on the vertical axis.]<br />
Slide 8
A “nice” property?<br />
• To compute a p-value for testing a null hypothesis $H_0$, we typically…<br />
• Define a test statistic T, and compute its value t for the observed data.<br />
• Assume we know the distribution of T when $H_0$ is true: $F_0$.<br />
• Compute $p = 1 - F_0(t)$, i.e., define p = P(T > t | $H_0$ is true).<br />
• Compare p to some $\alpha$.<br />
• Now define the random variable $P = 1 - F_0(T)$. If $H_0$ is true, then…<br />
• $F_0(T)$ is uniformly distributed on [0,1].<br />
• By symmetry, P is uniformly distributed on [0,1] as well.<br />
• Suppose 20% of genes are differentially expressed, so that $\pi_0 = m_0/m = 0.80$ ...<br />
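A short R sketch of what such a collection of p-values looks like. It assumes a standard normal null distribution $F_0$ and a hypothetical shift of 2 for differentially expressed genes; neither value comes from the slides.

```r
set.seed(3)
m   <- 10000
pi0 <- 0.8                                   # assumed fraction of true nulls
is_null <- runif(m) < pi0

# Test statistics: standard normal under H0, shifted by 2 otherwise (assumed).
t_stat <- rnorm(m, mean = ifelse(is_null, 0, 2))
p <- 1 - pnorm(t_stat)                       # p = 1 - F0(t)

# Null p-values are uniform; non-null p-values pile up near 0.
hist(p, breaks = 50, freq = FALSE, main = "Mixture of p-values")
abline(h = pi0, lty = 2)                     # height of the uniform null component
```

The dashed line marks the contribution of the uniform null component; the excess above it near zero comes from the non-null genes.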
Slide 9
Observed p-values: a mixture<br />
[Figure, three panels of p-value densities on [0, 1]. A: the null component, $F_0$. B: the non-null component. C: the mixture density, with weight $\pi_0 = 0.8$ on the null component; the area under it is split at the cutoff $\alpha$ into four regions (labelled I to IV): true positives, false positives, false negatives and true negatives.]<br />
Slide 10
Observed p-values: a mixture<br />
[Figure, three panels of p-value densities on [0, 1]. A: the null component, $F_0$. B: the non-null component. C: the mixture density, with weight $\pi_0 = 0.8$ on the null component and the cutoff $\alpha$ marked on the p axis.]<br />
Slide 11
Non-specific filtering<br />
<br />
Slide 12
Continuing with the Chiaretti et al. ALL data<br />
• 79 subjects with B-cell ALL:<br />
• 37 with the BCR/ABL mutation (“Philadelphia chromosome”)<br />
• 42 with no observed cytogenetic abnormalities<br />
• We’ve seen that…<br />
• Multiple testing correction becomes more extreme for larger m.<br />
• Removal of true null hypotheses reduces the FDR associated with a particular p-value cutoff.<br />
• Proposal: non-specific filtering<br />
• von Heydebreck, Huber and Gentleman, Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. Wiley, 2005.<br />
• McClintick and Edenberg, BMC Bioinformatics 7:49, 2006.<br />
Slide 13
Non-specific filtering<br />
• For a given gene, write the data as $((c_1, Y_1), \ldots, (c_p, Y_p))$.<br />
• First group (c = 1): $i = 1, \ldots, p_1$.<br />
• Second group (c = 2): $i = p_1 + 1, \ldots, p_1 + p_2$.<br />
• Conditions under which we expect little variation in Y:<br />
1. Genes which are absent in both samples. (Probes will still report noise and cross-hybridization, typically at the same level in both groups.)<br />
2. Probe sets which do not respond to target.<br />
3. Genes which are not differentially expressed.<br />
• A “non-specific” filter:<br />
• Ignores $c_1, \ldots, c_p$, i.e., uses a statistic $U^{(I)}(Y)$ that depends on Y alone.<br />
• Helps identify any of these three classes, based on our a priori understanding of array behavior.<br />
• Apply standard testing to genes passing the filter, using some $U^{(II)}(c, Y)$.<br />
Slide 14
Increased rejection rate<br />
• Stage one non-specific filter statistic: compute the overall variance<br />
$$U^{(I)}(Y) = S^2 = \frac{1}{p-1}\sum_{i=1}^{p}(Y_i - \bar{Y})^2$$<br />
and remove the genes with the smallest values.<br />
• Stage two: standard two-sample t-test for genes passing stage one.<br />
[Figure, panels a and b: number of rejections R (0 to 1500) against the cutoff (0 to 0.30), with x-axes labelled “FDR (BH)” and “qvalue”, and one curve for each legend value 0, 0.1, 0.2, 0.3, 0.4 and 0.5.]<br />
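The two-stage procedure can be sketched in base R as follows. The data are simulated, and the number of genes, group sizes, effect size, and the name `theta` for the fraction of genes removed are all hypothetical choices for illustration; this is not the analysis of the ALL data.

```r
set.seed(4)
m <- 2000; p1 <- 10; p2 <- 10
cls <- rep(c(1, 2), c(p1, p2))               # class labels c_1, ..., c_p
y <- matrix(rnorm(m * (p1 + p2)), nrow = m)  # rows = genes, columns = samples
de <- 1:200                                  # hypothetical DE genes
y[de, cls == 2] <- y[de, cls == 2] + 1.5     # hypothetical effect size

# Stage one: overall variance S^2, computed without looking at the labels.
s2 <- apply(y, 1, var)

# Stage two: equal-variance two-sample t-test p-value for every gene.
pval <- apply(y, 1, function(g) {
  t.test(g[cls == 1], g[cls == 2], var.equal = TRUE)$p.value
})

# BH rejections at FDR 0.1 after removing the fraction theta of genes
# with the smallest overall variance.
thetas <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5)
n_rej <- sapply(thetas, function(theta) {
  keep <- s2 >= quantile(s2, theta)
  sum(p.adjust(pval[keep], method = "BH") < 0.1)
})
data.frame(theta = thetas, rejections = n_rej)
```

Note that the adjustment is applied only to the genes that pass the filter, so the multiplicity burden shrinks as `theta` grows.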
Slide 15
Increased power?<br />
• An increased detection rate implies increased power only if we are still<br />
controlling type I errors at the nominal level.<br />
[Figure, panels a and b: number of rejections R (0 to 1500) against the cutoff (0 to 0.30), with x-axes labelled “FDR (BH)” and “qvalue”, and one curve for each legend value 0, 0.1, 0.2, 0.3, 0.4 and 0.5.]<br />
Slide 16
Multiple testing II: impact of filtering<br />
<br />
Slide 17
Slide 18<br />
Notation ahead!
Conditional control<br />
• Random variable definitions:<br />
• V: number of false positives. FWER = P(V > 0).<br />
• R: total number of rejections.<br />
• Q: V / max{R, 1}. FDR = E(Q). Note that R = 0 implies that V = 0.<br />
• $\mathcal{M}$: the random set (of size M) of hypotheses passing the stage-one filter.<br />
• $U^{(I)}$ and $U^{(II)}$: the stage-one and stage-two test statistics.<br />
• Conditional control is sufficient: $E(Q) = E(E(Q \mid \mathcal{M}))$.<br />
• Because we only reject at stage two, given $\mathcal{M}$, we can assess conditional control of type I error by considering how a procedure performs when applied to the M conditionally distributed statistics<br />
$$(U^{(II)}_{I_1}, \ldots, U^{(II)}_{I_M}) \mid \mathcal{M} = \{I_1, \ldots, I_M\}.$$<br />
• Thus, FWER or FDR control is achieved if the conditional distributions of the stage-two statistics meet the requirements for the control procedure.<br />
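The sufficiency of conditional control follows from the tower property of expectation. A brief sketch, under the assumption that the stage-two procedure guarantees the nominal level $\alpha$ for every possible filter outcome $\mathcal{M}$:

```latex
% Assumption: E(Q | M) <= alpha and P(V > 0 | M) <= alpha for every filter outcome M.
\begin{align*}
\mathrm{FDR}  &= E(Q) = E\bigl(E(Q \mid \mathcal{M})\bigr) \le E(\alpha) = \alpha, \\
\mathrm{FWER} &= P(V > 0) = E\bigl(P(V > 0 \mid \mathcal{M})\bigr) \le E(\alpha) = \alpha.
\end{align*}
```

Averaging over the random filter outcome cannot raise an error rate that is bounded by $\alpha$ for each outcome.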
Slide 19
Requirements for FWER and FDR control<br />
• Marginal properties of true-null test statistics.<br />
• Distributions must be properly specified.<br />
• Joint properties of all test statistics.<br />
• Adjustment procedures may…<br />
…work for arbitrary dependence structure (e.g., Bonferroni and Holm).<br />
…estimate and correct for dependence (e.g., Westfall and Young).<br />
…require that dependence structure satisfies some assumptions.<br />
• Assumptions about dependence structure:<br />
• Independence!<br />
• Subset pivotality.<br />
• Positive regression dependence.<br />
• Convergence of the empirical processes based on $V(\alpha)$ and $R(\alpha)$.<br />
• Etc.<br />
Slide 20
Independence of stage one and stage two test statistics<br />
• For genes for which the null hypothesis is true, $U^{(I)}$ and $U^{(II)}$ are statistically independent in both of the following cases:<br />
• For normally distributed data:<br />
• Stage one: overall mean, $\bar{Y} = \frac{1}{p}\sum_{i=1}^{p} Y_i$, or variance, $S^2 = \frac{1}{p-1}\sum_{i=1}^{p}(Y_i - \bar{Y})^2$.<br />
• Stage two: the standard two-sample t-statistic, or any location- and scale-invariant test statistic.<br />
• Non-parametrically:<br />
• Stage one: any function of the data which (i) uses only data from gene g to filter gene g, and (ii) doesn’t depend on the order of the arguments. $S^2$ above and the IQR are both candidates.<br />
• Stage two: the Wilcoxon rank sum test statistic.<br />
• Both can be extended to the multi-class context: ANOVA and Kruskal-Wallis.<br />
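The normal-data case can be checked empirically. The R sketch below simulates genes that are all true nulls (group sizes and gene count are arbitrary assumptions), then looks at the rank correlation between the overall variance and the absolute t-statistic, and at the uniformity of p-values among genes passing a median-variance filter.

```r
set.seed(5)
m <- 5000; p1 <- 5; p2 <- 5
cls <- rep(c(1, 2), c(p1, p2))
y <- matrix(rnorm(m * (p1 + p2)), nrow = m)   # every gene is a true null

s2 <- apply(y, 1, var)                        # stage one: overall variance

# Stage two: two-sample t-statistic and p-value for each gene.
res <- t(apply(y, 1, function(g) {
  tt <- t.test(g[cls == 1], g[cls == 2], var.equal = TRUE)
  c(t = unname(tt$statistic), p = tt$p.value)
}))

# Under independence the rank correlation should be close to 0.
rho <- cor(s2, abs(res[, "t"]), method = "spearman")
rho

# Null p-values of genes passing the filter should still look uniform.
keep <- s2 >= median(s2)
ks.test(res[keep, "p"], "punif")$p.value
```

A correlation near zero is necessary for independence but does not prove it; the check is a sanity test, not a substitute for the argument on the slide.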
Slide 21
Independence: Bonferroni and Holm FWER adjustments<br />
• Single-stage Bonferroni correction compares each $p_i$ to $\alpha/m$.<br />
• The Holm step-down procedure is a more powerful variant. For ordered p-values $p_{(1)} \le p_{(2)} \le \ldots \le p_{(m)}$, reject as long as<br />
$$p_{(i)} < \frac{\alpha}{m - i + 1}.$$<br />
• Independence of $U^{(I)}$ and $U^{(II)}$ implies that the key step in proving that these procedures control FWER still applies:<br />
$$P(V > 0) = P\Bigl(\bigcup_{i \in H_0} \{U_i^{(I)} > a,\ U_i^{(II)} > b\}\Bigr) \le \sum_{i \in H_0} P(U_i^{(I)} > a,\ U_i^{(II)} > b) = \sum_{i \in H_0} P(U_i^{(I)} > a)\, P(P_i < \alpha) = \alpha\, E(M_0) \le \alpha\, E(M),$$<br />
where $\alpha$ here denotes the per-test p-value cutoff and $M_0 \le M$ is the number of true nulls passing stage one.<br />
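The two rejection rules can be written out directly and compared with R's built-in `p.adjust`. The ten p-values below are made up for illustration; in the two-stage setting, `p` would hold only the p-values of genes passing the filter, so `m` is the number of hypotheses that passed.

```r
# Hypothetical p-values (in a two-stage analysis: those passing the filter).
p <- c(0.0001, 0.0004, 0.0019, 0.0065, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)
alpha <- 0.05
m <- length(p)

# Bonferroni: reject when p_i < alpha / m.
bonf_reject <- p < alpha / m

# Holm: walk through the ordered p-values and stop at the first failure.
o <- order(p)
passes <- p[o] < alpha / (m - seq_len(m) + 1)
n_holm <- if (all(passes)) m else which(!passes)[1] - 1
holm_reject <- logical(m)
holm_reject[o[seq_len(n_holm)]] <- TRUE

sum(bonf_reject)                                       # 3
sum(holm_reject)                                       # 4
identical(holm_reject, p.adjust(p, "holm") < alpha)    # TRUE
```

Here Holm rejects one more hypothesis than Bonferroni because the fourth-smallest p-value is compared to $\alpha/7$ rather than $\alpha/10$.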
Slide 22
FWER: Westfall and Young<br />
• Westfall and Young (1993) controls FWER with more power, but depends on the joint distribution of all p-values:<br />
$$\tilde{p}_i = P\Bigl(\min_{1 \le j \le m} P_j \le p_i \,\Big|\, H_0^C\Bigr).$$<br />
• WY93 is valid under subset pivotality. If this holds for the one-stage procedure, it holds for the two-stage non-specific filtering approach as well.<br />
• The distribution of $\min_j P_j$ under $H_0^C$ is typically estimated by permutation. If filtering changes correlation structure, the new structure is used by permutation!<br />
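A permutation estimate of this kind can be sketched in base R. For brevity the sketch uses the single-step maxT variant (the maximum absolute t-statistic over genes) rather than the minimum p-value shown on the slide; the simulated data, effect size, Welch-type statistic, and number of permutations `B` are all assumptions made for illustration.

```r
set.seed(6)
m <- 200; n1 <- 6; n2 <- 6
cls <- rep(c(1, 2), c(n1, n2))
y <- matrix(rnorm(m * (n1 + n2)), nrow = m)   # in practice: genes passing the filter
y[1:10, cls == 2] <- y[1:10, cls == 2] + 3    # hypothetical DE genes

# Welch-type two-sample t-statistic for every row of y.
tstat <- function(y, cls) {
  a <- y[, cls == 1, drop = FALSE]
  b <- y[, cls == 2, drop = FALSE]
  (rowMeans(b) - rowMeans(a)) /
    sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
}

obs <- abs(tstat(y, cls))

# Null distribution of the maximum statistic, by permuting sample labels.
# Permuting whole columns preserves the correlation among genes.
B <- 500
max_null <- replicate(B, max(abs(tstat(y, sample(cls)))))

# Single-step maxT adjusted p-values.
p_adj <- sapply(obs, function(t) mean(max_null >= t))
sum(p_adj < 0.05)
```

Because the permutations are run on whatever rows are supplied, applying the filter first means the permutation distribution reflects the correlation structure of the filtered set.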
Slide 23
FDR: Benjamini & Hochberg and Storey adjustments<br />
• What is the FDR associated with use of cutoff $\alpha$?<br />
$$\mathrm{FDR}(\alpha) = E\left(\frac{V(\alpha)}{R(\alpha) \vee 1}\right).$$<br />
• Naive estimator:<br />
• V is not observable, but E(V) is $m_0 \alpha$, which is bounded by $m \alpha$.<br />
• E(R) cannot be computed, but R can be used as an estimator.<br />
$$\widehat{\mathrm{FDR}}(\alpha) = \frac{m_0\, \alpha}{\#\{i : p_i \le \alpha\}} \le \frac{m\, \alpha}{\#\{i : p_i \le \alpha\}}.$$<br />
• Evaluating at each $p_{(i)}$ using m or $\hat{m}_0$ gives BH95 or Storey adjustments, respectively:<br />
$$\widehat{\mathrm{FDR}}(p_{(i)}) = \frac{m_0\, p_{(i)}}{\#\{j : p_j \le p_{(i)}\}} = \frac{m_0\, p_{(i)}}{i}.$$<br />
[Figure: mixture density of p-values on [0, 1], split at the cutoff $\alpha$ into four regions (labelled I to IV): true positives, false positives, false negatives and true negatives.]<br />
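The BH95 version of this estimator, with m in place of $m_0$, can be computed by hand and compared with `p.adjust`. The p-values are made up for illustration; the extra running-minimum step, which makes the adjusted values monotone in the p-values, is part of the standard BH adjustment rather than something stated on the slide.

```r
# Hypothetical p-values.
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)
m <- length(p)

o <- order(p)
fdr_hat <- m * p[o] / seq_len(m)              # m * p_(i) / i

# Running minimum from the largest p-value downwards, capped at 1.
bh_sorted <- pmin(1, rev(cummin(rev(fdr_hat))))

bh_adj <- numeric(m)
bh_adj[o] <- bh_sorted                        # back to the original order

all.equal(bh_adj, p.adjust(p, method = "BH")) # TRUE
```

Replacing `m` by an estimate of $m_0$ in the second step would give a Storey-type adjustment instead.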
Slide 24
FDR: Benjamini & Hochberg and Storey adjustments<br />
• The foregoing motivation for the BH95 and Storey procedures uses $E(V(\alpha)) = m_0 \alpha$, where<br />
$$\mathrm{FDR}(\alpha) = E\left(\frac{V(\alpha)}{R(\alpha) \vee 1}\right).$$<br />
• Marginal independence of true null $U^{(I)}$ and $U^{(II)}$ means that this still applies at stage two in expectation. Define $M_0$ to be the random number of true nulls passing stage 1. Then<br />
$$E(V(\alpha)) = \sum_{i \in H_0} P(U_i^{(I)} > a,\ P_i < \alpha) = \sum_{i \in H_0} P(U_i^{(I)} > a)\, P(P_i < \alpha) = \alpha\, E(M_0).$$<br />
[Figure: mixture density of p-values on [0, 1], split at the cutoff $\alpha$ into four regions (labelled I to IV): true positives, false positives, false negatives and true negatives.]<br />
Slide 25
Counterexamples<br />
<br />
Slide 26
Counterexample #1: normality matters<br />
• Filter on $S^2$, test using T.<br />
• For true null hypotheses, data are i.i.d. normal with probability p, but heavily skewed with probability 1 – p. Let latent X indicate mixture component identity.<br />
• The filter statistic is now predictive for X.<br />
• The conditional distribution of T will be more weighted towards the X = 1 case than the unconditional distribution.<br />
[Figure: histograms of the variability estimate used for filtering (0 to 3) and of T(Y) (−4 to 4), each shown separately for X = 0 and X = 1.]<br />
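A simulation along these lines can be sketched in base R. The choice of a log-normal distribution for the skewed component, the coding of X = 1 as the skewed component, the mixing probability of 0.5, and the group sizes are all assumptions made for illustration; the slide does not specify them.

```r
set.seed(7)
m <- 5000; p1 <- 5; p2 <- 5
cls <- rep(c(1, 2), c(p1, p2))
x <- rbinom(m, 1, 0.5)        # latent component; here X = 1 means skewed (assumed)

# Every gene is a true null. X = 0: i.i.d. normal; X = 1: i.i.d. log-normal.
y <- t(sapply(x, function(xi) {
  if (xi == 0) rnorm(p1 + p2) else rlnorm(p1 + p2, sdlog = 1.5)
}))

s2   <- apply(y, 1, var)      # stage one: overall variance
pval <- apply(y, 1, function(g) {
  t.test(g[cls == 1], g[cls == 2], var.equal = TRUE)$p.value
})
keep <- s2 >= median(s2)      # keep the more variable half

mean(x)                       # share of skewed genes overall
mean(x[keep])                 # share of skewed genes after filtering
mean(pval < 0.05)             # null rejection rate before filtering
mean(pval[keep] < 0.05)       # null rejection rate after filtering
```

Comparing the first two numbers shows how strongly the filter selects one mixture component; comparing the last two shows whether the null distribution of the test statistic has shifted as a result.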
Slide 27
Counterexample #2: the limma t-statistic<br />
• Filter on S 2 , test using the limma<br />
moderated t-statistic.<br />
• The moderated t-statistic is not<br />
scale invariant, due to the effect<br />
of the global variance estimator.<br />
• This is more pronounced for small n, precisely the context in which limma is most useful.<br />
• Filtering on overall mean rather than variance “solves” the problem.<br />
[Figure: “Moderated t and overall variance”, plotting |t| (0 to 3.5) against S (0.5 to 2.0).]<br />
Slide 28
Counterexample #2: the limma t-statistic<br />
[Figure: histograms of unconditional p-values, fit$p.value[, 2], and of conditional p-values, fit$p.value[S > median(S), 2], both on [0, 1].]<br />
Slide 29
Conclusions<br />
• In actual examples, use of an independent filter leads to (biologically)<br />
significant increases in the number of genes identified.<br />
• Some commonly used stage one/stage two test statistic pairs are statistically independent for genes which are not differentially expressed…<br />
• …but others are not! The non-specific criterion is not enough.<br />
• Given this independence, Bonferroni and Holm FWER control is valid in the two-stage procedure. Likewise for FDR-controlling procedures which make no assumptions about independence.<br />
• Correlation structure may change under filtering. Permutation-based Westfall and Young correction accounts for this. FDR control could theoretically suffer, although the practical impact is likely to be small.<br />
• The effect of filtering on correlation can also be checked, and its impact assessed.<br />
Slide 30