24.02.2013 Views

Optimality


Massive multiple hypotheses testing

The factor $\alpha_0$ serves asymptotically as a calibrator of the adaptive significance threshold to the Bonferroni threshold in the least favorable scenario $\pi_0 = 1$, i.e., all null hypotheses are true. Analysis of the asymptotic ERR of the HT($\alpha^*_{\mathrm{cal}}$) procedure suggests a few choices of $\alpha_0$ in practice.

4.2. Asymptotic ERR of HT($\alpha^*_{\mathrm{cal}}$)

Recall from (2.7) that

$$\mathrm{ERR}(\alpha) = \frac{\pi_0 \alpha}{F_m(\alpha)} \, \Pr(P_{1:m} \le \alpha).$$
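To get a feel for this expression, the following minimal sketch evaluates it in a special case where every factor is available in closed form. Everything beyond the formula itself is an assumption made for illustration and is not taken from the text: the P values are taken to be independent, $F_m$ is taken to be the average of the $m$ marginal P-value distribution functions, the true nulls are Uniform(0, 1), and the alternatives follow the hypothetical distribution function $\alpha^\gamma$ with $0 < \gamma < 1$. The function name `err` and its parameters are likewise hypothetical.

```python
def err(alpha, m, pi0, gamma):
    """Evaluate ERR(alpha) = pi0 * alpha / F_m(alpha) * Pr(P_{1:m} <= alpha).

    Illustrative assumptions (not from the source text):
      * the m P values are independent;
      * true-null P values are Uniform(0, 1);
      * alternative P values have CDF t**gamma, 0 < gamma < 1;
      * F_m is the average of the m marginal CDFs.
    """
    m0 = round(pi0 * m)          # number of true null hypotheses
    m1 = m - m0                  # number of true alternatives
    pi0_eff = m0 / m             # proportion of true nulls actually used
    # Average marginal CDF evaluated at alpha.
    f_m = pi0_eff * alpha + (1.0 - pi0_eff) * alpha ** gamma
    # Under independence, the smallest P value exceeds alpha only if all do.
    pr_min = 1.0 - (1.0 - alpha) ** m0 * (1.0 - alpha ** gamma) ** m1
    return pi0_eff * alpha / f_m * pr_min


if __name__ == "__main__":
    m = 1000
    print("pi0 = 1, alpha = 0.05      :", err(0.05, m, 1.0, 0.5))
    print("pi0 = 1, alpha = 0.05 / m  :", err(0.05 / m, m, 1.0, 0.5))
    print("pi0 = 0.9, alpha = 0.01    :", err(0.01, m, 0.9, 0.2))
```

In the least favorable case $\pi_0 = 1$ the ratio $\pi_0\alpha / F_m(\alpha)$ equals 1 under these assumptions, so ERR reduces to $\Pr(P_{1:m} \le \alpha)$, and a Bonferroni-type threshold $\alpha_0/m$ keeps it below $\alpha_0$.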

The probability $\Pr(P_{1:m} \le \alpha)$ is not tractable in general, but an upper bound can be obtained under a reasonable assumption on the set $\mathcal{P}_m$ of the $m$ P values. Massive multiple tests are mostly applied in exploratory studies to produce “inference-guided discoveries” that are either subject to further confirmation and validation, or helpful for developing new research hypotheses. For this reason, often all the alternative hypotheses are two-sided, and hence so are the tests. It is instructive to first consider the case of $m$ two-sample $t$ tests. Conceptually, the data consist of $n_1$ i.i.d. observations on $\mathbb{R}^m$, $X_i = [X_{i1}, X_{i2}, \ldots, X_{im}]$, $i = 1, \ldots, n_1$, in the first group, and $n_2$ i.i.d. observations $Y_i = [Y_{i1}, Y_{i2}, \ldots, Y_{im}]$, $i = 1, \ldots, n_2$, in the second group. The hypothesis pair $(H_{0k}, H_{Ak})$ is tested by the two-sided two-sample $t$ statistic $T_k = |T(\mathbf{X}_k, \mathbf{Y}_k, n_1, n_2)|$ based on the data $\mathbf{X}_k = \{X_{1k}, \ldots, X_{n_1 k}\}$ and $\mathbf{Y}_k = \{Y_{1k}, \ldots, Y_{n_2 k}\}$. Often in biological applications that study gene signaling pathways (see, e.g., Kuo et al. [18] and the simulation model in Section 5), $X_{ik}$ and $X_{ik'}$ ($i = 1, \ldots, n_1$) are either positively or negatively correlated for certain $k \ne k'$, and the same holds for $Y_{ik}$ and $Y_{ik'}$ ($i = 1, \ldots, n_2$). Such dependence in the data induces positive association between the two-sided test statistics $T_k$ and $T_{k'}$, so that $\Pr(T_k \le t \mid T_{k'} \le t) \ge \Pr(T_k \le t)$, implying $\Pr(T_k \le t, T_{k'} \le t) \ge \Pr(T_k \le t)\Pr(T_{k'} \le t)$, $t \ge 0$. The P values in turn satisfy $\Pr(P_k > \alpha, P_{k'} > \alpha) \ge \Pr(P_k > \alpha)\Pr(P_{k'} > \alpha)$, $\alpha \in [0, 1]$. It is straightforward to generalize this type of dependency to more than two tests. Alternatively, a direct model for the P values can be constructed.
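The positive association between two-sided statistics can be checked numerically. The sketch below is a small Monte Carlo experiment under assumptions chosen purely for illustration and not taken from the text: two features with bivariate normal data of correlation `rho`, no true group difference, five observations per group, and the pooled-variance form of the two-sample $t$ statistic. All function names are hypothetical. It estimates $\Pr(T_k \le t, T_{k'} \le t)$ and the product $\Pr(T_k \le t)\Pr(T_{k'} \le t)$ for both a positive and a negative correlation.

```python
import math
import random


def abs_two_sample_t(x, y):
    """Absolute value of the pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (n1 + n2 - 2)
    return abs(mean_x - mean_y) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def correlated_pair(rng, rho):
    """Draw one observation on two features with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return z1, z2


def estimate_association(rho, t=1.0, n1=5, n2=5, reps=20000, seed=1):
    """Return (estimated joint probability, product of estimated marginals)."""
    rng = random.Random(seed)
    count_k = count_kp = count_both = 0
    for _ in range(reps):
        xs = [correlated_pair(rng, rho) for _ in range(n1)]   # first group
        ys = [correlated_pair(rng, rho) for _ in range(n2)]   # second group
        t_k = abs_two_sample_t([v[0] for v in xs], [v[0] for v in ys])
        t_kp = abs_two_sample_t([v[1] for v in xs], [v[1] for v in ys])
        below_k, below_kp = t_k <= t, t_kp <= t
        count_k += below_k
        count_kp += below_kp
        count_both += below_k and below_kp
    return count_both / reps, (count_k / reps) * (count_kp / reps)


if __name__ == "__main__":
    for rho in (0.9, -0.9):
        joint, product = estimate_association(rho)
        print(f"rho={rho:+.1f}  joint={joint:.3f}  product={product:.3f}")
```

Because the statistics are two-sided, the sign of the correlation in the data does not matter in this experiment: in both cases the estimated joint probability should exceed the product of the marginals.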

Example 4.1. Let $\mathcal{J} \subseteq \{1, \ldots, m\}$ be a nonempty set of indices. Assume $P_j = P_0^{X_j}$, $j \in \mathcal{J}$, where $P_0$ follows a distribution $F_0$ on $[0, 1]$, and the $X_j$'s are i.i.d. continuous random variables following a distribution $H$ on $[0, \infty)$, and are independent of the P values. Assume that the $P_i$'s for $i \notin \mathcal{J}$ are either independent or related to each other in the same fashion. This model mimics the effect of an activated gene signaling pathway that results in gene differential expression as reflected by the P values: the set $\mathcal{J}$ represents the genes involved in the pathway, $P_0$ represents the underlying activation mechanism, and $X_j$ represents the noisy response of gene $j$ resulting in $P_j$. Because $P_j > \alpha$ if and only if $X_j < \log\alpha / \log P_0$ (dividing by $\log P_0 < 0$ reverses the inequality), direct calculations using independence of the $X_j$'s show that

$$\Pr\Bigl(\bigcap_{j \in \mathcal{J}} \{P_j > \alpha\}\Bigr) = \int_0^1 \Pr\Bigl(\bigcap_{j \in \mathcal{J}} \Bigl\{X_j < \frac{\log\alpha}{\log t}\Bigr\}\Bigr)\, dF_0(t) = E\Bigl[H\Bigl(\frac{\log\alpha}{\log P_0}\Bigr)^{|\mathcal{J}|}\Bigr],$$

where $|\mathcal{J}|$ is the cardinality of $\mathcal{J}$. Next,

$$\prod_{j \in \mathcal{J}} \Pr(P_j > \alpha) = \prod_{j \in \mathcal{J}} \int_0^1 H\Bigl(\frac{\log\alpha}{\log t}\Bigr)\, dF_0(t) = \Bigl(E\Bigl[H\Bigl(\frac{\log\alpha}{\log P_0}\Bigr)\Bigr]\Bigr)^{|\mathcal{J}|}.$$
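The model of Example 4.1 is easy to simulate, which gives a quick numerical check of the two quantities just computed. The sketch below makes choices that are assumptions for illustration only and do not come from the text: $F_0$ is Uniform(0, 1), $H$ is the Exponential(1) distribution, $|\mathcal{J}| = 5$, and $\alpha = 0.05$. The function names are hypothetical. It estimates the joint probability that all $P_j$ exceed $\alpha$ by direct simulation, the product of the marginal probabilities, and the expectation of $H(\log\alpha/\log P_0)^{|\mathcal{J}|}$ evaluated on the same draws of $P_0$.

```python
import math
import random


def H(x):
    """CDF of the assumed response distribution: Exponential(1) on [0, inf)."""
    return 1.0 - math.exp(-x)


def simulate_pathway(alpha=0.05, size_j=5, reps=20000, seed=7):
    """Return (joint estimate, product of marginals, formula-based estimate)."""
    rng = random.Random(seed)
    joint_hits = 0
    marginal_hits = 0
    formula_sum = 0.0
    for _ in range(reps):
        p0 = 1.0 - rng.random()                  # assumed F0: Uniform on (0, 1]
        xs = [rng.expovariate(1.0) for _ in range(size_j)]
        exceed = [p0 ** x > alpha for x in xs]   # P_j = P_0 ** X_j
        joint_hits += all(exceed)
        marginal_hits += sum(exceed)
        # P_j > alpha iff X_j < log(alpha) / log(P_0); P_0 = 1 gives no cutoff.
        cutoff = math.inf if p0 == 1.0 else math.log(alpha) / math.log(p0)
        formula_sum += H(cutoff) ** size_j
    joint = joint_hits / reps
    marginal = marginal_hits / (reps * size_j)
    return joint, marginal ** size_j, formula_sum / reps


if __name__ == "__main__":
    joint, product, formula = simulate_pathway()
    print(f"joint={joint:.3f}  product={product:.3f}  formula={formula:.3f}")
```

Under these assumptions the simulated joint probability should agree with the formula-based estimate up to Monte Carlo error, and both should exceed the product of the marginals, which is the same positive dependence among P values described for the $t$ tests.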
