On the Analysis of Optical Mapping Data - University of Wisconsin ...
using optical mapping data from a human genome. Finally, we consider the implications of
the proposed approach in Section 3.4. Among other things, we develop conditional significance
tests with control over error rates, outline a scheme to compare various score functions
and provide explanations for several empirical observations.
3.2 Methods<br />
Alignment problems are primarily of two kinds: first, where an optical map is aligned
against another optical map, and second, where an optical map is aligned against an error-free
reference map, typically derived in silico from sequence. The first kind might be a
component in assembly algorithms. The second kind can be used to divide a large assembly
problem, which is at least of quadratic complexity, into smaller tractable ones (see Section
1.3.5). Here, we restrict our attention to alignment problems of the second kind. Such
alignments are also used in Chapter 4 to investigate copy number alterations.
Formulation: Denote an optical map by $M$ and the in silico reference map by $\tilde{G}$. Consider
scores for all possible alignments of $M$ to $\tilde{G}$ and choose the alignment for which the score is
maximized. Denote the corresponding optimal score by $S$. We are interested in determining
whether this optimal alignment is statistically significant. It is convenient to phrase this
question as a test of independence. Specifically, it is assumed that $M$ and $\tilde{G}$ are random
quantities generated by some stochastic mechanism, and the statistic $S$ is used to test the
null hypothesis $H_0 : M \perp \tilde{G}$. The distribution of $S$ under $H_0$ is determined by the marginal
distributions of $M$ and $\tilde{G}$.
Conditional permutation test: We avoid specification of the distribution of $M$, which
is complex, by conditioning on $M$. Let $P_G$ denote the marginal distribution of $\tilde{G}$. For any
random $G \sim P_G$, let $S(G \mid M)$ be the optimal score obtained by aligning the fixed $M$ against
$G$. $P_G$ induces a distribution of the scalar random variable $S(G \mid M)$, represented by the
corresponding CDF, denoted $F_0(\cdot \mid M)$. Using the observed optimal score $S = S(\tilde{G} \mid M)$, $H_0$