29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

39<br />

using optical mapping data from a human genome. Finally, we consider <strong>the</strong> implications <strong>of</strong><br />

<strong>the</strong> proposed approach in Section 3.4. Among o<strong>the</strong>r things, we develop conditional significance<br />

tests with control over error rates, outline a scheme to compare various score functions<br />

and provide explanations for several empirical observations.<br />

3.2 Methods<br />

Alignment problems are primarily <strong>of</strong> two kinds; first, where an optical map is aligned<br />

against ano<strong>the</strong>r optical map, and second, where an optical map is aligned against an errorfree<br />

reference map, typically derived in silico from sequence. The first kind might be a<br />

component in assembly algorithms. The second kind can be used to divide a large assembly<br />

problem, which are at least <strong>of</strong> quadratic complexity, into smaller tractable ones (see Section<br />

1.3.5). Here, we restrict our attention to alignment problems <strong>of</strong> <strong>the</strong> second kind. Such<br />

alignments are also used in Chapter 4 to investigate copy number alterations.<br />

Formulation: Denote an optical map by M and <strong>the</strong> in silico reference map by ˜G. Consider<br />

scores for all possible alignments <strong>of</strong> M to ˜G and choose <strong>the</strong> alignment for which <strong>the</strong> score is<br />

maximized. Denote <strong>the</strong> corresponding optimal score by S. We are interested in determining<br />

whe<strong>the</strong>r this optimal alignment is statistically significant. It is convenient to phrase this<br />

question as a test <strong>of</strong> independence. Specifically, it is assumed that M and ˜G are random<br />

quantities generated by some stochastic mechanism, and <strong>the</strong> statistic S is used to test <strong>the</strong><br />

null hypo<strong>the</strong>sis H 0 : M ⊥ ˜G. The distribution <strong>of</strong> S under H 0 is determined by <strong>the</strong> marginal<br />

distributions <strong>of</strong> M and ˜G.<br />

Conditional permutation test: We avoid specification <strong>of</strong> <strong>the</strong> distribution <strong>of</strong> M, which<br />

is complex, by conditioning on M. Let P G denote <strong>the</strong> marginal distribution <strong>of</strong> ˜G. For any<br />

random G ∼ P G , let S(G|M) be <strong>the</strong> optimal score obtained by aligning <strong>the</strong> fixed M against<br />

G. P G induces a distribution <strong>of</strong> <strong>the</strong> scalar random variable S(G|M), represented by <strong>the</strong><br />

)<br />

corresponding CDF, denoted F 0 (·|M). Using <strong>the</strong> observed optimal score S = S<br />

(˜G|M , H 0

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!