On the Analysis of Optical Mapping Data - University of Wisconsin ...
may appear where there should have been none, and small fragments may not be represented
because they float away or merge with neighboring fragments. See Chapter 1 for a detailed
overview of the optical mapping system and Chapter 2 for more on the inherent errors and
statistical features of optical map data.
Alignment: A fundamental computational problem in optical mapping is alignment, i.e.,
given an optical map, identifying whether it overlaps with other restriction maps, and if
so, where. Alignments are not particularly valuable individually, but used en masse they are
important components in many procedures. Dynamic programming (DP) algorithms have
been used extensively in DNA and protein sequence alignment (Durbin et al., 1998), and can
be used to align restriction maps with suitable modifications (Huang and Waterman, 1992).
Dynamic programming is a generic approach to alignment, and its usefulness depends on the
details of how it is applied. There are two important components in such alignment schemes.
The first is a score function, which is the objective function that the algorithm maximizes
(see Appendix A). The second is the strategy for detecting significance, i.e., deciding whether
the alignment with the optimum score, which exists even if there is no true alignment, should
be considered a real alignment as opposed to a spurious one. The nature of optical mapping
data makes this problem harder than it is for sequence alignment.
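
To make the role of the score function concrete, the following is a minimal sketch of a local DP alignment between two restriction maps, each given as a list of fragment lengths in kb. It is an illustration only: the function names, the parameter values, and the toy score (a fixed reward per matched chunk, a sizing penalty, and a penalty per unmatched cut site) are assumptions made here, and it is not the score function of Appendix A. Allowing a chunk to span several consecutive fragments is one simple way to accommodate missing or false cuts.

```python
from itertools import accumulate


def chunk_score(x, y, extra_cuts, reward=5.0, var_per_kb=0.3, cut_penalty=3.0):
    """Toy score (hypothetical) for matching a stretch of x kb to one of y kb.

    extra_cuts counts cut sites inside the two stretches that have no
    counterpart in the other map.
    """
    sizing_penalty = (x - y) ** 2 / (var_per_kb * (x + y))
    return reward - sizing_penalty - cut_penalty * extra_cuts


def best_local_alignment(a, b, max_merge=3):
    """Return (best score, end index in a, end index in b).

    S[i][j] holds the best score of a local alignment ending with the i-th
    fragment of a matched against the j-th fragment of b, or 0 if no
    alignment ending there has a positive score.
    """
    # Prefix sums: ca[i] is the total length of the first i fragments of a.
    ca = [0.0] + list(accumulate(a))
    cb = [0.0] + list(accumulate(b))
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = (0.0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = 0.0  # local alignment: starting afresh is always allowed
            # Match the last k fragments of a against the last l fragments of b.
            for k in range(1, min(max_merge, i) + 1):
                for l in range(1, min(max_merge, j) + 1):
                    x = ca[i] - ca[i - k]
                    y = cb[j] - cb[j - l]
                    cand = S[i - k][j - l] + chunk_score(x, y, (k - 1) + (l - 1))
                    cell = max(cell, cand)
            S[i][j] = cell
            if cell > best[0]:
                best = (cell, i, j)
    return best


if __name__ == "__main__":
    # Second map is missing the cut between the 20 kb and 30 kb fragments.
    print(best_local_alignment([10, 20, 30], [10, 50]))
```

The procedure returns an optimum for any pair of maps, related or not, which is why a separate rule for judging significance is needed before an optimal alignment is accepted as real.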
Significance: Prior to the present work, the detection of significance in optical map alignments
had not been systematically studied. Conceptually, the problem is a test of the null
hypothesis that the maps being aligned are independent, with the optimal score as the test
statistic. Unfortunately, the null distribution, i.e., the distribution of the optimal score under
independence, is not easy to obtain. Rules based on simulated optical maps are possible;
however, they are predicated on the accuracy of the simulation model, which may not truly
reflect all the complexities of optical mapping. Our main contribution, as described in Section
3.2, is to phrase the significance problem in a way that allows us to naturally sample
from the null distribution of optimal scores, avoiding any explicit model for optical maps. In
Section 3.3, this framework is used to investigate the properties of a particular score function