29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


may appear where there should have been none and small fragments may not be represented because they float away or merge with neighboring fragments. See Chapter 1 for a detailed overview of the optical mapping system and Chapter 2 for more on the inherent errors and statistical features of optical map data.

Alignment: A fundamental computational problem in optical mapping is alignment, i.e., given an optical map, trying to identify whether it overlaps with other restriction maps, and if so, where. Alignments are not particularly valuable individually, but used en masse they are important components in many procedures. Dynamic Programming (DP) algorithms have been used extensively in DNA and protein sequence alignment (Durbin et al., 1998), and can be used to align restriction maps with suitable modifications (Huang and Waterman, 1992). Dynamic programming is a generic approach to alignment, and its usefulness depends on the details of how it is applied. There are two important components in such alignment schemes. The first is a score function, which is the objective function that the algorithm maximizes (see Appendix A). The second is the strategy for detecting significance, i.e., whether or not the alignment with the optimum score, which exists even if there is no true alignment, should be considered a real alignment as opposed to a spurious one. The nature of optical mapping data makes this problem harder than for sequence alignment.
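To make the role of the score function concrete, the following is a minimal sketch of a dynamic programming alignment between two restriction maps, each represented as a list of fragment lengths. It is not the score function of Appendix A, nor the scheme of Huang and Waterman (1992): the function name `align_maps`, the parameters `max_merge`, `match_reward`, `cut_penalty` and `size_weight`, and the additive scoring rule are all assumptions chosen for illustration. Each step matches a chunk of consecutive fragments in one map to a chunk in the other, so that a missing or spurious cut site costs a penalty instead of breaking the alignment.

```python
from typing import List, Tuple


def align_maps(
    query: List[float],
    ref: List[float],
    max_merge: int = 3,          # hypothetical: max fragments merged into one chunk
    match_reward: float = 2.0,   # hypothetical: reward per matched chunk pair
    cut_penalty: float = 1.0,    # hypothetical: cost per unmatched cut site
    size_weight: float = 0.1,    # hypothetical: cost per unit of length discrepancy
) -> Tuple[float, int]:
    """Align all of `query` to some contiguous stretch of `ref`.

    Returns the optimal score and the index in `ref` (exclusive) where the
    best alignment ends. The scoring rule is an illustrative assumption.
    """
    n, m = len(query), len(ref)
    neg_inf = float("-inf")
    # best[i][j]: best score aligning query[:i] so that it ends at ref[:j]
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0.0  # the alignment may start anywhere in ref at no cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Try matching the last di query fragments to the last dj ref fragments.
            for di in range(1, min(max_merge, i) + 1):
                q_len = sum(query[i - di:i])
                for dj in range(1, min(max_merge, j) + 1):
                    prev = best[i - di][j - dj]
                    if prev == neg_inf:
                        continue
                    r_len = sum(ref[j - dj:j])
                    step = (
                        match_reward
                        - cut_penalty * ((di - 1) + (dj - 1))
                        - size_weight * abs(q_len - r_len)
                    )
                    best[i][j] = max(best[i][j], prev + step)

    end = max(range(m + 1), key=lambda j: best[n][j])
    return best[n][end], end


if __name__ == "__main__":
    reference = [5.0, 7.0, 10.0, 20.0, 30.0, 40.0]
    print(align_maps([10.0, 20.0, 30.0], reference))  # exact sub-map
    print(align_maps([10.0, 50.0], reference))        # one cut site missing in the query
```

Note that the procedure always returns some optimum, even for unrelated maps, which is why a separate significance criterion is needed.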

Significance: Prior to the present work, the detection of significance in optical map alignments had not been systematically studied. Conceptually, the problem is a test for the null hypothesis that the maps being aligned are independent, with the optimal score as the test statistic. Unfortunately, the null distribution, i.e., the distribution of the optimal score under independence, is not easy to obtain. Rules based on simulated optical maps are possible; however, they are predicated on the accuracy of the simulation model, which may not truly reflect all the complexities of optical mapping. Our main contribution, as described in Section 3.2, is to phrase the significance problem in a way that allows us to naturally sample from the null distribution of optimal scores, avoiding any explicit model for optical maps. In Section 3.3, this framework is used to investigate the properties of a particular score function
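As an illustration of how a sample from the null distribution can be turned into a significance decision, the sketch below computes a generic Monte Carlo p-value for an observed optimal score. How the null scores are actually sampled is the subject of Section 3.2 and is not reproduced here; in this sketch `null_scores` is simply assumed to be a list of optimal scores obtained under independence, and the function name `empirical_p_value` is hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Monte Carlo p-value for an observed optimal alignment score.

    `null_scores` is assumed to hold optimal scores sampled under the null
    hypothesis of independent maps. The +1 in the numerator and denominator
    counts the observed score itself, so the estimate is never exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    # Made-up numbers, used only to show the call.
    null_sample = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 5.0]
    print(empirical_p_value(5.0, null_sample))
    print(empirical_p_value(10.0, [1.0, 2.0, 3.0]))
```

An alignment would then be declared significant when this p-value falls below a chosen threshold; the smallest attainable value is 1 / (n + 1) for n null scores, so the sample size limits how strict that threshold can be.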
