29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


is rejected at level $\alpha$ if $S > c_\alpha$, where $F_0(c_\alpha \mid M) = 1 - \alpha$. The advantage of this formulation is that given a choice of $P_G$, we can in principle simulate from $F_0(\cdot \mid M)$ to obtain a suitable cutoff, without requiring any probabilistic model for the optical map $M$. An effective choice of $P_G$ is given by random permutations of the reference $\tilde{G}$. This preserves characteristics of the reference that are known to affect the spurious score distribution, namely the number and lengths of fragments. Permuting the order of fragments is also reasonable given the additive nature of score functions, which essentially reward matches in order. Formally, if we assume that the fragment lengths defining $G$ are i.i.d. from some distribution in a family $\mathcal{F}$, permutation can be viewed as sampling from $P_G$ conditional on the set of fragment lengths in $\tilde{G}$, which is sufficient for $\mathcal{F}$. Such tests are often called permutation tests (Cox and Hinkley, 1979, Chapter 6). See Figure 3.1 for a graphical justification of the i.i.d. assumption.
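
The permutation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_best_score` is a hypothetical stand-in for an additive alignment score (it is not the SOMA score), maps and references are represented as plain lists of fragment lengths, and the cutoff is taken as the empirical $(1 - \alpha)$ quantile of the best scores against permuted references.

```python
import math
import random


def toy_best_score(optical_map, reference, tol=5.0):
    """Hypothetical additive score, NOT the SOMA score.

    Slides the map along the reference without gaps and rewards each
    in-order fragment pair by 1 - |length difference| / tol. Returns the
    best score over all placements.
    """
    k = len(optical_map)
    best = float("-inf")
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        score = sum(1.0 - abs(m - r) / tol for m, r in zip(optical_map, window))
        best = max(best, score)
    return best


def permutation_cutoff(optical_map, reference, alpha=0.05, n_perm=200, seed=0):
    """Estimate c_alpha by simulating from the conditional null F_0(. | M).

    Each null sample is the best score of the map against a random
    permutation of the reference fragments, which keeps the number and
    lengths of fragments fixed.
    """
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_perm):
        permuted = list(reference)
        rng.shuffle(permuted)
        null_scores.append(toy_best_score(optical_map, permuted))
    null_scores.sort()
    # Empirical (1 - alpha) quantile of the simulated null scores.
    idx = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return null_scores[idx], null_scores


if __name__ == "__main__":
    reference = [10.0, 25.0, 7.0, 40.0, 13.0, 31.0, 18.0, 22.0, 9.0, 35.0]
    optical_map = reference[2:6]  # a map cut directly from the reference
    observed = toy_best_score(optical_map, reference)
    cutoff, _ = permutation_cutoff(optical_map, reference, n_perm=100, seed=1)
    print("observed:", observed, "cutoff:", cutoff, "reject:", observed > cutoff)
```

Note that the null distribution is simulated separately for each map $M$, so the cutoff adapts to the map without any probabilistic model for it.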

3.3 Results<br />

3.3.1 Exploration<br />

We use optical map data from GM07535, a diploid normal human lymphoblastoid cell line, for illustration. The data consists of 206796 optical maps longer than 300 Kb. These maps are aligned against an in silico reference map derived from Build 35 of the human genome sequence (International Human Genome Sequencing Consortium, 2004), with sequence gaps replaced by their estimated length. We use a score function implemented in the SOMA software suite with parameters that have been extensively used with optical map data. The actual score function, henceforth referred to as the SOMA score, is described in Appendix A. In addition to the best alignment scores against the in silico reference, we consider best scores for each map against several independent random permutations of the reference. The permutations are done separately for every chromosome, thus retaining the total length and number of fragments within each. For the most part, we restrict our attention to ungapped global alignments.
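
The per-chromosome permutation described above might look like the following minimal sketch. The dictionary layout (chromosome name mapped to a list of fragment lengths in bp) and the function name `permute_within_chromosomes` are assumptions made for illustration, not part of the SOMA software.

```python
import random


def permute_within_chromosomes(reference, seed=None):
    """Shuffle fragment order separately within each chromosome.

    `reference` is assumed to map a chromosome name to a list of in silico
    fragment lengths (a hypothetical layout). Fragments never move between
    chromosomes, so each chromosome keeps its total length and its number
    of fragments.
    """
    rng = random.Random(seed)
    permuted = {}
    for chrom, fragments in reference.items():
        shuffled = list(fragments)  # copy so the input is left untouched
        rng.shuffle(shuffled)
        permuted[chrom] = shuffled
    return permuted


if __name__ == "__main__":
    # Made-up fragment lengths, for illustration only.
    toy_reference = {
        "chr1": [12000, 4500, 30100, 8800, 15250],
        "chr2": [7000, 22000, 9100],
    }
    shuffled = permute_within_chromosomes(toy_reference, seed=42)
    for chrom in toy_reference:
        print(chrom, len(shuffled[chrom]), sum(shuffled[chrom]))
```

Calling the function repeatedly with different seeds gives the independent random permutations against which each map is scored.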

In theory, we can approximate the conditional null distribution $F_0(\cdot \mid M)$ by sampling from it an arbitrary number of times. In practice, each such sample involves a permutation of
