On the Analysis of Optical Mapping Data - University of Wisconsin ...
especially a short noisy one, to originate from somewhere in the reference but have its
optimal alignment somewhere else. The null hypothesis of independence is not true in such
a case, yet we would not want to declare the optimal alignment significant. Thus, it may be
reasonable to define the best spurious score of M against G̃ as the maximum score among
alignments that are not the true alignment. This is of course not observable, since we
have no way of knowing the true alignment, or even whether it exists at all. There are
other problems with this definition; e.g. what makes an alignment sufficiently different from
the true alignment? Should alignments to incorrect but homologous regions be considered
spurious? By formulating the problem as a test of independence, these issues are avoided.

Other methods: Valouev et al. (2006) suggest an approach to determine significance that
is similar to ours in principle, but is completely model-based. They postulate that the
fragment lengths in the reference genome G̃ are i.i.d. exponential variates, and describe a
conditional model for optical maps given the reference. These are then used to formally derive
the marginal distribution of optical maps, which reduces to an i.i.d. exponential distribution
for the optical map fragment lengths, but with a different rate. Cutoffs are obtained by
simulating both reference and optical maps under the null hypothesis of independence. This
is a perfectly valid approach, but may be sensitive to parameter estimates as well as model
misspecification, which is a legitimate concern since their conditional model excludes certain
known sources of noise, namely desorption and scaling (see Chapter 2). Our conditional
non-parametric approach bypasses these concerns.
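
The simulation-based cutoff described for the model-based approach can be illustrated with a minimal Python sketch. It is not the procedure of Valouev et al. (2006), nor the scoring used in this work: the function names, the mean fragment lengths, the map sizes and, in particular, the gapless `toy_score` are all assumptions introduced here for illustration. The only ingredients taken from the description above are that reference and optical map fragment lengths are drawn independently from exponential distributions with different rates, and that the cutoff is an upper quantile of the best score under that null.

```python
import random


def simulate_lengths(rng, n, mean):
    """Draw n i.i.d. exponential fragment lengths with the given mean."""
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


def toy_score(ref, opt_map):
    """Hypothetical stand-in score: best gapless placement of opt_map on ref.

    Higher is better; 0 corresponds to a perfect fragment-by-fragment match.
    A real alignment score would also allow missing and false cuts.
    """
    n = len(opt_map)
    best = float("-inf")
    for start in range(len(ref) - n + 1):
        cost = sum(abs(r - m) for r, m in zip(ref[start:start + n], opt_map))
        best = max(best, -cost)
    return best


def null_cutoff(n_ref, n_map, ref_mean, map_mean, n_sim=200, alpha=0.05, seed=0):
    """Estimate the (1 - alpha) quantile of the best score under independence."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sim):
        # Reference and map are simulated independently (the null hypothesis),
        # each with its own exponential rate.
        ref = simulate_lengths(rng, n_ref, ref_mean)
        opt_map = simulate_lengths(rng, n_map, map_mean)
        scores.append(toy_score(ref, opt_map))
    scores.sort()
    idx = min(n_sim - 1, int((1 - alpha) * n_sim))
    return scores[idx]


# Illustrative values only: 200 reference fragments, a 10-fragment map.
cutoff = null_cutoff(n_ref=200, n_map=10, ref_mean=20.0, map_mean=25.0)
```

An observed optimal score above `cutoff` would be declared significant at level `alpha` under this toy model. The sketch also makes the sensitivity concern concrete: the cutoff depends directly on the assumed means and on the exponential form itself.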

Direct approach vs regression: Estimating the mean spurious score µ(M) separately
for each map is usually feasible and more powerful than regression. However, for alignments
involving only part of an optical map, a cutoff based on the full map is not appropriate. This
is a concern particularly for overlap matches, where alignments overhanging at the boundary
of the reference map are allowed. The regression approach can still be used in such cases by
considering only the aligned portion of the map. The regression on N and L as used above
is of course not the only possible model, but Table 3.1 suggests that it explains most of the