On the Analysis of Optical Mapping Data - University of Wisconsin ...
The ability to simulate from the null distribution allows us to try out and choose from
among various sets of parameters. Note that an appropriate choice may depend on the task;
for example, the best score for gapped alignment is always larger than that for ungapped
alignment, so the same set of parameters may not be optimal for both. Indeed, it is tempting
to try and ‘improve’ the scores we have used in our examples and consider ones other than
those reported in Table 3.2; however, we will refrain from doing so since a proper study
requires a systematic effort that is beyond the scope of this discussion.
3.4.2 Information measure<br />
Location-specific cutoff: It is empirically known that different cutoffs for the SOMA
score seem appropriate for alignments to different parts of a reference map, but a formal approach
incorporating this idea has been difficult to formulate. Map-specific cutoffs provide
a perfectly natural explanation for this observation, since an optical map is largely determined
by its origin. However, this does not guard against spurious alignments at similar
(homologous) regions in the genome, which are also a potential concern.
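As a rough illustration of a map-specific cutoff, the sketch below assumes that null scores for a single map have already been simulated (the simulation itself is not shown) and takes an upper empirical quantile of them as that map's cutoff. The function name `map_specific_cutoff` and the level `alpha` are hypothetical choices made for this example, not quantities taken from the analysis above.

```python
import math

def map_specific_cutoff(null_scores, alpha=0.05):
    """Return an empirical (1 - alpha) cutoff from simulated null scores.

    null_scores: best alignment scores obtained for ONE map under the
    null (assumed to be simulated elsewhere).
    alpha: hypothetical target rate of spurious alignments above the cutoff.
    """
    if not null_scores:
        raise ValueError("need at least one simulated null score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    ordered = sorted(null_scores)
    # Smallest order statistic with at least a (1 - alpha) fraction
    # of the simulated null scores at or below it.
    k = math.ceil((1 - alpha) * len(ordered))
    return ordered[max(k, 1) - 1]
```

Because the cutoff is computed per map, two maps with different null score distributions receive different cutoffs, which is the behaviour the paragraph above describes. It does nothing about spurious alignments to homologous regions.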
Information measure: A related construct that proves useful in further understanding
optical map score functions is the score obtained by aligning a map with itself, which we
henceforth denote by ψ(M). Given a score function, this can be thought of as an information
measure for the map: if the map had no errors, this would be the score for the correct alignment.
Errors normally reduce the correct alignment score from this perfect score. ψ(M) is of
course higher for longer maps, but is also affected by the lengths of the component fragments
since most score functions reward matches involving longer fragments, which are rarer. Maps
with lower information content are naturally harder to align successfully. However, Figure
3.8 shows that even for maps with high information content, the distributions of spurious
and real SOMA scores are not well separated.
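To make ψ(M) concrete, the following minimal sketch computes a self-alignment score under a made-up score function. The function `match_score`, its `scale` and `penalty` parameters, and the restriction to fragment-by-fragment (ungapped) alignment are all assumptions introduced for illustration; this is not the SOMA score or any other score function discussed here. It only mimics two properties mentioned above: matches involving longer fragments are rewarded more, and errors lower the score relative to ψ(M).

```python
import math

def match_score(x, y, scale=10.0, penalty=1.0):
    """Hypothetical score for matching fragments of lengths x and y (> 0).

    The reward grows with fragment length; a sizing discrepancy
    between the two fragments is penalized.
    """
    reward = math.log1p(min(x, y) / scale)
    discrepancy = (x - y) ** 2 / (x + y)
    return reward - penalty * discrepancy

def ungapped_score(map_a, map_b, **kwargs):
    """Sum of match scores for a fragment-by-fragment alignment."""
    if len(map_a) != len(map_b):
        raise ValueError("this sketch only aligns maps with equal fragment counts")
    return sum(match_score(x, y, **kwargs) for x, y in zip(map_a, map_b))

def psi(fragments, **kwargs):
    """Information measure: the score of a map aligned with itself."""
    return ungapped_score(fragments, fragments, **kwargs)
```

Under these assumptions, `psi([10.0, 20.0, 30.0])` exceeds `psi([10.0, 20.0])` because the map is longer, and `ungapped_score([10.0, 20.0], [11.0, 19.0])` falls below `psi([10.0, 20.0])` because sizing errors reduce the score of the correct alignment.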
Simulation: In general, any optical map dataset and score function can be summarized
by a plot analogous to Figure 3.8. Figure 3.9 shows such a plot for a set of simulated optical