29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...



The ability to simulate from the null distribution allows us to try out and choose from among various sets of parameters. Note that an appropriate choice may depend on the task; for example, the best score for gapped alignment is always larger than that for ungapped alignment, so the same set of parameters may not be optimal for both. Indeed, it is tempting to try and ‘improve’ the scores we have used in our examples and consider ones other than those reported in Table 3.2; however, we will refrain from doing so since a proper study requires a systematic effort that is beyond the scope of this discussion.

3.4.2 Information measure

Location-specific cutoff: It is empirically known that different cutoffs for the SOMA score seem appropriate for alignments to different parts of a reference map, but a formal approach incorporating this idea has been difficult to formulate. Map-specific cutoffs provide a perfectly natural explanation for this observation, since an optical map is largely determined by its origin. However, this does not guard against spurious alignments at similar (homologous) regions in the genome, which are also a potential concern.
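As a rough illustration of a map-specific cutoff, the sketch below takes a set of null scores for a single map and returns the score that at most a fraction alpha of them exceed. The function names, the quantile rule and the default alpha are assumptions made for illustration; the text does not specify how a cutoff is derived from the simulated null distribution, and the simulation itself is not shown.

```python
import math
from typing import Sequence


def empirical_cutoff(null_scores: Sequence[float], alpha: float) -> float:
    """Return a cutoff that at most a fraction `alpha` of null scores exceed.

    `null_scores` are assumed to be best-alignment scores simulated under the
    null for ONE map; how they are simulated is outside this sketch.
    """
    if not null_scores:
        raise ValueError("need at least one null score")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(null_scores)
    n = len(ordered)
    # Number of null scores allowed to lie strictly above the cutoff,
    # capped so that the index below always stays within the list.
    allowed = min(math.floor(alpha * n), n - 1)
    return ordered[n - allowed - 1]


def is_significant(score: float, null_scores: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Hypothetical decision rule: accept if the score beats the map's cutoff."""
    return score > empirical_cutoff(null_scores, alpha)
```

Because the cutoff is computed from null scores for one map at a time, two maps from different parts of the reference can end up with different cutoffs under the same score function.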

Information measure: A related construct that proves useful in further understanding optical map score functions is the score obtained by aligning a map with itself, which we henceforth denote by ψ(M). Given a score function, this can be thought of as an information measure for the map: if the map had no errors, this would be the score for the correct alignment. Errors normally reduce the correct alignment score from this perfect score. ψ(M) is of course higher for longer maps, but is also affected by the lengths of the component fragments since most score functions reward matches involving longer fragments, which are rarer. Maps with lower information content are naturally harder to align successfully. However, Figure 3.8 shows that even for maps with high information content, the distributions of spurious and real SOMA scores are not well separated.
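To make the definition of ψ(M) concrete, the following sketch computes a self-alignment score for a map given as a list of fragment lengths. It rests on two assumptions that are not taken from the text: the optimal alignment of a map with itself is assumed to be the identity alignment, so no alignment search is performed, and `toy_match_score` is a hypothetical stand-in for a real score function (it is not the SOMA score), with made-up parameters `mean_len`, `sigma2` and `bonus`.

```python
from typing import Callable, Sequence

MatchScore = Callable[[float, float], float]


def toy_match_score(x: float, y: float, mean_len: float = 20.0,
                    sigma2: float = 0.3, bonus: float = 1.0) -> float:
    """Hypothetical score for matching fragments of lengths x and y (in kb).

    Rewards each match, rewards longer fragments more, and penalizes
    sizing disagreement. All parameter values are illustrative only.
    """
    sizing_penalty = (x - y) ** 2 / (sigma2 * y)
    return bonus + x / mean_len - sizing_penalty


def psi(fragments: Sequence[float],
        match_score: MatchScore = toy_match_score) -> float:
    """Self-alignment score: every fragment is matched with itself."""
    return sum(match_score(x, x) for x in fragments)
```

With the default toy parameters, `psi([10.0, 20.0])` evaluates to 3.5. Appending a fragment raises the value, and a single 40 kb fragment scores higher than a single 10 kb fragment, which mirrors the dependence on map length and fragment length described above.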

Simulation: In general, any optical map dataset and score function can be summarized by a plot analogous to Figure 3.8. Figure 3.9 shows such a plot for a set of simulated optical
