On the Analysis of Optical Mapping Data - University of Wisconsin ...
[Figure: six panels in two rows, each plotted against Iteration (1 to 3). Top row: Number Input, Number in assembly, Ratio. Bottom row: Bases Covered (Mb), Number of Gaps, Proportion Unaligned. Legend: Constant (4.5 / 10), Constant (2.75 / 10), Regression (99.9 %), Regression (99.0 %).]
Figure 3.6 Comparison of various significance strategies through the iterative assembly
procedure of Reslewic et al., where pairwise alignment is used as a filtering step before assembly.
Chromosome 2 is assembled using the CHM data and the SOMA score. Two versions
of the regression-based cutoff, with nominal specificities of 99.9% and 99.0%, are compared
with a previously used scheme of declaring significance when the alignment has more than
10 aligned restriction sites and score above 4.5. To investigate whether performance is only
affected by the number of maps allowed in by the filter, a similar scheme with the constant
cutoff lowered to 2.75 is also used, where the cutoff is selected to allow roughly the same
number of maps in the first step as the 99.0% regression cutoff. To allow partial alignments
at the boundary of the reference, “aligned length” and “count” are used as surrogates for
length and number of fragments, which effectively make the regression cutoffs more conservative
than their nominal specificities would suggest. The first row reports the number of
maps fed into the assembly step and the number (and proportion) of these assembled into
contigs. In the second row, we attempt to assess the quality of the assembly by aligning the
consensus contigs to the original in silico reference. The first two panels graph the number
of bases in the reference covered and the numbers of gaps. The third panel shows a crude
measure of the false positive rate, namely the proportion of bases in the assembled contigs
that do not align to the reference.
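
The constant-cutoff rule and the "Proportion Unaligned" measure described in the caption can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the code used to produce the figure: the `Alignment` record and the names `passes_constant_cutoff` and `proportion_unaligned` are hypothetical, and the regression-based cutoff is not reproduced here because its form is not given in this caption.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment of a map to the reference."""
    score: float        # alignment score (the caption uses the SOMA score)
    aligned_sites: int  # number of aligned restriction sites


def passes_constant_cutoff(aln, min_score=4.5, min_sites=10):
    """Constant scheme from the caption: significant when the alignment has
    more than `min_sites` aligned restriction sites and a score above `min_score`."""
    return aln.aligned_sites > min_sites and aln.score > min_score


def proportion_unaligned(contig_bases, aligned_bases):
    """Crude false positive measure from the caption: the proportion of bases
    in the assembled contigs that do not align to the reference."""
    if contig_bases <= 0:
        raise ValueError("contig_bases must be positive")
    return (contig_bases - aligned_bases) / contig_bases


# Made-up alignments, purely for illustration.
alignments = [Alignment(5.0, 12), Alignment(3.0, 12), Alignment(5.0, 10)]
strict = [a for a in alignments if passes_constant_cutoff(a)]
relaxed = [a for a in alignments if passes_constant_cutoff(a, min_score=2.75)]
```

With these made-up values, lowering the score cutoff from 4.5 to 2.75 admits the second alignment as well, while the third is rejected under both settings because it has exactly 10 aligned sites rather than more than 10.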