29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


[Figure 3.6: six panels for chromosome 2, each plotted against Iteration (1 to 3). First row: Number Input, Number in assembly, Ratio. Second row: Bases Covered (Mb), Number of Gaps, Proportion Unaligned. Legend: Constant (4.5 / 10), Constant (2.75 / 10), Regression (99.9 %), Regression (99.0 %).]

Figure 3.6 Comparison of various significance strategies through the iterative assembly procedure of Reslewic et al., where pairwise alignment is used as a filtering step before assembly. Chromosome 2 is assembled using the CHM data and the SOMA score. Two versions of the regression-based cutoff, with nominal specificities of 99.9% and 99.0%, are compared with a previously used scheme of declaring significance when the alignment has more than 10 aligned restriction sites and score above 4.5. To investigate whether performance is only affected by the number of maps allowed in by the filter, a similar scheme with the constant cutoff lowered to 2.75 is also used, where the cutoff is selected to allow roughly the same number of maps in the first step as the 99.0% regression cutoff. To allow partial alignments at the boundary of the reference, “aligned length” and “count” are used as surrogates for length and number of fragments, which effectively make the regression cutoffs more conservative than their nominal specificities would suggest. The first row reports the number of maps fed into the assembly step and the number (and proportion) of these assembled into contigs. In the second row, we attempt to assess the quality of the assembly by aligning the consensus contigs to the original in silico reference. The first two panels graph the number of bases in the reference covered and the numbers of gaps. The third panel shows a crude measure of the false positive rate, namely the proportion of bases in the assembled contigs that do not align to the reference.
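
The constant-cutoff rule and two of the summary quantities in the caption can be illustrated with a small sketch. It is not the code used to produce the figure: the `Alignment` record, the function names, and the toy scores below are hypothetical and exist only for illustration. The regression-based cutoff is left out because its form is not given here.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment of a map to the reference."""
    map_id: str
    score: float        # alignment score (e.g. a SOMA-style score)
    aligned_sites: int  # number of aligned restriction sites


def passes_constant_cutoff(aln, min_score=4.5, min_sites=10):
    """Constant rule from the caption: more than `min_sites` aligned
    restriction sites and a score above `min_score` (both strict)."""
    return aln.aligned_sites > min_sites and aln.score > min_score


def filter_maps(alignments, min_score=4.5, min_sites=10):
    """Return the sorted ids of maps with at least one significant alignment."""
    return sorted({a.map_id for a in alignments
                   if passes_constant_cutoff(a, min_score, min_sites)})


def assembled_ratio(n_input, n_assembled):
    """Proportion of the maps fed into assembly that end up in contigs."""
    return n_assembled / n_input


def proportion_unaligned(contig_bases, aligned_bases):
    """Crude false positive measure: share of contig bases that do not
    align back to the reference."""
    return (contig_bases - aligned_bases) / contig_bases


# Toy alignments (made-up values, not taken from the CHM data).
toy = [
    Alignment("m1", 5.2, 12),
    Alignment("m2", 3.0, 15),
    Alignment("m3", 4.8, 10),  # exactly 10 sites, so it fails "more than 10"
    Alignment("m4", 2.9, 11),
]

strict = filter_maps(toy)                  # cutoff 4.5 / 10
relaxed = filter_maps(toy, min_score=2.75)  # cutoff 2.75 / 10
```

With these toy values the 4.5 cutoff admits only `m1`, while lowering the cutoff to 2.75 also admits `m2` and `m4`. This mirrors how the relaxed constant cutoff lets more maps through the filter.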
