On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
47<br />
3.3.3 Simulation<br />
Given a generative model, such as <strong>the</strong> one described in Chapter 2, we can simulate optical<br />
maps and align <strong>the</strong>m using <strong>the</strong> test developed above. Since we know <strong>the</strong> origin <strong>of</strong> <strong>the</strong>se maps,<br />
we can estimate <strong>the</strong> true sensitivity and specificity. Such models are <strong>of</strong> limited use since <strong>the</strong>y<br />
do not capture all <strong>the</strong> sources <strong>of</strong> noise in real data, nor do <strong>the</strong>y reflect <strong>the</strong> fluctuations in<br />
parameter values that occur in <strong>the</strong> course <strong>of</strong> a real experiment. None<strong>the</strong>less, simulation is<br />
a valuable diagnostic tool. We applied <strong>the</strong> regression test derived above with a nominal<br />
specificity <strong>of</strong> 99.9% to 50,000 maps simulated from <strong>the</strong> human reference. 73.42% <strong>of</strong> correct<br />
alignments were declared as significant, 0.39% had at least one spurious alignment declared<br />
to be significant in addition to <strong>the</strong> correct one and 0.27% had only spurious significant<br />
alignments. It should be noted that some incorrect alignments are likely to be legitimate<br />
due to repeat structures in <strong>the</strong> genome.<br />
3.3.4 Improving assembly<br />
Alignments <strong>of</strong> optical maps to a reference is an important component <strong>of</strong> <strong>the</strong> iterative assembly<br />
scheme described in Section 1.3.5. Previously, a constant cut<strong>of</strong>f (determined heuristically)<br />
has been used to assess significance with <strong>the</strong> SOMA score. Figure 3.2 clearly suggests<br />
that this can be improved. By replacing <strong>the</strong> constant cut<strong>of</strong>f with <strong>the</strong> rules derived above and<br />
comparing <strong>the</strong> results, we can get quantitative validation <strong>of</strong> our methods indirectly through<br />
assembly. Figure 3.6 compares <strong>the</strong> results <strong>of</strong> using various significance strategies, including<br />
<strong>the</strong> previously used cut<strong>of</strong>f and two regression cut<strong>of</strong>fs with different nominal specificities.<br />
Chromosome 2 was assembled using <strong>the</strong> CHM data set (Section 1.2), which was not used in<br />
determining <strong>the</strong> significance rules. A simple measure <strong>of</strong> success is <strong>the</strong> contig rate, i.e., <strong>the</strong><br />
proportion <strong>of</strong> maps passing <strong>the</strong> alignment step that are eventually included in <strong>the</strong> assembly.<br />
By this measure, <strong>the</strong> new strategy clearly performs better. It is less clear how to compare<br />
<strong>the</strong> quality <strong>of</strong> <strong>the</strong> resulting alignments, but <strong>the</strong> few measures we use in Figure 3.6 also favor<br />
<strong>the</strong> new significance strategy.