29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

47<br />

3.3.3 Simulation<br />

Given a generative model, such as <strong>the</strong> one described in Chapter 2, we can simulate optical<br />

maps and align <strong>the</strong>m using <strong>the</strong> test developed above. Since we know <strong>the</strong> origin <strong>of</strong> <strong>the</strong>se maps,<br />

we can estimate <strong>the</strong> true sensitivity and specificity. Such models are <strong>of</strong> limited use since <strong>the</strong>y<br />

do not capture all <strong>the</strong> sources <strong>of</strong> noise in real data, nor do <strong>the</strong>y reflect <strong>the</strong> fluctuations in<br />

parameter values that occur in <strong>the</strong> course <strong>of</strong> a real experiment. None<strong>the</strong>less, simulation is<br />

a valuable diagnostic tool. We applied <strong>the</strong> regression test derived above with a nominal<br />

specificity <strong>of</strong> 99.9% to 50,000 maps simulated from <strong>the</strong> human reference. 73.42% <strong>of</strong> correct<br />

alignments were declared as significant, 0.39% had at least one spurious alignment declared<br />

to be significant in addition to <strong>the</strong> correct one and 0.27% had only spurious significant<br />

alignments. It should be noted that some incorrect alignments are likely to be legitimate<br />

due to repeat structures in <strong>the</strong> genome.<br />

3.3.4 Improving assembly<br />

Alignments <strong>of</strong> optical maps to a reference is an important component <strong>of</strong> <strong>the</strong> iterative assembly<br />

scheme described in Section 1.3.5. Previously, a constant cut<strong>of</strong>f (determined heuristically)<br />

has been used to assess significance with <strong>the</strong> SOMA score. Figure 3.2 clearly suggests<br />

that this can be improved. By replacing <strong>the</strong> constant cut<strong>of</strong>f with <strong>the</strong> rules derived above and<br />

comparing <strong>the</strong> results, we can get quantitative validation <strong>of</strong> our methods indirectly through<br />

assembly. Figure 3.6 compares <strong>the</strong> results <strong>of</strong> using various significance strategies, including<br />

<strong>the</strong> previously used cut<strong>of</strong>f and two regression cut<strong>of</strong>fs with different nominal specificities.<br />

Chromosome 2 was assembled using <strong>the</strong> CHM data set (Section 1.2), which was not used in<br />

determining <strong>the</strong> significance rules. A simple measure <strong>of</strong> success is <strong>the</strong> contig rate, i.e., <strong>the</strong><br />

proportion <strong>of</strong> maps passing <strong>the</strong> alignment step that are eventually included in <strong>the</strong> assembly.<br />

By this measure, <strong>the</strong> new strategy clearly performs better. It is less clear how to compare<br />

<strong>the</strong> quality <strong>of</strong> <strong>the</strong> resulting alignments, but <strong>the</strong> few measures we use in Figure 3.6 also favor<br />

<strong>the</strong> new significance strategy.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!