29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

3.4 Discussion

3.4.1 Uses

Alignment is a fundamental problem in optical mapping. From a statistical perspective, optical map alignment is more challenging than DNA sequence alignment because optical map data are noisier. Prior work in this area has mostly focused on developing score functions that can be used in DP algorithms. Here we have proposed a general framework to study the null distributions of optimal scores for an arbitrary score function. Its most obvious use is to derive significance tests with direct control over error rates. We have demonstrated the usefulness of such tests in improving the assembly of large genomes.

Evaluating score functions: The methods described above are applicable to any score function and provide a natural mechanism for evaluating them. We consider here the model-based likelihood ratio (LR) score proposed by Valouev et al. (2006) for aligning optical maps to an in silico reference. Figure 3.7 plots the best spurious ungapped global alignment score against two replications from P_G using this score. The correlation is weaker, but a map-specific cutoff is still more appropriate than a constant cutoff. We apply the direct approach as before with n = 4 replications to estimate µ(M). The results, shown in Table 3.2, indicate that, at least for the particular sets of parameters used, the SOMA score is more sensitive at a comparable specificity. This is somewhat surprising, since the LR score is based on a formal likelihood ratio test whereas the SOMA score is largely heuristic. Informal experiments suggest that this is at least in part due to the sizing model used by Valouev et al. (2006), which does not consider scaling errors and consequently underestimates the marginal sizing variance for large fragments.
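
To make the contrast between a map-specific and a constant cutoff concrete, the following is a minimal sketch and not the procedure used in this work. It assumes that µ(M) is estimated as the plain average of the best spurious scores over the n replications, and that the cutoff is that estimate plus an additive `margin`. The function names, the `margin` rule, and the example scores are all hypothetical.

```python
from statistics import mean


def estimate_null_mean(null_scores):
    """Estimate mu(M) as the average best spurious score over the replications."""
    if not null_scores:
        raise ValueError("need at least one replication")
    return mean(null_scores)


def map_specific_cutoff(null_scores, margin):
    """Cutoff for one map: estimated null mean plus a placeholder margin (assumption)."""
    return estimate_null_mean(null_scores) + margin


def is_significant(observed_score, null_scores, margin):
    """Accept an alignment only if its score exceeds the map's own cutoff."""
    return observed_score > map_specific_cutoff(null_scores, margin)


# Hypothetical best spurious scores from n = 4 simulated references per map.
null_a = [10.0, 12.0, 11.0, 11.0]
null_b = [30.0, 32.0, 31.0, 31.0]

# The same observed score is judged differently for the two maps.
print(is_significant(25.0, null_a, margin=5.0))
print(is_significant(25.0, null_b, margin=5.0))
```

With these made-up numbers, a constant cutoff of 20 would accept a score of 25 for both maps, whereas the map-specific rule accepts it only for the map whose spurious scores are typically low.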

More generally, this framework can be used for exploratory purposes, e.g., to compare the performance of different scores, or to guide the choice of parameters for a given score. It is helpful, particularly for overlap alignments (required in iterative assembly to extend flanks of a contig), if the distribution of the optimal score under the null does not depend strongly on the map, since otherwise significant alignments can be masked by spurious alignments.
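
One simple way to compare score functions in this exploratory spirit is to fix a target specificity on simulated null scores and then measure the sensitivity each score achieves on alignments known to be correct. The sketch below is an illustration under stated assumptions only: it pools the null scores into a single empirical cutoff, which ignores any dependence on the map, and all function names and inputs are hypothetical.

```python
import math


def cutoff_for_specificity(null_scores, specificity):
    """Return the empirical null quantile: the smallest observed null score c
    such that at least `specificity` of the null scores are <= c."""
    if not null_scores:
        raise ValueError("need at least one null score")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(null_scores)
    k = math.ceil(specificity * len(ordered)) - 1
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


def sensitivity(true_scores, cutoff):
    """Fraction of true alignments whose score strictly exceeds the cutoff."""
    if not true_scores:
        raise ValueError("need at least one true alignment score")
    return sum(s > cutoff for s in true_scores) / len(true_scores)


def sensitivity_at_specificity(null_scores, true_scores, specificity):
    """Sensitivity of one score function at a fixed empirical specificity."""
    cutoff = cutoff_for_specificity(null_scores, specificity)
    return sensitivity(true_scores, cutoff)
```

Calling `sensitivity_at_specificity` once per score function, with the same target specificity, gives numbers that can be compared directly. Each score function needs its own null and true score samples, because raw scores from different functions are not on a common scale.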
