On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
31<br />
as a function <strong>of</strong> <strong>the</strong> parameters. An analytical solution exists for some simple families,<br />
including one that is commonly used, given by π α (t) = 1 − e −αt , α > 0.<br />
Caveats: Although <strong>the</strong> analysis described above is appealing, it depends critically on <strong>the</strong><br />
assumption that Y is exponentially distributed, which in turn depends on <strong>the</strong> sizing error<br />
model used. Unfortunately, sizing error is less stable for small fragments, which is precisely<br />
<strong>the</strong> region <strong>of</strong> interest. It should also be noted that even if <strong>the</strong> exponential distribution<br />
holds marginally, fragment lengths can be considered independent only conditional on <strong>the</strong><br />
underlying restriction map. The marginal dependence is weak for large genomes, but cannot<br />
be ignored for smaller ones.<br />
2.2.2 Length errors<br />
Parameters: Recall that <strong>the</strong> marginal distribution <strong>of</strong> <strong>the</strong> measured length <strong>of</strong> an optical<br />
map fragment <strong>of</strong> true length µ is characterized by mean µ and variance σ 2 (τ 2 + 1)µ+τ 2 µ 2 . σ<br />
and τ can be estimated indirectly given a reasonable alignment scheme, provided we assume<br />
highly significant alignments to be true. The values <strong>of</strong> σ and τ used in Figure 2.4 were<br />
derived using an informal method <strong>of</strong> moments estimator, with <strong>the</strong> observed variances for<br />
subsets <strong>of</strong> <strong>the</strong> data defined by various ranges <strong>of</strong> µ used as <strong>the</strong> response in a linear regression<br />
with terms µ and µ 2 . It should be noted that <strong>the</strong>se estimates are based on maps with<br />
significant alignments and are thus likely to be biased to some extent.<br />
Normality: Figure 2.4 suggests that <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> standardized lengths have<br />
heavier tails than normal. This is clearer in <strong>the</strong> normal Q-Q plot in Figure 2.5. Empirically,<br />
a t distribution appears to be a much better fit, which is not surprising since <strong>the</strong> calculation<br />
<strong>of</strong> X i involves an estimated scale (see Section 2.1.2).<br />
2.2.3 Cut errors<br />
The probability p <strong>of</strong> a true cut being observed and <strong>the</strong> rate ζ <strong>of</strong> spurious cuts can be<br />
similarly estimated using significant alignments. However, <strong>the</strong> amount <strong>of</strong> bias introduced