29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

31<br />

as a function <strong>of</strong> <strong>the</strong> parameters. An analytical solution exists for some simple families,<br />

including one that is commonly used, given by π α (t) = 1 − e −αt , α > 0.<br />

Caveats: Although <strong>the</strong> analysis described above is appealing, it depends critically on <strong>the</strong><br />

assumption that Y is exponentially distributed, which in turn depends on <strong>the</strong> sizing error<br />

model used. Unfortunately, sizing error is less stable for small fragments, which is precisely<br />

<strong>the</strong> region <strong>of</strong> interest. It should also be noted that even if <strong>the</strong> exponential distribution<br />

holds marginally, fragment lengths can be considered independent only conditional on <strong>the</strong><br />

underlying restriction map. The marginal dependence is weak for large genomes, but cannot<br />

be ignored for smaller ones.<br />

2.2.2 Length errors<br />

Parameters: Recall that <strong>the</strong> marginal distribution <strong>of</strong> <strong>the</strong> measured length <strong>of</strong> an optical<br />

map fragment <strong>of</strong> true length µ is characterized by mean µ and variance σ 2 (τ 2 + 1)µ+τ 2 µ 2 . σ<br />

and τ can be estimated indirectly given a reasonable alignment scheme, provided we assume<br />

highly significant alignments to be true. The values <strong>of</strong> σ and τ used in Figure 2.4 were<br />

derived using an informal method <strong>of</strong> moments estimator, with <strong>the</strong> observed variances for<br />

subsets <strong>of</strong> <strong>the</strong> data defined by various ranges <strong>of</strong> µ used as <strong>the</strong> response in a linear regression<br />

with terms µ and µ 2 . It should be noted that <strong>the</strong>se estimates are based on maps with<br />

significant alignments and are thus likely to be biased to some extent.<br />

Normality: Figure 2.4 suggests that <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> standardized lengths have<br />

heavier tails than normal. This is clearer in <strong>the</strong> normal Q-Q plot in Figure 2.5. Empirically,<br />

a t distribution appears to be a much better fit, which is not surprising since <strong>the</strong> calculation<br />

<strong>of</strong> X i involves an estimated scale (see Section 2.1.2).<br />

2.2.3 Cut errors<br />

The probability p <strong>of</strong> a true cut being observed and <strong>the</strong> rate ζ <strong>of</strong> spurious cuts can be<br />

similarly estimated using significant alignments. However, <strong>the</strong> amount <strong>of</strong> bias introduced

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!