On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
24<br />
unknown probability p, say, <strong>of</strong> success. It is possible to argue that <strong>the</strong> probability should<br />
depend on proximity to o<strong>the</strong>r cuts, but <strong>the</strong> idea is difficult to formalize. Instead, <strong>the</strong> issue<br />
is dealt with using a desorption model for small fragments (see below). An optical map can<br />
also contain false cuts, i.e. apparent restriction sites that correspond to no restriction site<br />
in <strong>the</strong> true map, perhaps due to random breakage <strong>of</strong> DNA or image errors. The locations<br />
<strong>of</strong> such spurious cut sites in optical maps may be modeled as <strong>the</strong> realizations <strong>of</strong> a homogeneous<br />
Poisson process, with rate ζ, say, per Kb <strong>of</strong> DNA. These models have been used by<br />
Anantharaman et al. (1999) and Valouev et al. (2006).<br />
Length measurement errors: Consider an optical map with n fragments <strong>of</strong> measured<br />
lengths X 1 , . . .,X n . Assuming no cut errors, each fragment has a corresponding true but<br />
unobserved length, which we denote by µ i , i = 1, . . .,n. Recall that each X i is calculated as<br />
<strong>the</strong> product Y i R i , where Y i is <strong>the</strong> total fluorescent intensity <strong>of</strong> <strong>the</strong> pixels that constitute <strong>the</strong><br />
fragment, and R i is a scale factor to convert fluorescent intensities to base pairs, estimated<br />
using standards. Restricting our attention to <strong>the</strong> marginal distribution <strong>of</strong> X i , we may treat<br />
Y i and R i as independent latent variables within a given image. We can assume without loss<br />
<strong>of</strong> generality that <strong>the</strong> true scale factor is 1. It is natural to assume that <strong>the</strong> distribution <strong>of</strong><br />
Y i depends only on µ i . Valouev et al. (2006) note that Y i is <strong>the</strong> sum <strong>of</strong> intensities <strong>of</strong> several<br />
pixels. Assuming <strong>the</strong>se terms to be i.i.d., <strong>the</strong> expected number <strong>of</strong> terms is proportional to<br />
µ i . Invoking <strong>the</strong> Central Limit Theorem, <strong>the</strong>y postulate that for some σ,<br />
Y i ∼ N ( )<br />
µ i , σ 2 µ i<br />
This model additionally has <strong>the</strong> following desirable property: denoting <strong>the</strong> N (µ, σ 2 µ) density<br />
by f µ , if Y i ∼ f µi , Y j ∼ f µj and Y i and Y j are independent, <strong>the</strong>n Y i + Y j ∼ f µi +µ j<br />
. This<br />
is relevant when adjacent fragments are reported as one due to a missing cut. Valouev et al.<br />
(2006) ignore scaling and postulate that <strong>the</strong> observed lengths X i = Y i . If we instead assume<br />
that <strong>the</strong> mean E(R i ) = 1 and variance V (R i ) = τ 2 > 0 without making any fur<strong>the</strong>r<br />
assumptions about <strong>the</strong> distribution <strong>of</strong> R i , we have<br />
E(X i ) = E(E(Y i R i |R i )) = µ i