29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

24<br />

unknown probability p, say, <strong>of</strong> success. It is possible to argue that <strong>the</strong> probability should<br />

depend on proximity to o<strong>the</strong>r cuts, but <strong>the</strong> idea is difficult to formalize. Instead, <strong>the</strong> issue<br />

is dealt with using a desorption model for small fragments (see below). An optical map can<br />

also contain false cuts, i.e. apparent restriction sites that correspond to no restriction site<br />

in <strong>the</strong> true map, perhaps due to random breakage <strong>of</strong> DNA or image errors. The locations<br />

<strong>of</strong> such spurious cut sites in optical maps may be modeled as <strong>the</strong> realizations <strong>of</strong> a homogeneous<br />

Poisson process, with rate ζ, say, per Kb <strong>of</strong> DNA. These models have been used by<br />

Anantharaman et al. (1999) and Valouev et al. (2006).<br />

Length measurement errors: Consider an optical map with n fragments <strong>of</strong> measured<br />

lengths X 1 , . . .,X n . Assuming no cut errors, each fragment has a corresponding true but<br />

unobserved length, which we denote by µ i , i = 1, . . .,n. Recall that each X i is calculated as<br />

<strong>the</strong> product Y i R i , where Y i is <strong>the</strong> total fluorescent intensity <strong>of</strong> <strong>the</strong> pixels that constitute <strong>the</strong><br />

fragment, and R i is a scale factor to convert fluorescent intensities to base pairs, estimated<br />

using standards. Restricting our attention to <strong>the</strong> marginal distribution <strong>of</strong> X i , we may treat<br />

Y i and R i as independent latent variables within a given image. We can assume without loss<br />

<strong>of</strong> generality that <strong>the</strong> true scale factor is 1. It is natural to assume that <strong>the</strong> distribution <strong>of</strong><br />

Y i depends only on µ i . Valouev et al. (2006) note that Y i is <strong>the</strong> sum <strong>of</strong> intensities <strong>of</strong> several<br />

pixels. Assuming <strong>the</strong>se terms to be i.i.d., <strong>the</strong> expected number <strong>of</strong> terms is proportional to<br />

µ i . Invoking <strong>the</strong> Central Limit Theorem, <strong>the</strong>y postulate that for some σ,<br />

Y i ∼ N ( )<br />

µ i , σ 2 µ i<br />

This model additionally has <strong>the</strong> following desirable property: denoting <strong>the</strong> N (µ, σ 2 µ) density<br />

by f µ , if Y i ∼ f µi , Y j ∼ f µj and Y i and Y j are independent, <strong>the</strong>n Y i + Y j ∼ f µi +µ j<br />

. This<br />

is relevant when adjacent fragments are reported as one due to a missing cut. Valouev et al.<br />

(2006) ignore scaling and postulate that <strong>the</strong> observed lengths X i = Y i . If we instead assume<br />

that <strong>the</strong> mean E(R i ) = 1 and variance V (R i ) = τ 2 > 0 without making any fur<strong>the</strong>r<br />

assumptions about <strong>the</strong> distribution <strong>of</strong> R i , we have<br />

E(X i ) = E(E(Y i R i |R i )) = µ i

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!