29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


affect inference. If necessary, the degree of change can be assessed by dividing the data by time of collection and comparing estimates across time periods.

To illustrate estimation techniques, we use the GM07535 data set and the in silico reference map derived from the human genome sequence (Build 35).

2.2.1 Desorption

Random truncation: We begin with the estimation of desorption rates, since this can be achieved without alignments, under certain assumptions. For a fragment in the true restriction map spanned by an observed optical map, let Z be a random variable indicating whether that particular fragment was observed, and let Y be its measured length had it been observed. The desorption rate is quantified by the probability that a fragment is observed (not truncated), given by

$$\pi(y) = P(Z = 1 \mid Y = y)$$

Suppose that the marginal density of the unobserved (pre-truncation) random variable Y is g. Let X represent the length of an observed fragment (the truncated version of Y), i.e. X = Y if Z = 1. Then, the marginal density of X is given by

$$h(x) = \frac{1}{K}\,\pi(x)\,g(x)$$

where K is the normalizing constant

$$K = \int_0^\infty \pi(t)\,g(t)\,dt$$
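Since π(t) = P(Z = 1 | Y = t), the constant K is also the marginal probability P(Z = 1) that a fragment is observed at all. The following is a minimal simulation sketch of that reading, not the estimation procedure of the text. The exponential form of g, its 20 Kb mean, and the linear ramp used for π are illustrative assumptions chosen only so the example runs.

```python
import math
import random

MEAN_KB = 20.0    # assumed mean of the pre-truncation length Y (illustrative)
CUTOFF_KB = 15.0  # length above which the assumed pi(y) equals 1


def g(y):
    """Assumed exponential density of the unobserved length Y."""
    return math.exp(-y / MEAN_KB) / MEAN_KB


def pi(y):
    """Hypothetical observation probability: linear ramp reaching 1 at CUTOFF_KB."""
    return min(1.0, y / CUTOFF_KB)


def normalizing_constant(upper=1000.0, steps=200_000):
    """Midpoint-rule approximation of K = integral of pi(t) g(t) over [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += pi(t) * g(t)
    return total * width


def simulate_observed(n, rng):
    """Draw n values of Y and keep only those with Z = 1 (the observed X)."""
    observed = []
    for _ in range(n):
        y = rng.expovariate(1.0 / MEAN_KB)  # expovariate takes a rate, not a mean
        if rng.random() < pi(y):            # Z = 1 with probability pi(y)
            observed.append(y)
    return observed


K = normalizing_constant()
rng = random.Random(0)
n_draws = 200_000
x_obs = simulate_observed(n_draws, rng)
observed_fraction = len(x_obs) / n_draws

print(f"K by numerical integration: {K:.4f}")
print(f"fraction of simulated fragments observed: {observed_fraction:.4f}")
```

Under these assumptions the two printed numbers should agree up to Monte Carlo error, and the retained values in `x_obs` are draws from h.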

As formulated, $\pi(\cdot)$ is identifiable only up to scale, since $\pi'(y) := \alpha\,\pi(y)$ with $0 < \alpha \le 1$ induces the same $h$ from $g$. Desorption is known to affect small fragments only, so we may additionally assume that $\lim_{y \to \infty} \pi(y) = 1$, making $\pi(\cdot)$ identifiable. Empirically, $\pi(y) = 1$ for $y > 15$ Kb.
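The scale ambiguity and its resolution can be checked numerically. In the sketch below, the exponential g, the ramp-shaped π, the factor α = 0.4, and the anchor length of 100 Kb are all illustrative assumptions. It shows that π and απ give the same h, and that the ratio h/g, rescaled by its value at a length where π is taken to equal 1, recovers the constrained π.

```python
import math

MEAN_KB = 20.0    # assumed mean of the pre-truncation length Y (illustrative)
CUTOFF_KB = 15.0  # length above which the assumed pi(y) equals 1
ALPHA = 0.4       # arbitrary scale factor with 0 < ALPHA <= 1


def g(y):
    """Assumed exponential density of the unobserved length Y."""
    return math.exp(-y / MEAN_KB) / MEAN_KB


def pi(y):
    """Hypothetical observation probability satisfying pi(y) -> 1 as y grows."""
    return min(1.0, y / CUTOFF_KB)


def integrate(f, upper=1000.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    width = upper / steps
    return sum(f((i + 0.5) * width) for i in range(steps)) * width


def truncated_density(pi_fn):
    """Return h(x) = pi_fn(x) g(x) / K for the given observation probability."""
    K = integrate(lambda t: pi_fn(t) * g(t))
    return lambda x: pi_fn(x) * g(x) / K


def recovered_pi(x, h_fn, anchor=100.0):
    """Rescale h/g by its value at a large anchor length, where pi is assumed to be 1."""
    return (h_fn(x) / g(x)) / (h_fn(anchor) / g(anchor))


h = truncated_density(pi)
h_scaled = truncated_density(lambda y: ALPHA * pi(y))

grid = [0.5, 2.0, 5.0, 10.0, 15.0, 30.0, 60.0]
max_gap = max(abs(h(x) - h_scaled(x)) for x in grid)
recovered = [recovered_pi(x, h_scaled) for x in grid]

print(f"largest difference between the two densities: {max_gap:.2e}")
for x, r in zip(grid, recovered):
    print(f"x = {x:5.1f} Kb  true pi = {pi(x):.3f}  recovered pi = {r:.3f}")
```

The recovery step uses the true g, which is unavailable in practice. The sketch only illustrates why the limit condition pins down the scale.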

Length distribution: Let us consider the distribution of the unobserved random variable Y. Suppose the true recognition sites are realizations of a homogeneous Poisson process with rate θ, true recognition sites are observed independently with probability p, and false cuts
