On the Analysis of Optical Mapping Data - University of Wisconsin ...


are realizations of a homogeneous Poisson process with rate ζ. If we assume that these errors are independent and that there is no sizing error, then the observed cut sites are realizations of a homogeneous Poisson process with rate pθ + ζ. Consequently, the fragment sizes Y are exponentially distributed with rate pθ + ζ, because they are the distances between consecutive observed cut sites. The assumption of no sizing error is, of course, unrealistic; Valouev et al. (2006) show that the exponential distribution still holds approximately under reasonable sizing error models.
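The superposition argument can be checked by simulation. The sketch below makes four assumptions:

- True cut sites form a rate-θ Poisson process along a long molecule.
- Each true cut site is detected independently with probability p.
- False cuts form an independent rate-ζ process.
- Sizing is exact.

The function names and parameter values are hypothetical. Under these assumptions, the mean fragment size should be close to 1/(pθ + ζ).

```python
import random


def poisson_points(rate, length, rng):
    """Points of a homogeneous Poisson process on [0, length), built from exponential gaps."""
    points, x = [], rng.expovariate(rate)
    while x < length:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def simulate_fragments(p, theta, zeta, length, seed=0):
    """Hypothetical helper: fragment sizes between observed cut sites under the
    idealized model (true cuts at rate theta, each kept with probability p,
    plus independent false cuts at rate zeta, no sizing error)."""
    rng = random.Random(seed)
    true_cuts = [x for x in poisson_points(theta, length, rng) if rng.random() < p]
    false_cuts = poisson_points(zeta, length, rng)
    cuts = sorted(true_cuts + false_cuts)
    # Distances between consecutive observed cut sites; the two end pieces are ignored.
    return [b - a for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    p, theta, zeta = 0.8, 0.05, 0.01  # hypothetical values, rates per Kb
    sizes = simulate_fragments(p, theta, zeta, length=1_000_000)
    mean = sum(sizes) / len(sizes)
    print(f"mean size {mean:.2f} Kb vs 1/(p*theta + zeta) = {1 / (p * theta + zeta):.2f} Kb")
```

With these hypothetical values the combined rate is 0.8 × 0.05 + 0.01 = 0.05 per Kb, so the printed mean should be near 20 Kb.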

Exponential rate: The rate of the relevant exponential distribution depends on the unknown parameters p, θ and ζ. Fortunately, this rate can be estimated directly from the data, thanks to the memoryless property of the exponential distribution, namely that

$$P(Y > t + s \mid Y > t) = P(Y > s)$$

when Y is exponentially distributed; equivalently, Y | Y > t has the same distribution as Y + t. In other words, left truncation of exponential variates is equivalent to an additive shift. It is known empirically that π(y) = 1 for y > 15 Kb. The truncated observations X | X > 15 therefore have the same distribution as Y | Y > 15: an exponential truncated at 15 Kb, which is the same exponential shifted by 15 Kb. A robust estimate of the rate can therefore be obtained from the interquartile range of the truncated observations, since the shift does not affect the interquartile range. Empirical evidence is provided by a Q-Q plot of the observed values of X in Figure 2.2.
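The interquartile-range estimate can be made concrete as follows. If Y is exponential with rate λ, its quartiles are ln(4/3)/λ and ln 4/λ, so its IQR is ln 3/λ. An additive shift leaves the IQR unchanged, so applying this to the observations above 15 Kb gives λ̂ = ln 3 / IQR.

The sketch below is one possible implementation of that idea, not necessarily the procedure used here. It also returns pairs for an exponential Q-Q comparison of the truncated data. The function names, the retention curve π(y) = min(1, y/15) used to simulate data, and all parameter values are hypothetical.

```python
import math
import random
import statistics


def rate_from_iqr(sizes, cutoff=15.0):
    """Hypothetical helper: estimate the exponential rate from observations above `cutoff` (Kb).

    By memorylessness, X | X > cutoff is distributed as cutoff + Exp(rate),
    and Exp(rate) has IQR ln(3) / rate; the shift leaves the IQR unchanged.
    """
    tail = [x for x in sizes if x > cutoff]
    if len(tail) < 4:
        raise ValueError("need at least 4 observations above the cutoff")
    q1, _, q3 = statistics.quantiles(tail, n=4)
    return math.log(3) / (q3 - q1)


def exponential_qq_points(sizes, rate, cutoff=15.0):
    """Hypothetical helper: (theoretical, observed) pairs comparing the truncated data
    with cutoff + Exp(rate), using plotting positions (i - 0.5) / n."""
    tail = sorted(x for x in sizes if x > cutoff)
    n = len(tail)
    return [(cutoff - math.log(1 - (i - 0.5) / n) / rate, x)
            for i, x in enumerate(tail, start=1)]


if __name__ == "__main__":
    rng = random.Random(1)
    true_rate = 0.05  # hypothetical rate per Kb
    # Hypothetical retention curve pi(y) = min(1, y / 15): short fragments are
    # under-observed, while pi(y) = 1 above 15 Kb, as the text assumes.
    observed = [y for y in (rng.expovariate(true_rate) for _ in range(100_000))
                if rng.random() < min(1.0, y / 15.0)]
    est = rate_from_iqr(observed)
    print(f"estimated rate {est:.4f} per Kb (simulated with {true_rate})")
```

Because only observations above the cutoff enter the estimate, the distortion below 15 Kb introduced by the simulated retention curve does not bias it.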

Non-parametric estimation: A naive non-parametric estimate of π is given by

$$\hat{\pi}(t) \propto \frac{\hat{h}(t)}{g(t)}$$

where ĥ is the estimated density of the observed fragment lengths X and g is the known density of Y. The ratio form follows because a fragment of length y is observed with probability π(y), so the density of X is proportional to π(x)g(x). Since X is a positive random variable, standard kernel density estimates are inappropriate because they place probability mass on negative values near zero. Alternatives such as zero-truncated kernel density estimates and log-spline density estimates exist. More interestingly, the non-parametric MLE of π can be obtained under the additional assumption that π is increasing. This assumption is reasonable, since longer fragments are less likely to desorb. The MLE follows from the existence of the MLE of a monotone density, given by
