On the Analysis of Optical Mapping Data - University of Wisconsin ...
are realizations of a homogeneous Poisson process with rate ζ. If we assume independence of
these errors and no error in sizing, then the observed cut sites are realizations of a homogeneous
Poisson process with rate pθ + ζ. Consequently, the fragment sizes Y are exponentially
distributed. The assumption of no sizing error is of course unrealistic; Valouev et al. (2006)
show that the exponential distribution holds approximately even with reasonable sizing error
models.
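The superposition argument can be checked with a small simulation. The sketch below is illustrative only. It assumes, as the rate pθ + ζ suggests, that true cut sites form a Poisson process with rate θ, that each true site is observed independently with probability p, and that false cuts arrive at rate ζ. The function names and the numerical values of θ, p and ζ (per Kb) are hypothetical and not taken from the text.

```python
import random
import statistics


def poisson_sites(rate, length, rng):
    """Points of a homogeneous Poisson process with the given rate on [0, length]."""
    sites = []
    position = rng.expovariate(rate)
    while position < length:
        sites.append(position)
        position += rng.expovariate(rate)
    return sites


def simulate_fragment_sizes(theta, p, zeta, length, rng):
    """Fragment sizes between observed cuts (thinned true cuts plus false cuts)."""
    true_sites = poisson_sites(theta, length, rng)
    # Independent thinning: each true cut site is observed with probability p.
    observed_true = [s for s in true_sites if rng.random() < p]
    # False cuts form an independent Poisson process with rate zeta.
    false_sites = poisson_sites(zeta, length, rng)
    sites = sorted(observed_true + false_sites)
    return [b - a for a, b in zip(sites, sites[1:])]


rng = random.Random(0)
theta, p, zeta = 0.05, 0.8, 0.005  # illustrative values only (per Kb)
sizes = simulate_fragment_sizes(theta, p, zeta, length=1_000_000.0, rng=rng)

expected_mean = 1.0 / (p * theta + zeta)  # mean of Exponential(p*theta + zeta)
print(f"sample mean = {statistics.mean(sizes):.2f}, expected = {expected_mean:.2f}")
# For an exponential distribution the standard deviation equals the mean.
print(f"sample sd   = {statistics.pstdev(sizes):.2f}")
```

With no sizing error, the sample mean and standard deviation should both be close to 1/(pθ + ζ), as expected for exponentially distributed fragment sizes.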
Exponential rate: The rate of the relevant exponential distribution depends on unknown
parameters. Fortunately, this rate can be estimated directly from the data, thanks to the
memoryless property of the exponential distribution, namely that
P(Y > t + s | Y > t) = P(Y > s)
when Y is exponentially distributed, or equivalently, Y | Y > t has the same distribution as
Y + t. In other words, left truncation of exponential variates is equivalent to an additive
shift. Since it is known empirically that π(y) = 1 for y > 15 Kb, the truncated observations
X | X > 15 have the same distribution as Y | Y > 15, i.e., an exponential truncated at 15 Kb.
A robust estimate of the rate can be obtained from the interquartile range of the truncated
observations, since the interquartile range is unaffected by an additive shift and equals
(log 3)/rate for an exponential distribution. Empirical evidence is provided by a Q-Q plot of
the observed values of X in Figure 2.2.
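A minimal sketch of the interquartile-range estimate follows. For an exponential distribution with rate λ the quartiles are log(4/3)/λ and log(4)/λ, so the interquartile range is log(3)/λ, and this is unchanged by the shift of 15 Kb. The retention function min(1, y/15) used to generate data is purely hypothetical. It is chosen only so that π(y) = 1 for y > 15, and the true rate is an illustrative value.

```python
import math
import random
import statistics


def iqr_rate_estimate(observations, cutoff=15.0):
    """Estimate the exponential rate from observations above the cutoff (in Kb)."""
    kept = [x for x in observations if x > cutoff]
    q1, _, q3 = statistics.quantiles(kept, n=4)
    # The IQR of a shifted exponential with rate lam is log(3) / lam.
    return math.log(3.0) / (q3 - q1)


rng = random.Random(1)
true_rate = 0.045  # illustrative value, per Kb

observed = []
for _ in range(200_000):
    y = rng.expovariate(true_rate)
    # Hypothetical retention probability: increasing in y, equal to 1 above 15 Kb.
    if rng.random() < min(1.0, y / 15.0):
        observed.append(y)

estimate = iqr_rate_estimate(observed)
print(f"true rate = {true_rate}, IQR-based estimate = {estimate:.4f}")
```

Because only observations above 15 Kb enter the estimate, the form of the retention probability below 15 Kb does not affect it.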
Non-parametric estimation: A naive non-parametric estimate of π is given by
$$\hat{\pi}(t) \propto \frac{\hat{h}(t)}{g(t)}$$
where ĥ is the estimated density of observed fragment lengths X and g is the known density of
Y. X is a positive random variable, so usual kernel density estimates are inappropriate, but
alternatives such as zero-truncated kernel density estimates and log-spline density estimates
exist. More interestingly, the non-parametric MLE of π can be obtained under the additional
assumption that π is increasing. This is reasonable since longer fragments are less likely to
desorb. The MLE follows from the existence of the MLE of a monotone density, given by