On the Analysis of Optical Mapping Data - University of Wisconsin ...
Appendix B: Hidden Markov Model calculations

Negative binomial emissions: Since our analysis is based on interval counts of events modeled by a Poisson point process, it is natural to model the counts by a Poisson distribution. Due to inhomogeneity of the process, the distribution of the counts is not determined solely by the interval lengths. To normalize the counts, we choose data-dependent intervals, thus altering the count distribution. With this modification, the counts are no longer Poisson but negative binomial, a fact which follows from the following lemma.

Lemma 1. Let $X_1 < \cdots < X_m$ be observed event times from a (possibly non-homogeneous) Poisson process with rate $\lambda(\cdot)$ and $Y_1 < \cdots < Y_n$ be observed event times from a Poisson process with rate $\alpha\lambda(\cdot)$. Let $G = (X_l, X_{l+p+1}]$ for some $l \in \{1, \ldots, m - p - 1\}$ and $N = \sum_k I\{Y_k \in G\}$. Then $N$ has a negative binomial distribution with mean $\alpha p$ and size $p$.

Proof. Without loss of generality, assume that both processes are homogeneous; if not, they can be homogenized by a transformation of the time axis determined by $\lambda(\cdot)$. Also assume without loss of generality that $\lambda(t) = 1$ for all $t$, in which case the $(Y_k)_1^n$ process has constant intensity $\alpha$. Then the length of $G$, denoted $|G|$, is the waiting time until the $p$th event of a homogeneous Poisson process with unit intensity, or equivalently, the sum of $p$ independent standard exponentials. Thus $|G|$ has the Gamma distribution with shape parameter $p$, and $N \mid G \sim \text{Poisson}(\alpha |G|)$ by definition. The proof is completed by noting that the negative binomial distribution can be expressed as a Gamma mixture of Poissons, with parameters as specified in the lemma.

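The lemma can be checked numerically with a small simulation. The sketch below is illustrative only and is not part of the original analysis: it follows the homogenized setting of the proof, draws $|G|$ as the sum of $p$ standard exponential gaps, counts events of an independent rate-$\alpha$ process in that window, and compares the sample mean and variance with the negative binomial values $\alpha p$ and $\alpha p + (\alpha p)^2 / p$. The function names, the default values of `alpha` and `p`, and the independence of the two processes are assumptions made for the example.

```python
import random


def count_in_window(alpha, p, rng):
    """Count events of a rate-alpha Poisson process in a data-dependent window.

    The window length is the sum of p independent standard exponentials,
    i.e. the waiting time until the p-th event of a unit-rate process.
    """
    length = sum(rng.expovariate(1.0) for _ in range(p))
    count = 0
    t = rng.expovariate(alpha)  # time of the first rate-alpha event
    while t <= length:
        count += 1
        t += rng.expovariate(alpha)
    return count


def simulate(alpha=2.0, p=5, reps=100_000, seed=1):
    """Return the sample mean and variance of the window counts."""
    rng = random.Random(seed)
    counts = [count_in_window(alpha, p, rng) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    alpha, p = 2.0, 5
    mean, var = simulate(alpha, p)
    mu = alpha * p
    print("sample mean:", mean, "| lemma:", mu)
    print("sample variance:", var, "| lemma:", mu + mu ** 2 / p)
```

A plain Poisson count with the same mean would have variance $\alpha p$; the extra $(\alpha p)^2 / p$ term comes from the randomness of the window length.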
We use a somewhat non-standard parameterization of the negative binomial distribution, where the mass function of a random variable $X$ with mean $\mu$ and size $\sigma$, denoted $X \sim NB(\mu, \sigma)$, is given by

$$
p(X = x) = \frac{\Gamma(x + \sigma)}{\Gamma(\sigma)\, x!} \left( \frac{\sigma}{\sigma + \mu} \right)^{\sigma} \left( \frac{\mu}{\sigma + \mu} \right)^{x}
$$

This has the nice property that $E(X) = \mu$ and $V(X) = \mu + \mu^2 / \sigma$. $X$ converges in distribution to $\text{Poisson}(\mu)$ as $\sigma \to \infty$. In practice, the size parameter may account for lack of fit in the model as well.
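
As a rough illustration of this parameterization, the following sketch evaluates the mass function on the log scale and checks the stated mean, variance, and Poisson limit by direct summation. The function names (`nb_pmf`, `poisson_pmf`, `moments`), the truncation point `upper`, and the example values of the mean and size are assumptions chosen for the example, not part of the original text.

```python
import math


def nb_pmf(x, mu, size):
    """Mass function of NB(mu, size) in the mean/size parameterization."""
    log_p = (
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        + size * math.log(size / (size + mu))
        + x * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def poisson_pmf(x, mu):
    """Mass function of Poisson(mu), computed on the log scale."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


def moments(mu, size, upper=2000):
    """Mean and variance by summing over x = 0, ..., upper - 1.

    The support is truncated at `upper`, so the result is approximate;
    the neglected tail is tiny for moderate mu and size.
    """
    probs = [nb_pmf(x, mu, size) for x in range(upper)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, size = 4.0, 2.5
    mean, var = moments(mu, size)
    print("mean:", mean, "| mu:", mu)
    print("variance:", var, "| mu + mu^2/size:", mu + mu ** 2 / size)

    # For a very large size, the mass function is close to Poisson(mu).
    gap = max(abs(nb_pmf(x, mu, 1e6) - poisson_pmf(x, mu)) for x in range(31))
    print("max |NB - Poisson| for size = 1e6:", gap)
```

Working with `math.lgamma` avoids overflow in the Gamma functions when `x` or `size` is large.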