29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


Appendix B: Hidden Markov Model calculations

Negative binomial emissions: Since our analysis is based on interval counts of events modeled by a Poisson point process, it is natural to model the counts by a Poisson distribution. Due to inhomogeneity of the process, the distribution of the counts is not determined solely by the interval lengths. To normalize the counts, we choose data-dependent intervals, thus altering the count distribution. With this modification, the counts are no longer Poisson, but negative binomial, a fact which follows from the following lemma.

Lemma 1. Let $X_1 < \cdots < X_m$ be observed event times from a (possibly non-homogeneous) Poisson process with rate $\lambda(\cdot)$ and $Y_1 < \cdots < Y_n$ be observed event times from a Poisson process with rate $\alpha\lambda(\cdot)$. Let $G = (X_l, X_{l+p+1}]$ for some $l \in \{1, \ldots, m - p - 1\}$ and $N = \sum_k I\{Y_k \in G\}$. Then, $N$ has a negative binomial distribution with mean $\alpha p$ and size $p$.

Proof. Without loss of generality, assume that both processes are homogeneous; if not, they can be homogenized by a transformation of the time axis determined by $\lambda(\cdot)$. Also assume w.l.o.g. that $\lambda(t) = 1 \;\forall t$, in which case the $(Y_k)_1^n$ process has constant intensity $\alpha$. Then, the length of $G$, denoted $|G|$, is the waiting time till the $p$th event of a homogeneous Poisson process with unit intensity, or equivalently, the sum of $p$ independent standard exponentials. Thus, $|G|$ has the Gamma distribution with shape parameter $p$, and $N \mid G \sim \text{Poisson}(\alpha |G|)$ by definition. The proof is completed by noting that the negative binomial distribution can be expressed as a Gamma mixture of Poissons, with parameters as specified in the lemma.
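The final step of the proof can be written out explicitly. As a sketch, write $t$ for the value of $|G|$ (a symbol introduced here for illustration only), so that $t$ has the Gamma density with shape $p$ and unit rate, and the count given $t$ is Poisson with mean $\alpha t$. Integrating out $t$ gives:

```latex
\begin{align*}
P(N = x)
  &= \int_0^\infty \frac{e^{-\alpha t} (\alpha t)^x}{x!}
     \cdot \frac{t^{p-1} e^{-t}}{\Gamma(p)} \, dt \\
  &= \frac{\alpha^x}{x! \, \Gamma(p)}
     \int_0^\infty t^{x+p-1} e^{-(1+\alpha) t} \, dt \\
  &= \frac{\Gamma(x+p)}{\Gamma(p) \, x!}
     \left( \frac{1}{1+\alpha} \right)^{p}
     \left( \frac{\alpha}{1+\alpha} \right)^{x}
\end{align*}
```

With mean $\mu = \alpha p$ and size $\sigma = p$, we have $\sigma/(\sigma+\mu) = 1/(1+\alpha)$ and $\mu/(\sigma+\mu) = \alpha/(1+\alpha)$, so the last line is the negative binomial mass function with mean $\alpha p$ and size $p$.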

We use a somewhat non-standard parameterization of the negative binomial distribution, where the mass function of a random variable $X$ with mean $\mu$ and size $\sigma$, denoted $X \sim NB(\mu, \sigma)$, is given by

$$p(X = x) = \frac{\Gamma(x + \sigma)}{\Gamma(\sigma) \, x!} \left( \frac{\sigma}{\sigma + \mu} \right)^{\sigma} \left( \frac{\mu}{\sigma + \mu} \right)^{x}$$

This has the nice property that $E(X) = \mu$ and $V(X) = \mu + \mu^2/\sigma$. As $\sigma \to \infty$, $X$ converges in distribution to $\text{Poisson}(\mu)$. In practice, the size parameter may account for lack of fit in the model as well.
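The stated properties of this parameterization can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`nb_pmf`, `poisson_pmf`, `moments`), the truncation point, and the parameter values are illustrative assumptions, not part of the original analysis.

```python
import math


def nb_pmf(x, mu, size):
    """Mass function of NB(mu, size) in the mean/size parameterization.

    Computed on the log scale so that large x or large size do not overflow.
    """
    log_p = (
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        + size * math.log(size / (size + mu))
        + x * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def poisson_pmf(x, mu):
    """Mass function of Poisson(mu), also computed on the log scale."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


def moments(pmf, upper=2000):
    """Total mass, mean and variance of a pmf truncated to 0..upper-1."""
    probs = [pmf(x) for x in range(upper)]
    total = sum(probs)
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return total, mean, var


# Illustrative values: alpha = 2 and p = 3 give mean alpha * p = 6, size 3.
mu, size = 6.0, 3.0
total, mean, var = moments(lambda x: nb_pmf(x, mu, size))
print("total mass:", total)
print("mean:", mean, "expected:", mu)
print("variance:", var, "expected:", mu + mu ** 2 / size)

# For a very large size parameter the pmf should be close to Poisson(mu).
max_gap = max(abs(nb_pmf(x, mu, 1e6) - poisson_pmf(x, mu)) for x in range(60))
print("max |NB - Poisson| with size = 1e6:", max_gap)
```

Working on the log scale with `math.lgamma` keeps the computation stable when the size parameter is large, which is the regime where the Poisson limit is of interest.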
