On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
66<br />
B). Specifically, Under H i , N i has a negative binomial distribution with mean Λ 0 and size<br />
Λ 0 /κ, denoted N B (Λ 0 , Λ 0 /κ). If <strong>the</strong> true copy number in G i is Λ i /Λ 0 , <strong>the</strong>n N i follows<br />
negative binomial with <strong>the</strong> same size and mean Λ i . The size parameter represents a loss in<br />
sensitivity that is expected given that we are estimating λ 0 (·). With more optical maps from<br />
<strong>the</strong> normal reference genome, we can estimate λ 0 (·) more accurately, in which case κ → 0<br />
and <strong>the</strong> negative binomial distribution converges in distribution to Poisson. It should be<br />
noted that even without this formal argument, a negative binomial model is <strong>of</strong>ten helpful<br />
in accounting for extra-Poisson variation. For instance, this could be due to minor model<br />
misspecification, which would not be unexpected in our application.<br />
Limitations: Testing each H i separately has several drawbacks. The constant <strong>of</strong> proportionality<br />
κ needs to be estimated, yet <strong>the</strong>re are no obvious means to do so. It detects<br />
incidences <strong>of</strong> copy number changes, but provides no estimate <strong>of</strong> <strong>the</strong> associated copy number.<br />
Fur<strong>the</strong>r, <strong>the</strong> aberrations that we hope to detect induce copy number changes that can<br />
be both short (insertions, misassemblies) and long (deletions, legitimate copy number alterations).<br />
Everything else remaining <strong>the</strong> same, short aberrations can only be detected by short<br />
intervals, i.e., small Λ 0 . Ideally, <strong>the</strong>y should detect long aberrations as well. However, we<br />
have a better chance <strong>of</strong> detecting long aberrations with longer intervals, because <strong>the</strong> power<br />
to detect a given copy number change is higher with larger Λ 0 . Put differently, multiple intervals<br />
with weak but consistent evidence against <strong>the</strong> null makes a stronger statement when<br />
<strong>the</strong>y are contiguous than <strong>the</strong>y would individually. The multiple testing scheme described<br />
above does not take this into account.<br />
Hidden Markov Model: To incorporate spatial dependence, we model {Λ i } itself as a<br />
stationary time-homogeneous Markov process with a finite state space {Π 1 , . . .,Π K }. In<br />
view <strong>of</strong> Lemma 1, this is equivalent to modeling <strong>the</strong> sequence <strong>of</strong> observed hit counts {N i } L i=1<br />
by a Hidden Markov Model (HMM), with emission probabilities given by<br />
N i |Π i = k ∼ N B (µ k , σ) (4.1)