29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

66<br />

B). Specifically, Under H i , N i has a negative binomial distribution with mean Λ 0 and size<br />

Λ 0 /κ, denoted N B (Λ 0 , Λ 0 /κ). If <strong>the</strong> true copy number in G i is Λ i /Λ 0 , <strong>the</strong>n N i follows<br />

negative binomial with <strong>the</strong> same size and mean Λ i . The size parameter represents a loss in<br />

sensitivity that is expected given that we are estimating λ 0 (·). With more optical maps from<br />

<strong>the</strong> normal reference genome, we can estimate λ 0 (·) more accurately, in which case κ → 0<br />

and <strong>the</strong> negative binomial distribution converges in distribution to Poisson. It should be<br />

noted that even without this formal argument, a negative binomial model is <strong>of</strong>ten helpful<br />

in accounting for extra-Poisson variation. For instance, this could be due to minor model<br />

misspecification, which would not be unexpected in our application.<br />

Limitations: Testing each H i separately has several drawbacks. The constant <strong>of</strong> proportionality<br />

κ needs to be estimated, yet <strong>the</strong>re are no obvious means to do so. It detects<br />

incidences <strong>of</strong> copy number changes, but provides no estimate <strong>of</strong> <strong>the</strong> associated copy number.<br />

Fur<strong>the</strong>r, <strong>the</strong> aberrations that we hope to detect induce copy number changes that can<br />

be both short (insertions, misassemblies) and long (deletions, legitimate copy number alterations).<br />

Everything else remaining <strong>the</strong> same, short aberrations can only be detected by short<br />

intervals, i.e., small Λ 0 . Ideally, <strong>the</strong>y should detect long aberrations as well. However, we<br />

have a better chance <strong>of</strong> detecting long aberrations with longer intervals, because <strong>the</strong> power<br />

to detect a given copy number change is higher with larger Λ 0 . Put differently, multiple intervals<br />

with weak but consistent evidence against <strong>the</strong> null makes a stronger statement when<br />

<strong>the</strong>y are contiguous than <strong>the</strong>y would individually. The multiple testing scheme described<br />

above does not take this into account.<br />

Hidden Markov Model: To incorporate spatial dependence, we model {Λ i } itself as a<br />

stationary time-homogeneous Markov process with a finite state space {Π 1 , . . .,Π K }. In<br />

view <strong>of</strong> Lemma 1, this is equivalent to modeling <strong>the</strong> sequence <strong>of</strong> observed hit counts {N i } L i=1<br />

by a Hidden Markov Model (HMM), with emission probabilities given by<br />

N i |Π i = k ∼ N B (µ k , σ) (4.1)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!