On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
63<br />
1988). However, due to sampling errors and alignment efficiencies, <strong>the</strong> observed locations are<br />
better modeled as a non-homogeneous Poisson process (NHPP). Not all maps are successfully<br />
aligned, and we observe locations, up to errors in <strong>the</strong> alignment, only for those that do align.<br />
Alignment algorithms are not uniformly sensitive, and <strong>the</strong> chance that <strong>the</strong> origin <strong>of</strong> a map<br />
is correctly identified depends on <strong>the</strong> origin. For example, maps from regions where markers<br />
are sparse are less likely to align, while maps with rare patterns score higher and are more<br />
likely to align (see Figure 3.10). Copy number alteration in a region <strong>of</strong> <strong>the</strong> sampled genome is<br />
manifested as fur<strong>the</strong>r change in <strong>the</strong> non-homogeneous alignment rate. To detect fluctuations<br />
that are due to copy number changes, we must first adjust for <strong>the</strong> normal fluctuations due<br />
to <strong>the</strong> inherent but unknown non-homogeneity.<br />
Notation: The availability <strong>of</strong> optical map data from normal cells can be used to establish<br />
normal variation in <strong>the</strong> absence <strong>of</strong> CNP, but this requires some preliminary work. Formally,<br />
let {Y 1 < · · · < Y n } be aligned locations <strong>of</strong> <strong>the</strong> optical maps <strong>of</strong> interest, and {X 1 < · · · < X m }<br />
be similar aligned locations for ‘normal’ optical maps. We model {X k } m 1<br />
as realizations <strong>of</strong><br />
a non-homogeneous Poisson process with rate λ 0 (·) and {Y k } n 1<br />
as realizations <strong>of</strong> a nonhomogeneous<br />
Poisson process with rate λ(·). If <strong>the</strong> two genomes are identical, we have<br />
λ(c) = κλ 0 (c), where c denotes position along <strong>the</strong> reference genome, and κ represents a<br />
constant <strong>of</strong> proportionality reflecting <strong>the</strong> possibility <strong>of</strong> different overall coverage in <strong>the</strong> two<br />
genomes. κ can be estimated in this case by <strong>the</strong> ratio n/m <strong>of</strong> <strong>the</strong> number <strong>of</strong> aligned maps. If<br />
copy number alterations exist in <strong>the</strong> genome, κ is defined to be <strong>the</strong> constant <strong>of</strong> proportionality<br />
in regions <strong>of</strong> normal copy number.<br />
Interval counts: It is natural to work with interval counts when dealing with Poisson<br />
process data, so we discretize <strong>the</strong> sample space into intervals G i = (a i , b i ], i = 1, . . .,L tiling<br />
<strong>the</strong> reference genome. The choice <strong>of</strong> G i ’s is important and is discussed below. Assume for<br />
<strong>the</strong> moment that λ 0 and κ are known. For <strong>the</strong> i th interval, define <strong>the</strong> counts<br />
N i = ∑ k<br />
I {ai