29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


1988). However, due to sampling errors and alignment efficiencies, the observed locations are better modeled as a non-homogeneous Poisson process (NHPP). Not all maps are successfully aligned, and we observe locations, up to errors in the alignment, only for those that do align. Alignment algorithms are not uniformly sensitive, and the chance that the origin of a map is correctly identified depends on the origin. For example, maps from regions where markers are sparse are less likely to align, while maps with rare patterns score higher and are more likely to align (see Figure 3.10). Copy number alteration in a region of the sampled genome is manifested as further change in the non-homogeneous alignment rate. To detect fluctuations that are due to copy number changes, we must first adjust for the normal fluctuations due to the inherent but unknown non-homogeneity.
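To make the idea of a position-dependent alignment rate concrete, here is a minimal sketch that simulates an NHPP by thinning (the Lewis-Shedler method). This is a standard simulation technique, not a procedure described in the text. The function `simulate_nhpp`, the intensity `example_rate`, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def simulate_nhpp(rate, rate_max, length, rng):
    """Simulate NHPP points on (0, length] by thinning.

    rate     -- function giving the intensity at a position (hypothetical)
    rate_max -- an upper bound on rate over (0, length]
    """
    points = []
    position = 0.0
    while True:
        # Candidate points come from a homogeneous process at rate_max,
        # whose gaps are exponentially distributed.
        position += rng.expovariate(rate_max)
        if position > length:
            break
        # Keep each candidate with probability rate(position) / rate_max.
        if rng.random() < rate(position) / rate_max:
            points.append(position)
    return points


def example_rate(c):
    # Assumed intensity: maps from the first half of the region align less often.
    return 0.2 if c < 500.0 else 1.0


rng = random.Random(0)
locations = simulate_nhpp(example_rate, rate_max=1.0, length=1000.0, rng=rng)
first_half = sum(1 for p in locations if p < 500.0)
second_half = len(locations) - first_half
```

With this intensity, the expected counts are about 100 points in the first half and 500 in the second, so the simulated locations are visibly unevenly spread even though no copy number change is involved.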

Notation: The availability of optical map data from normal cells can be used to establish normal variation in the absence of CNP, but this requires some preliminary work. Formally, let $\{Y_1 < \cdots < Y_n\}$ be aligned locations of the optical maps of interest, and $\{X_1 < \cdots < X_m\}$ be similar aligned locations for 'normal' optical maps. We model $\{X_k\}_1^m$ as realizations of a non-homogeneous Poisson process with rate $\lambda_0(\cdot)$ and $\{Y_k\}_1^n$ as realizations of a non-homogeneous Poisson process with rate $\lambda(\cdot)$. If the two genomes are identical, we have $\lambda(c) = \kappa \lambda_0(c)$, where $c$ denotes position along the reference genome, and $\kappa$ represents a constant of proportionality reflecting the possibility of different overall coverage in the two genomes. $\kappa$ can be estimated in this case by the ratio $n/m$ of the number of aligned maps. If copy number alterations exist in the genome, $\kappa$ is defined to be the constant of proportionality in regions of normal copy number.
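One way to see why the ratio $n/m$ is a reasonable estimate of $\kappa$ when the two genomes are identical: the expected number of points of a Poisson process equals the integral of its rate. The symbol $\mathcal{G}$ for the reference genome and the hat notation $\hat{\kappa}$ are introduced here only for illustration.

```latex
\begin{align*}
\mathbb{E}[n] &= \int_{\mathcal{G}} \lambda(c)\,dc
               = \kappa \int_{\mathcal{G}} \lambda_0(c)\,dc
               = \kappa\,\mathbb{E}[m],
\qquad\text{which suggests}\qquad
\hat{\kappa} = \frac{n}{m}.
\end{align*}
```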

Interval counts: It is natural to work with interval counts when dealing with Poisson process data, so we discretize the sample space into intervals $G_i = (a_i, b_i]$, $i = 1, \ldots, L$, tiling the reference genome. The choice of $G_i$'s is important and is discussed below. Assume for the moment that $\lambda_0$ and $\kappa$ are known. For the $i$th interval, define the counts

$N_i = \sum_k I\{a_i$ …
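As a rough illustration of interval counting, the sketch below tallies the aligned locations that fall in each half-open interval $(a_i, b_i]$. It assumes that the count for an interval is simply the number of locations inside it. The function name `interval_counts`, the edge list, and the sample locations are hypothetical.

```python
from bisect import bisect_left


def interval_counts(locations, edges):
    """Count locations in each interval (edges[i], edges[i + 1]].

    edges must be sorted; consecutive entries play the roles of a_i and b_i.
    """
    counts = [0] * (len(edges) - 1)
    for loc in locations:
        # bisect_left places a location equal to an edge in the interval
        # to its left, which matches the right-closed convention (a_i, b_i].
        i = bisect_left(edges, loc) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


edges = [0.0, 10.0, 20.0, 30.0]
y_locations = [0.0, 2.5, 10.0, 10.1, 19.9, 20.0, 30.0, 31.0]
counts = interval_counts(y_locations, edges)
```

Locations at or below the first edge, or above the last edge, lie outside the tiling and are not counted in this sketch.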
