29.07.2014

On the Analysis of Optical Mapping Data - University of Wisconsin ...

4.2 Methods

Motivation: We start with optical maps obtained from a sampled genome and a reference map representing a ‘normal’ genome, usually derived in silico from a reference sequence. If a region within the reference has increased (resp. decreased) copy number in the sample, then on average more (fewer) maps will originate from it than if it had normal copy number. Our goal is to detect such regions. At any locus along the genome, the number of optical maps overlapping it is a local measure of coverage depth. Intuitively, aberrant copy number should be reflected in a systematic change in this coverage depth. However, using this measure directly is problematic because of spatial dependence: a single map contributes to the depth at every locus it spans, so depths at nearby loci are strongly correlated. Instead, we summarize the location of each map by its midpoint. These locations can then be treated as independent random variables.
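
The contrast between coverage depth and midpoint summaries can be made concrete with a small Python sketch. It assumes each aligned map is represented by a half-open interval (start, end) in reference coordinates. The function names, the toy intervals and the fixed-width windows are illustrative assumptions, not part of the method described here.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of an aligned map, half-open, in bp


def coverage_depth(maps: List[Interval], locus: int) -> int:
    """Coverage depth: number of maps whose interval contains the locus."""
    return sum(1 for start, end in maps if start <= locus < end)


def midpoints(maps: List[Interval]) -> List[float]:
    """Summarize each map's location by its midpoint (sorted for binning)."""
    return sorted((start + end) / 2 for start, end in maps)


def midpoint_counts(mids: List[float], genome_length: int, window: int) -> List[int]:
    """Count sorted midpoints falling in consecutive windows [left, left + window)."""
    counts = []
    for left in range(0, genome_length, window):
        counts.append(bisect_left(mids, left + window) - bisect_left(mids, left))
    return counts


maps = [(0, 400), (100, 500), (300, 700), (600, 1000)]  # hypothetical aligned maps (bp)
# Nearby loci share most of their overlapping maps, so their depths move together.
depths = [coverage_depth(maps, x) for x in (350, 360)]
# Each map contributes exactly one midpoint, so the window counts sum to len(maps).
counts = midpoint_counts(midpoints(maps), genome_length=1000, window=250)
```

In this toy example, loci 350 and 360 both have depth 3 because they are covered by the same three maps. By contrast, each map is counted in exactly one window.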

Alignment: A fundamental prerequisite in our approach is to identify where in the in silico map, if at all, an optical map originated. This is an instance of the general alignment problem, which is usually approached by defining a score function that assigns a numeric score to each potential alignment and then searching for the alignment that maximizes this score. Dynamic programming (DP) algorithms based on additive score functions have been used extensively in DNA and protein sequence alignment (Durbin et al., 1998), and with suitable modifications, they can be used to align restriction maps as well (Huang and Waterman, 1992). Chapters 1 and 3 discuss optical map alignment and related issues in some detail. For the purposes of this chapter, the goal of the alignment step is simply to infer the location of a given map. We assume that a reasonably sensitive alignment scheme with a low false positive rate is available.
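
The following minimal Python sketch shows how an additive score function and DP combine to place a map. It is a simplified restriction-map aligner, not the alignment method of Chapters 1 and 3 and not the Huang and Waterman (1992) algorithm. Several choices are hypothetical:

- the block score: a fixed reward per matched cut, a squared sizing error scaled by reference length, and a flat penalty per unmatched cut;
- the parameters `sigma`, `bonus`, `cut_penalty` and `max_block`.

The optical map must be used in full, but it may start and end at any reference cut. Orientation reversal and partial end fragments are ignored.

```python
import math
from typing import List, Optional, Tuple


def block_score(opt_len: float, ref_len: float, extra_cuts: int,
                sigma: float = 0.5, bonus: float = 3.0,
                cut_penalty: float = 2.0) -> float:
    """Hypothetical score for matching a block of consecutive optical fragments
    (total length opt_len) to a block of reference fragments (total ref_len)."""
    # Sizing error, with variance assumed proportional to fragment length.
    sizing = (opt_len - ref_len) ** 2 / (sigma ** 2 * ref_len)
    return bonus - sizing - cut_penalty * extra_cuts


def align_map(opt: List[float], ref: List[float],
              max_block: int = 3) -> Tuple[float, int, int]:
    """Fit the whole optical map inside the reference map by DP.

    Returns (best score, index of the first aligned reference fragment,
    index one past the last aligned reference fragment)."""
    n, m = len(opt), len(ref)
    neg = -math.inf
    # score[i][j]: best score for aligning opt[:i] so that it ends at reference cut j.
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        score[0][j] = 0.0  # the map may start at any reference cut
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for di in range(1, min(max_block, i) + 1):
                opt_len = sum(opt[i - di:i])
                for dj in range(1, min(max_block, j) + 1):
                    prev = score[i - di][j - dj]
                    if prev == neg:
                        continue
                    ref_len = sum(ref[j - dj:j])
                    # (di - 1) + (dj - 1) cuts inside the block are left unmatched.
                    s = prev + block_score(opt_len, ref_len, (di - 1) + (dj - 1))
                    if s > score[i][j]:
                        score[i][j] = s
                        back[i][j] = (i - di, j - dj)
    end = max(range(m + 1), key=lambda j: score[n][j])  # the map may end at any cut
    if score[n][end] == neg:
        raise ValueError("no valid alignment under the block-size limit")
    # Trace back through the chosen blocks to find where the alignment starts.
    i, j = n, end
    while i > 0:
        step = back[i][j]
        assert step is not None
        i, j = step
    return score[n][end], j, end


ref = [10.0, 20.0, 15.0, 30.0, 12.0, 25.0]  # hypothetical in silico fragment lengths (kb)
opt = [14.8, 30.3, 12.1]                    # hypothetical optical map (kb)
best, start, end = align_map(opt, ref)
left, right = sum(ref[:start]), sum(ref[:end])  # aligned span in reference coordinates
midpoint = (left + right) / 2                   # the map's location summary
```

The total score is a sum of block scores. The best score for aligning the first i optical fragments up to reference cut j therefore depends only on the best scores of shorter prefixes, which is what makes DP exact for this score function. The recovered cut indices give the map's span in reference coordinates and hence the midpoint used as its location.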

Thinning: Consider the location (midpoint) of a randomly chosen optical map with respect to the reference genome. For shotgun optical mapping, it is natural to model this location as uniformly distributed over the underlying genome. Ignoring edge effects, this is equivalent to viewing map locations as realizations of a homogeneous Poisson process (Lander and Waterman,