29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


1.3.2 Optical map data

Representation: An optical map identified by image processing is essentially an ordered sequence of fragment lengths. Thus, an optical map with $n$ fragments may be denoted as

$$x = (x_1, \ldots, x_n)$$

where $x_i$ is the measured length of the $i$th fragment. Another natural representation of an optical map is as a sequence of recognition sites. An optical map $x$ is easily converted into a sequence of cut sites by accumulating the lengths, noting that the cut sites are only defined up to location. Denoting the conversion from fragment lengths to cut site locations by $S$, we may write

$$S(x) = \left\{ 0 = s_0 < s_1 < \cdots < s_n = \sum_{i=1}^{n} x_i \right\}$$

where $x_i = s_i - s_{i-1}$ for $i = 1, \ldots, n$ are fragment lengths and $s_i = \sum_{j=1}^{i} x_j$ are locations of cut sites. The endpoints $s_0$ and $s_n$ are not treated as cut site locations since they represent breaks that define the original molecule as a segment of the whole genome (from shearing) rather than breaks created by the restriction enzyme. The first representation, being invariant to origin, has the advantage of being unambiguous, but the second is often more useful, e.g. for defining alignments between two or more optical maps. Of course, both representations apply to any physical map. Optical maps may have additional metadata associated with them (e.g. confidence scores from image processing), but most existing algorithms ignore such attributes.
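The two representations can be illustrated with a minimal Python sketch. The function names (`cut_sites`, `fragment_lengths`, `internal_cut_sites`) and the example lengths are hypothetical, chosen only for illustration; they are not taken from any optical mapping software.

```python
from itertools import accumulate


def cut_sites(fragments):
    """Map fragment lengths (x_1, ..., x_n) to locations (s_0 = 0, s_1, ..., s_n)."""
    if any(x <= 0 for x in fragments):
        raise ValueError("fragment lengths must be positive")
    # s_i is the running sum of the first i fragment lengths; s_0 = 0.
    return [0, *accumulate(fragments)]


def fragment_lengths(sites):
    """Inverse conversion: x_i = s_i - s_{i-1} for consecutive locations."""
    return [b - a for a, b in zip(sites, sites[1:])]


def internal_cut_sites(sites):
    """Drop the endpoints s_0 and s_n, which are molecule ends, not enzyme cuts."""
    return sites[1:-1]


# Hypothetical fragment lengths (in Kb), for illustration only.
x = [12, 30, 8, 25]
s = cut_sites(x)
print(s)                      # [0, 12, 42, 50, 75]
print(internal_cut_sites(s))  # [12, 42, 50]

# Shifting every location by a constant leaves the fragment lengths
# unchanged, which is the sense in which x is invariant to origin.
shifted = [si + 100 for si in s]
print(fragment_lengths(shifted) == x)  # True
```

Integer lengths are used here so that the round trip is exact; with measured floating-point lengths the recovered values could differ by rounding error.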

Characteristics: Optical map molecules are generally regarded as random snapshots obtained from the underlying genome, i.e., their locations are assumed to be uniformly distributed within the genome. Their orientation is not known a priori. The lengths of the molecules vary; a typical molecule may be around 500 Kb long, and 1000 Kb molecules are not uncommon. Unlike sequence reads that are obtained as averages over many copies of a clone, optical maps represent single molecules derived from genomic DNA, providing a more
