29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...



restriction map can be derived in silico by identifying the enzyme recognition pattern in the reference sequence, and the primary goal of optical mapping is to determine how the genome under study differs from the reference copy in terms of their respective restriction maps. Such differences can be due to errors in the sequence, especially in the early stages of sequencing, but more importantly, they can reflect real biological variation. In either case, these broad goals are often tackled by breaking them down into smaller, more tractable problems.
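
To make the in silico derivation concrete, here is a minimal sketch in Python that scans a reference sequence for a recognition pattern and reports the cut positions and the resulting fragment lengths. The function name `in_silico_digest` and the `cut_offset` parameter are illustrative assumptions, not part of the original text. The sketch also assumes a linear sequence and a palindromic recognition site, so that scanning one strand is sufficient. The example uses EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def in_silico_digest(sequence, site, cut_offset=0):
    """Return (cut_positions, fragment_lengths) for a linear sequence.

    Illustrative sketch only: assumes a palindromic recognition site,
    so only the given strand is scanned. `cut_offset` is the distance
    from the start of the site to the position where the enzyme cuts.
    """
    sequence = sequence.upper()
    site = site.upper()

    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Advance by one base so that overlapping sites are also found.
        start = sequence.find(site, start + 1)

    # Fragment lengths are the gaps between consecutive boundaries,
    # where the two ends of the sequence count as boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return cuts, fragments


# Toy reference sequence (made up for illustration) with two EcoRI sites.
cuts, fragments = in_silico_digest("AAGAATTCGGGGAATTCTT", "GAATTC", cut_offset=1)
print(cuts)       # [3, 12]
print(fragments)  # [3, 9, 7]
```

The ordered list of fragment lengths is the in silico restriction map. Comparing it with a map obtained from the genome under study is the kind of smaller problem the text refers to.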

Algorithmic challenges: Optical mapping has been very successful in obtaining restriction maps of relatively small genomes (e.g. microbes). A critical component of this success has been algorithmic research in the 1990s specifically aimed at optical mapping data, notably the work of Anantharaman et al. (1999) leading to the Gentig assembly software. With recent technological advances, the focus has shifted to larger genomes. The primary challenge introduced by this shift is scalability. Computational methods that work well for microbial genomes may fail for large genomes due to memory and speed limits of existing computational systems. Since mammalian genomes differ in size from microbial genomes by several orders of magnitude, the coverage attainable from a comparable amount of data may be far lower. Careful statistical analysis is thus critical in making full use of the available data. New methods are also required to take advantage of in silico maps when they are available. It should be noted that restriction maps have many fundamental similarities with sequence data, and algorithms developed for sequence analysis can often be adapted to work with optical maps (e.g. Huang and Waterman, 1992).
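
The effect of genome size on coverage can be illustrated with simple arithmetic, taking coverage to be the total length of all collected maps divided by the genome size. The sketch below is only an illustration. The genome sizes are round figures (about 5 Mb for a microbe, about 3 Gb for a mammal), and the amount of data, `DATA_BP`, is a hypothetical value chosen for the example rather than a number from any real data set.

```python
def coverage(total_map_length_bp, genome_size_bp):
    """Average coverage: total mapped length divided by genome size."""
    return total_map_length_bp / genome_size_bp


# All values below are illustrative assumptions, not measured data.
DATA_BP = 500_000_000        # hypothetical total length of collected optical maps
MICROBE_BP = 5_000_000       # round figure for a microbial genome (~5 Mb)
MAMMAL_BP = 3_000_000_000    # round figure for a mammalian genome (~3 Gb)

print(coverage(DATA_BP, MICROBE_BP))            # 100.0
print(round(coverage(DATA_BP, MAMMAL_BP), 3))   # 0.167
```

Under these assumptions, the same amount of data that covers the microbial genome 100 times over covers only about a sixth of the mammalian genome once, which is why each map has to be used as efficiently as possible.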

Validation: Due to the nature of optical mapping data, it is rarely possible to know the true answer except in very special circumstances. It is therefore natural to use simulation to validate algorithmic techniques. While this has been implicitly acknowledged in much of the algorithmic work on optical mapping, we think that the stochastic model used in simulation itself deserves closer attention. With the large data sets that are now available, we can also hope to use the data to validate models, at least in some limited ways. In particular, we have found graphical diagnostics to be especially useful in model checking (see Section 2.3),
