On the Analysis of Optical Mapping Data - University of Wisconsin ...
restriction map can be derived in silico by identifying the enzyme recognition pattern in
the reference sequence, and the primary goal of optical mapping is to determine how the
genome under study differs from the reference copy in terms of their respective restriction
maps. Such differences can be due to errors in the sequence, especially in the early stages
of sequencing, but more importantly, they can reflect real biological variation. In either
case, these broad goals are often tackled by breaking them down into smaller, more tractable
problems.
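The in silico derivation mentioned above can be illustrated with a minimal sketch. It assumes a linear reference sequence, an exact (non-degenerate) recognition pattern, and a palindromic site, so that only one strand needs to be searched; a non-palindromic site would also require searching the reverse complement. The function name `in_silico_restriction_map` and the `cut_offset` parameter are hypothetical and are not taken from the original text or from any particular software.

```python
import re


def in_silico_restriction_map(sequence, recognition_site, cut_offset=0):
    """Return (cut_positions, fragment_lengths) for a linear sequence.

    cut_offset is the distance from the start of each matched site to the
    cut point (an assumption of this sketch; e.g. 1 for G^AATTC).
    """
    seq = sequence.upper()
    site = recognition_site.upper()
    # A lookahead pattern lets finditer report overlapping occurrences too.
    pattern = re.compile("(?=" + re.escape(site) + ")")
    cuts = {m.start() + cut_offset for m in pattern.finditer(seq)}
    # Keep only cuts strictly inside the molecule, in increasing order.
    cuts = sorted(c for c in cuts if 0 < c < len(seq))
    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cuts + [len(seq)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return cuts, fragments


# Toy reference with two GAATTC sites, cut after the first base of each site.
cuts, fragments = in_silico_restriction_map("AAGAATTCGGGGAATTCTT", "GAATTC", 1)
print(cuts, fragments)  # [3, 12] [3, 9, 7]
```

The ordered list of fragment lengths is the in silico restriction map; comparing it with maps obtained from the genome under study is the kind of problem the text goes on to describe.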

Algorithmic challenges: Optical mapping has been very successful in obtaining restriction
maps of relatively small genomes (e.g., microbes). A critical component of this success
has been algorithmic research in the 1990s aimed specifically at optical mapping data, notably
the work of Anantharaman et al. (1999) leading to the Gentig assembly software. With
recent technological advances, the focus has shifted to larger genomes. The primary challenge
introduced by this shift is scalability. Computational methods that work well for microbial
genomes may fail for large genomes because of the memory and speed limits of existing computational
systems. Since mammalian genomes are several orders of magnitude larger than microbial genomes,
the coverage obtained from a comparable amount of data may be far lower. Careful statistical analysis is thus critical
in making full use of the available data. New methods are also required to take advantage of
in silico maps when they are available. It should be noted that restriction maps have many
fundamental similarities with sequence data, and algorithms developed for sequence analysis
can often be adapted to work with optical maps (e.g., Huang and Waterman, 1992).
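To make the coverage point concrete, here is a small arithmetic sketch. Coverage is taken to be the total length of all optical maps divided by the genome size. The data set size and both genome sizes below are illustrative assumptions (round numbers of roughly bacterial and roughly mammalian scale), not figures from the original text.

```python
def coverage(total_map_length_bp, genome_size_bp):
    """Average number of times each genomic position is covered by a map."""
    return total_map_length_bp / genome_size_bp


# Hypothetical data set: 500 Mb of optical maps in total.
total_map_length_bp = 500_000_000

# Assumed round-number genome sizes, for illustration only.
microbial_genome_bp = 5_000_000        # about 5 Mb
mammalian_genome_bp = 3_000_000_000    # about 3 Gb

print(coverage(total_map_length_bp, microbial_genome_bp))   # 100.0
print(round(coverage(total_map_length_bp, mammalian_genome_bp), 3))  # 0.167
```

Under these assumptions, the same amount of data that covers a microbial genome a hundred times over covers less than a fifth of a mammalian genome, which is why the statistical treatment of the available data matters more as genome size grows.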

Validation: Due to the nature of optical mapping data, it is rarely possible to know the
true answer except in very special circumstances. It is therefore natural to use simulation to
validate algorithmic techniques. While this has been implicitly acknowledged in much of the
algorithmic work on optical mapping, we think that the stochastic model used in simulation
itself deserves closer attention. With the large data sets that are now available, we can also
hope to use the data to validate models, at least in some limited ways. In particular, we
have found graphical diagnostics to be especially useful in model checking (see Section 2.3),