On the Analysis of Optical Mapping Data - University of Wisconsin ...
5.2 Other topics
Preliminary filtering: The optical mapping pipeline processes massive amounts of data. This process is helped if maps that are unlikely to contribute to the results are filtered out beforehand. It is currently standard practice to leave out maps that do not exceed a minimum length threshold, usually 300 kb. However, the length of an optical map does not by itself determine its usefulness. As suggested in Section 3.4.2, one alternative is to use a map-specific information score, derived from alignment score functions, for this purpose. A similar but more difficult line of research would be to develop a map-specific confidence score as part of the image processing step, to reflect the likely quality of the map.
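To make the contrast concrete, the two filtering criteria might be sketched as follows. This is an illustration only: the map representation (a list of fragment lengths in kb), the function names, and the toy information-score proxy are hypothetical and not part of the actual pipeline, where the score would be derived from the alignment score functions of Section 3.4.2.

```python
MIN_LENGTH_KB = 300  # standard minimum-length threshold from the text

def map_length(fragments):
    """Total length of an optical map, in kb."""
    return sum(fragments)

def passes_length_filter(fragments, min_length=MIN_LENGTH_KB):
    """Current practice: keep a map only if it exceeds a length threshold."""
    return map_length(fragments) > min_length

def information_score(fragments):
    """Illustrative stand-in for a map-specific information score.

    A real score would come from the alignment score functions; this toy
    proxy merely counts well-sized fragments, since very small fragments
    carry little reliable alignment information.
    """
    return sum(1.0 for f in fragments if f >= 5.0)

# Hypothetical maps (fragment lengths in kb)
maps = [
    [120.0, 95.5, 110.2],  # 325.7 kb: passes the length filter
    [150.0, 149.0],        # 299.0 kb: fails the length filter
    [40.0] * 9,            # 360.0 kb, many uniform mid-sized fragments
]

kept = [m for m in maps if passes_length_filter(m)]
```

The point of the information-score alternative is that the second and third maps, though similar in total length, need not be equally useful; a score computed per map can separate them where a single length cutoff cannot.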
Iterative assembly: The iterative assembly scheme described by Reslewic et al. has been successful in assembling large genomes by taking advantage of an initial reference map. Being a first attempt, there is naturally scope for improvement. Although the results have been validated to some extent (e.g. using PCR), the use of Gentig as an assembly tool must be regarded as heuristic, since the set of input maps is highly selective rather than random, as the model expects. Further, the depth of these data sets is lower than the values suggested by Anantharaman and Mishra (2003) for reliable assemblies, which may explain alignment problems involving small fragments. Generally speaking, the inherent trade-off between depth (resources) and accuracy is unavoidable, and low depth may be an acceptable compromise in many situations. However, a formal study is necessary to quantify this trade-off. Also desirable are formal approaches to develop rules for controlling false positives, which are currently derived in an ad hoc fashion. One specific goal is to obtain an estimate of the false positive rate, which is not obvious even when the true genome is known.
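The depth figure behind this trade-off is simply total map length divided by genome length. A minimal sketch, with made-up numbers (the actual recommended depths for reliable assembly are those given by Anantharaman and Mishra (2003)):

```python
def coverage_depth(map_lengths_kb, genome_length_kb):
    """Average number of maps covering a genomic position:
    total map length divided by genome length (both in kb)."""
    return sum(map_lengths_kb) / genome_length_kb

# Hypothetical data set: 1,000 maps of 400 kb each over an 8 Mb region,
# giving 50x depth. These numbers are illustrative only.
depth = coverage_depth([400.0] * 1000, 8000.0)
```

A formal study of the trade-off would relate this depth to assembly accuracy, rather than treating a single recommended value as a hard threshold.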
General questions: More generally, several algorithmic questions remain unanswered. The most obvious one is de novo assembly for large genomes, which is required when a draft sequence is not available. Another consideration is the analysis of heterozygous or mixture (e.g. cancer) populations. This is particularly important in studying human genomes, which