
5.2 Other topics

Preliminary filtering: The optical mapping pipeline processes massive amounts of data. This process is helped if maps that are unlikely to contribute to the results are filtered out beforehand. It is currently standard practice to leave out maps that do not exceed a minimum length threshold, usually 300 Kb. However, the length of an optical map does not by itself determine its usefulness. As suggested in Section 3.4.2, one alternative is to use a map-specific information score, derived from alignment score functions, for this purpose. A similar but more difficult line of research would be to develop a map-specific confidence score as part of the image processing step, to reflect the likely quality of the map.
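To make the contrast concrete, the sketch below compares the two filtering rules. It is a minimal illustration, not code from the pipeline: the OpticalMap record and the info_score callable are hypothetical stand-ins, and the information score itself is left abstract since its derivation belongs to Section 3.4.2.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class OpticalMap:
        """A single optical map: ordered restriction fragment sizes in Kb.
        (Hypothetical record type, for illustration only.)"""
        map_id: str
        fragments: List[float]  # fragment sizes in Kb

        @property
        def length_kb(self) -> float:
            return sum(self.fragments)

    def filter_by_length(maps: List[OpticalMap],
                         min_kb: float = 300.0) -> List[OpticalMap]:
        """Current standard practice: keep maps above a fixed length threshold."""
        return [m for m in maps if m.length_kb >= min_kb]

    def filter_by_information(maps: List[OpticalMap],
                              info_score: Callable[[OpticalMap], float],
                              min_score: float) -> List[OpticalMap]:
        """Alternative: keep maps whose map-specific information score
        (derived from the alignment score function) exceeds a threshold.
        The score function is deliberately left abstract here."""
        return [m for m in maps if info_score(m) >= min_score]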

Iterative assembly: The iterative assembly scheme described by Reslewic et al. has been successful in assembling large genomes by taking advantage of an initial reference map. As a first attempt, it naturally leaves scope for improvement. Although the results have been validated to some extent (e.g. using PCR), the use of Gentig as an assembly tool must be regarded as heuristic, since the set of input maps is highly selective rather than random, as the model expects. Further, the depth of these data sets is lower than the values suggested by Anantharaman and Mishra (2003) for reliable assemblies, which may explain alignment problems involving small fragments. Generally speaking, the inherent trade-off between depth (resources) and accuracy is unavoidable, and low depth may be an acceptable compromise in many situations. However, a formal study is necessary to quantify this trade-off.
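One possible starting point for such a study is the classical Lander-Waterman coverage calculation, which relates depth to the expected fraction of the genome left uncovered. The sketch below is a back-of-the-envelope illustration with made-up numbers; they are not figures from the data sets discussed here.

    import math

    def coverage_depth(n_maps: int, mean_map_len_kb: float,
                       genome_len_kb: float) -> float:
        """Expected coverage depth c = N * L / G (Lander-Waterman)."""
        return n_maps * mean_map_len_kb / genome_len_kb

    def uncovered_fraction(depth: float) -> float:
        """Expected fraction of the genome covered by no map: e^(-c),
        assuming maps land uniformly and independently."""
        return math.exp(-depth)

    # Illustrative numbers only: 100,000 maps of mean length 450 Kb
    # on a 3,000,000 Kb (3 Gb) genome gives 15x depth.
    c = coverage_depth(100_000, 450.0, 3_000_000.0)
    print(f"depth = {c:.1f}x, expected uncovered fraction = {uncovered_fraction(c):.2e}")

A full treatment would of course have to model accuracy, not just coverage, but even this simple relation makes the cost of each additional unit of depth explicit.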

Also desirable are formal approaches to develop rules to control false positives, which are currently derived in an ad hoc fashion. One specific goal is to obtain an estimate of the false positive rate, which is not obvious even when the true genome is known.
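One possible formal route, sketched below under stated assumptions, is a permutation null: shuffling the fragment order of a query map destroys its genomic signal while preserving its fragment-size distribution, so the scores of permuted maps approximate a null distribution against which a cutoff can be set. The align_score callable is a hypothetical stand-in for the pipeline's actual alignment score function.

    import random
    from typing import Callable, List, Optional, Sequence

    def permutation_score_cutoff(
        query_fragments: Sequence[float],
        align_score: Callable[[Sequence[float]], float],
        n_perm: int = 1000,
        fp_rate: float = 0.01,
        rng: Optional[random.Random] = None,
    ) -> float:
        """Estimate a score cutoff controlling the per-map false positive rate.

        Repeatedly permute the fragment order of a query map and align each
        permuted map to the reference with `align_score` (a placeholder for
        the real score function); the (1 - fp_rate) quantile of the resulting
        null scores serves as the acceptance cutoff.
        """
        rng = rng or random.Random(0)
        frags = list(query_fragments)
        null_scores: List[float] = []
        for _ in range(n_perm):
            rng.shuffle(frags)
            null_scores.append(align_score(frags))
        null_scores.sort()
        # Index of the (1 - fp_rate) quantile in the sorted null scores.
        idx = min(int((1.0 - fp_rate) * n_perm), n_perm - 1)
        return null_scores[idx]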

General questions: More generally, several algorithmic questions remain unanswered. The most obvious one is de novo assembly for large genomes, which is required when a draft sequence is not available. Another consideration is the analysis of heterozygous or mixture (e.g. cancer) populations. This is particularly important in studying human genomes, which
