On the Analysis of Optical Mapping Data - University of Wisconsin ...
Assembly: For these examples, the assembly problem was approached using a two-step
procedure. In the first step, each individual optical map was aligned to the reference map.
The reference genome was then tiled by overlapping “windows”, and the maps that aligned were
grouped together according to membership in these windows. In the second step, the maps
in each group were assembled using Gentig, giving a local snapshot of the target map. This
strategy can be expected to work in regions where the differences are minor, and the use of
gapped alignments can reveal certain larger-scale variations. For regions of more severe
differences, an initial consensus map can be extended into its flanks by iteratively aligning
optical maps to it, allowing partial overlap at the boundaries, followed by assembly. This
procedure is revisited in Section 3.3.4.
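The windowing step above can be sketched in a few lines. This is only an illustration under stated assumptions: the names tile_windows and group_by_window, the window and step sizes, the representation of an alignment as a (map_id, start, end) interval on the reference, and the rule that a map belongs to every window it overlaps are all hypothetical and are not taken from the analysis described here.

```python
from collections import defaultdict


def tile_windows(ref_length, window, step):
    """Tile [0, ref_length) with windows of the given size.

    Consecutive windows overlap whenever step < window.
    All three parameters are hypothetical and share one unit (e.g. kb).
    """
    if not 0 < step <= window:
        raise ValueError("need 0 < step <= window")
    windows = []
    start = 0
    while True:
        end = min(start + window, ref_length)
        windows.append((start, end))
        if end >= ref_length:
            break
        start += step
    return windows


def group_by_window(alignments, windows):
    """Group map ids by the windows their aligned interval overlaps.

    alignments: iterable of (map_id, start, end) on the reference.
    Assumed membership rule: any overlap with the window counts.
    """
    groups = defaultdict(list)
    for map_id, a_start, a_end in alignments:
        for w_start, w_end in windows:
            if a_start < w_end and a_end > w_start:
                groups[(w_start, w_end)].append(map_id)
    return dict(groups)


windows = tile_windows(ref_length=1000, window=400, step=200)
groups = group_by_window([("m1", 50, 350), ("m2", 300, 700)], windows)
```

Each group would then be passed to the assembler separately. Because the windows overlap, a map near a window boundary contributes to both neighbouring local assemblies.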
Differences: The next task was to identify the differences between the assembled consensus
maps (contigs) and the reference map. Once again, this was approached in two steps,
starting with alignments of the consensus contigs to the reference. This induces inferred
alignments of single optical maps to the reference. Individual differences between the assembled
consensus and the reference, specifically in restriction sites and fragment lengths,
can then be assigned confidence in the form of p-values of simple hypothesis tests. In practice,
the initial alignment is often problematic in regions with small fragments, and some
automated and manual curation is currently required. Larger indels and translocations are
usually identified manually. Table 1.2 summarizes the structural variations identified in the
CHM and GM07535 genomes. See Reslewic et al. for more details of the analysis.
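As a rough illustration of how a p-value might be attached to a single candidate difference, consider a putative missing cut. The text says only that simple hypothesis tests were used, so the binomial model below is an assumption, not the test actually applied: of n optical maps whose inferred alignment covers a reference restriction site, k show a cut there, and under the null hypothesis the site is present and each map is cut independently with a hypothetical digestion rate p.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p).

    Used here as a one-sided p-value: few observed cuts at a site
    that should be cut at rate p is evidence that the site is missing.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical numbers: 10 covering maps, none cut, assumed digestion rate 0.8.
p_value = binomial_lower_tail(k=0, n=10, p=0.8)
```

A small p-value supports calling the site a missing cut. An analogous upper-tail calculation, with an assumed false-cut rate in place of the digestion rate, could be used for extra cuts.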
Genome    Insertions  Deletions  Extra cuts  Missing cuts  Others
CHM       221         217        449         466           14
GM07535   109         52         132         254           10
Table 1.2 Summary of “Optical Structural Variations” (OSV) identified in the CHM and
GM07535 data sets. The events included are those that were significant at a nominal False
Discovery Rate of 90%.
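The caption does not say which procedure was used to control the False Discovery Rate. One common choice is the Benjamini-Hochberg step-up procedure, sketched below purely as an assumption. The function name, the example p-values, and the level q are placeholders and do not reproduce the analysis behind Table 1.2.

```python
def benjamini_hochberg(p_values, q):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg procedure at nominal FDR level q."""
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value is below its threshold rank * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis up to and including that rank.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Placeholder p-values for four candidate events.
calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.1)
```

In this sketch, the events reported would be those whose entry in calls is True.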