29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...



Assembly: For these examples, the assembly problem was approached using a two-step procedure. In the first step, each individual optical map was aligned to the reference map. The reference genome was then tiled by overlapping “windows” and maps that aligned were grouped together according to membership in these windows. In the second step, the maps in each group were assembled using Gentig, giving a local snapshot of the target map. This strategy can be expected to work in regions where the differences are minor, and use of gapped alignments can reveal certain larger-scale variations. For regions of more severe differences, an initial consensus map can be extended into its flanks by iteratively aligning optical maps to it, allowing partial overlap at the boundaries, followed by assembly. This procedure is revisited in Section 3.3.4.
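The windowing part of the first step can be illustrated with a minimal sketch. The function names, the window size and step parameters, and the membership rule (a map belongs to every window its aligned span overlaps) are assumptions made for illustration; the text does not specify them, and the assembly itself (Gentig) is not reproduced here.

```python
from collections import defaultdict


def tile_windows(genome_length, window_size, step):
    """Tile [0, genome_length) with half-open windows (start, end).

    Consecutive windows overlap whenever step < window_size.
    All three parameters are hypothetical, in reference coordinates.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 < step <= window_size:
        raise ValueError("need 0 < step <= window_size")
    windows = []
    start = 0
    while True:
        end = min(start + window_size, genome_length)
        windows.append((start, end))
        if end >= genome_length:
            break
        start += step
    return windows


def group_by_window(alignments, windows):
    """Group aligned maps by window membership.

    alignments: iterable of (map_id, aln_start, aln_end), the span each
    optical map covers on the reference after alignment.
    Assumed rule: a map is a member of every window its span overlaps.
    """
    groups = defaultdict(list)
    for map_id, aln_start, aln_end in alignments:
        for w_start, w_end in windows:
            # Half-open interval overlap test.
            if aln_start < w_end and aln_end > w_start:
                groups[(w_start, w_end)].append(map_id)
    return dict(groups)


windows = tile_windows(genome_length=10, window_size=4, step=2)
groups = group_by_window([("m1", 1, 3), ("m2", 3, 7)], windows)
```

Each resulting group would then be handed to the assembler separately. Because the windows overlap, a map near a window boundary contributes to both neighbouring local assemblies.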

Differences: The next task was to identify the differences between the assembled consensus maps (contigs) and the reference map. Once again, this was approached in two steps, starting with alignments of the consensus contigs to the reference. This induces inferred alignments of single optical maps to the reference. Individual differences between the assembled consensus and the reference, specifically in restriction sites and fragment lengths, can then be assigned confidence in the form of p-values of simple hypothesis tests. In practice, the initial alignment is often problematic in regions with small fragments, and some automated and manual curation is currently required. Larger indels and translocations are usually identified manually. Table 1.2 summarizes the structural variations identified in the CHM and GM07535 genomes. See Reslewic et al. for more details of the analysis.
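To make the idea of a "simple hypothesis test" concrete, here is one possible test for a missing cut, offered only as a sketch. It assumes a binomial model that is not taken from the text: under the null hypothesis the reference restriction site is present in the sample, and each optical map covering it shows the cut independently with a known probability `digestion_rate`. Both the model and the function name are hypothetical.

```python
from math import comb


def missing_cut_pvalue(n_maps, n_with_cut, digestion_rate):
    """One-sided binomial p-value for a candidate missing cut.

    Null hypothesis (assumed model): the site exists in the sample and
    each of the n_maps covering maps shows the cut independently with
    probability digestion_rate.
    Returns P(X <= n_with_cut) for X ~ Binomial(n_maps, digestion_rate);
    a small value is evidence that the site is absent from the sample.
    """
    if not 0 <= n_with_cut <= n_maps:
        raise ValueError("need 0 <= n_with_cut <= n_maps")
    if not 0.0 < digestion_rate < 1.0:
        raise ValueError("digestion_rate must lie strictly between 0 and 1")
    q = 1.0 - digestion_rate
    return sum(
        comb(n_maps, k) * digestion_rate**k * q ** (n_maps - k)
        for k in range(n_with_cut + 1)
    )


# Three covering maps, none showing the cut, assumed digestion rate 0.5.
p_small = missing_cut_pvalue(3, 0, 0.5)
# Every covering map shows the cut: no evidence against the null.
p_full = missing_cut_pvalue(10, 10, 0.8)
```

An analogous test for an extra cut would use a false-cut rate in place of the digestion rate, and a test for fragment length differences would need a sizing error model; neither is sketched here.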

Genome    Insertions  Deletions  Extra cuts  Missing cuts  Others
CHM              221        217         449           466      14
GM07535          109         52         132           254      10

Table 1.2: Summary of “Optical Structural Variations” (OSV) identified in the CHM and GM07535 data sets. The events included are those that were significant at a nominal False Discovery Rate of 90%.
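The caption does not say which procedure was used to control the False Discovery Rate. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, a common choice, to a list of per-event p-values. The function name and the example p-values are hypothetical and are not taken from the data sets above.

```python
def benjamini_hochberg(pvalues, fdr):
    """Flag p-values significant under the Benjamini-Hochberg procedure.

    Sort the m p-values, find the largest rank r with p_(r) <= r * fdr / m,
    and reject the r smallest. Returns a list of booleans in the original
    order of pvalues.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for four candidate events.
flags_a = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], fdr=0.05)
flags_b = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], fdr=0.05)
```

Events flagged `True` would be the ones counted in a summary table at the chosen nominal rate.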
