On the Analysis of Optical Mapping Data - University of Wisconsin ...
Alternatively, it can be thought of as the realization of a random process; in particular,
recognition sites along the genome have been modeled as the realizations of a homogeneous
Poisson point process, or equivalently the fragment lengths as i.i.d. exponential variates. This
model is supported by Figure 2.1, derived from Build 35 of the human genome sequence.
The rate of this process depends on the restriction enzyme being used, as well as the genome
being mapped. In some cases, it may vary across, or even within, chromosomes. Genomic
differences within a species usually involve only a fraction of the genome, and corresponding
restriction maps are expected to be largely similar. In any case, we are chiefly interested in
modeling the generative process of data conditional on the underlying restriction map. It
should be noted that the notion of a ‘true’ map is somewhat simplified. Diploid genomes
have two versions of the map, largely similar but not identical. Cancer samples are usually
a mixture of several cell populations that each contribute a slightly different genome.
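As a minimal sketch of the Poisson model described above, the following Python code places cut sites by accumulating i.i.d. exponential gaps. The function names, the 5 Mb genome length, and the rate of one site per 20 kb are illustrative assumptions, not values taken from the text.

```python
import random


def simulate_restriction_sites(genome_length, rate, rng):
    """Return sorted cut-site positions on [0, genome_length).

    Sites follow a homogeneous Poisson process: successive gaps are
    i.i.d. exponential with mean 1 / rate.
    """
    sites = []
    position = rng.expovariate(rate)
    while position < genome_length:
        sites.append(position)
        position += rng.expovariate(rate)
    return sites


def fragment_lengths(sites, genome_length):
    """Lengths of the fragments delimited by the cut sites and the genome ends."""
    boundaries = [0.0] + sites + [genome_length]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


rng = random.Random(0)
# Hypothetical values: a 5 Mb genome, one site per 20 kb on average.
genome_length = 5_000_000
sites = simulate_restriction_sites(genome_length, 1 / 20_000, rng)
frags = fragment_lengths(sites, genome_length)
mean_length = sum(frags) / len(frags)
```

The interior fragments are exponential by construction; the last fragment is cut off by the end of the genome, so it is only approximately so. With a rate that varies across or within chromosomes, the constant `rate` would have to be replaced by a position-dependent one.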
Shotgun breaks: Before it is passed into micro-channels, chromosomal DNA is randomly
broken up into smaller molecules, usually by subjecting the DNA to vibration. This
shearing is often referred to as a whole genome shotgun process. The origin of each observed
optical map molecule is characterized by its location in the coordinate system defined by the
underlying true (unknown) restriction map, as well as its length. The distribution of the
location (e.g. midpoint) is assumed to be uniform over the underlying genome. It is typical
to consider only optical maps longer than a predetermined threshold, usually 300 Kb. The
distribution of lengths of the filtered maps is usually consistent with a truncated exponential
distribution.
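The shotgun model above can be sketched in Python as follows, assuming that "truncated" means left-truncated at the length threshold. Under that assumption the memoryless property of the exponential gives a filtered length equal to the threshold plus an exponential excess. The function name, the mean excess of 150 kb, and the genome length are hypothetical, and edge effects at the genome ends are ignored.

```python
import random


def sample_molecules(n, genome_length, threshold, mean_excess, rng):
    """Draw n molecules as (midpoint, length) pairs.

    Midpoints are uniform over the genome. Lengths follow an exponential
    distribution left-truncated at `threshold`, sampled as
    threshold + Exp(mean = mean_excess) by memorylessness.
    """
    molecules = []
    for _ in range(n):
        midpoint = rng.uniform(0.0, genome_length)
        length = threshold + rng.expovariate(1.0 / mean_excess)
        molecules.append((midpoint, length))
    return molecules


rng = random.Random(1)
# Hypothetical values: 5 Mb genome, 300 kb threshold, 150 kb mean excess.
molecules = sample_molecules(1000, 5_000_000, 300_000, 150_000, rng)
```

Each pair locates a molecule in the coordinate system of the underlying restriction map; intersecting the interval it spans with a list of cut sites would give the error-free fragments of that molecule.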
2.1.2 Errors
Cut site errors: A restriction site in the true restriction map may fail to show up in a
corresponding optical map. These missing cuts can be due to either incomplete digestion
by the restriction enzyme or noise in the optical map image. Whether true cut sites are
identified (success) or not (failure) is modeled as independent Bernoulli trials, with some