29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


Alternatively, it can be thought of as the realization of a random process; in particular,
recognition sites along the genome have been modeled as the realizations of a homogeneous
Poisson point process, or equivalently the fragment lengths as i.i.d. exponential variates. This
model is supported by Figure 2.1, derived from Build 35 of the human genome sequence.
The rate of this process depends on the restriction enzyme being used, as well as the genome
being mapped. In some cases, it may vary across, or even within, chromosomes. Genomic
differences within a species usually involve only a fraction of the genome, and corresponding
restriction maps are expected to be largely similar. In any case, we are chiefly interested in
modeling the generative process of data conditional on the underlying restriction map. It
should be noted that the notion of a ‘true’ map is somewhat simplified. Diploid genomes
have two versions of the map, largely similar but not identical. Cancer samples are usually
a mixture of several cell populations that each contribute a slightly different genome.
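The equivalence between a homogeneous Poisson point process and i.i.d. exponential fragment lengths can be illustrated with a minimal simulation sketch. The genome length, the rate, and the function names below are hypothetical choices made for illustration only; they are not taken from the text or from any optical mapping software.

```python
import random


def simulate_restriction_map(genome_length, rate, rng):
    """Return sorted cut-site positions in [0, genome_length).

    Sites are generated by accumulating i.i.d. exponential gaps with
    parameter `rate` (sites per base pair), which yields a homogeneous
    Poisson point process on the interval.
    """
    sites = []
    position = rng.expovariate(rate)
    while position < genome_length:
        sites.append(position)
        position += rng.expovariate(rate)
    return sites


def fragment_lengths(sites, genome_length):
    """Lengths of the fragments delimited by the sites and both genome ends.

    Note: the final fragment is cut off by the genome end, so it is not
    exactly exponential.
    """
    boundaries = [0.0] + list(sites) + [float(genome_length)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


rng = random.Random(0)
GENOME_LENGTH = 5_000_000  # hypothetical 5 Mb genome
RATE = 1 / 20_000          # hypothetical: one site per 20 Kb on average

sites = simulate_restriction_map(GENOME_LENGTH, RATE, rng)
fragments = fragment_lengths(sites, GENOME_LENGTH)
mean_fragment = sum(fragments) / len(fragments)
print(len(sites), round(mean_fragment))
```

Under these assumed values the number of sites should be close to `RATE * GENOME_LENGTH` and the mean fragment length close to `1 / RATE`. A rate that varies across or within chromosomes, as mentioned above, would require an inhomogeneous process, which this sketch does not cover.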

Shotgun breaks: Before being passed into micro-channels, chromosomal DNA is randomly
broken up into smaller molecules, usually by subjecting the DNA to vibration. This
shearing is often referred to as a whole genome shotgun process. The origin of each observed
optical map molecule is characterized by its location in the coordinate system defined by the
underlying true (unknown) restriction map, as well as its length. The distribution of the
location (e.g. midpoint) is assumed to be uniform over the underlying genome. It is typical
to consider only optical maps longer than a predetermined threshold, usually 300 Kb. The
distribution of lengths of the filtered maps is usually consistent with a truncated exponential
distribution.
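The shotgun sampling model can be sketched as follows. The text only states that midpoints are uniform and that the filtered lengths look truncated exponential; the sketch additionally assumes that unfiltered lengths are exponential, and that molecules overhanging a genome end are clipped. The mean length, genome size, and function names are hypothetical.

```python
import random


def sample_molecules(n, genome_length, mean_length, rng):
    """Draw n (start, end) intervals with uniform midpoint and
    exponential length (an assumption), clipped to the genome."""
    molecules = []
    for _ in range(n):
        midpoint = rng.uniform(0, genome_length)
        length = rng.expovariate(1 / mean_length)
        start = max(0.0, midpoint - length / 2)
        end = min(float(genome_length), midpoint + length / 2)
        molecules.append((start, end))
    return molecules


def filter_by_length(molecules, threshold):
    """Keep only molecules strictly longer than the threshold."""
    return [(s, e) for s, e in molecules if e - s > threshold]


rng = random.Random(1)
GENOME_LENGTH = 100_000_000  # hypothetical 100 Mb genome
MEAN_LENGTH = 200_000        # hypothetical mean molecule length (200 Kb)
THRESHOLD = 300_000          # the 300 Kb cutoff mentioned in the text

molecules = sample_molecules(10_000, GENOME_LENGTH, MEAN_LENGTH, rng)
kept = filter_by_length(molecules, THRESHOLD)
kept_fraction = len(kept) / len(molecules)
print(len(kept), round(kept_fraction, 3))
```

Because the exponential distribution is memoryless, the lengths that survive the filter are distributed as the threshold plus an exponential variate, which is one way a truncated exponential shape could arise. With the assumed values, roughly `exp(-THRESHOLD / MEAN_LENGTH)` of the molecules should pass the filter, apart from the small effect of clipping at the genome ends.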

2.1.2 Errors

Cut site errors: A restriction site in the true restriction map may fail to show up in a
corresponding optical map. These missing cuts can be due to either incomplete digestion
by the restriction enzyme or noise in the optical map image. Whether true cut sites are
identified (success) or not (failure) is modeled as independent Bernoulli trials, with some
