On the Analysis of Optical Mapping Data - University of Wisconsin ...
1.3 Elements of optical mapping
We now describe in more depth the elements of a typical optical mapping experiment. We start with image processing and go on to discuss the structure of optical map data and the goals and challenges we face in data analysis. We describe two basic computational tasks, alignment and assembly, that are fundamental to addressing many other problems. We end with a summary of the analysis of the GM07535 and CHM data sets reported by Reslewic et al.
1.3.1 Image processing
Intensity profiles: For a typical optical mapping experiment, hundreds of raw images need to be processed to obtain useful data (Figure 1.2). The first step in this process is to identify the collections of pixels in an image that together represent a single DNA molecule. This is a complicated task that falls in the domain of computer vision and will not be discussed further. The end product of this step is an intensity profile for each molecule (Figure 1.3), giving the measured fluorescence intensity as a function of distance along the “backbone”. There are two ways to proceed. We may treat these profiles as our primary data and retain the information they contain in subsequent analyses. Alternatively, we may immediately convert them into putative restriction maps, i.e., ordered sequences of fragment lengths. The second approach is simpler because it separates the problem into two parts that can be refined independently. Also, many standard techniques in computational biology apply, with suitable adaptations, in this formulation. The first approach has a certain appeal, but it presents difficult challenges, and we do not investigate it further. The rest of this discussion assumes the second, two-step approach.
Cleavage sites: To convert intensity profiles to restriction maps, one first has to identify the cleavage sites, or cut sites, in the map, indicated by ‘dips’ in the intensity profile. The approaches traditionally used to identify cut sites are largely heuristic, although formal