29.07.2014

On the Analysis of Optical Mapping Data - University of Wisconsin ...



1.3 Elements of optical mapping

We now describe in more depth the elements of a typical optical mapping experiment. We start with image processing and go on to discuss the structure of optical map data and the goals and challenges we face in data analysis. We describe two basic computational tasks, alignment and assembly, that are fundamental in addressing many other problems. We end with a summary of the analysis of the GM07535 and CHM data sets reported by Reslewic et al.

1.3.1 Image processing

Intensity profiles: For a typical optical mapping experiment, hundreds of raw images need to be processed to obtain useful data (Figure 1.2). The first step in this process is to identify the collections of pixels in an image that together represent a single DNA molecule. This is a complicated task that falls in the domain of computer vision and will not be discussed further. The end product of this step is an intensity profile for each molecule (Figure 1.3) giving the measured fluorescent intensity as a function of distance along the “backbone”. There are two ways to proceed. We may consider these profiles as our primary data, and retain the information they contain in subsequent analyses. Alternatively, we may immediately convert them into putative restriction maps, i.e. to an ordered sequence of fragment lengths. The second approach is simpler because it separates the problem into two parts that can be refined independently. Also, many standard techniques in computational biology apply, with suitable adaptations, in this formulation. The first approach has a certain appeal, but presents difficult challenges and we do not investigate it further. The rest of this discussion assumes the second, two-step approach.
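As a rough illustration of the two-step approach, the sketch below turns a toy intensity profile into an ordered sequence of fragment lengths. It is not the method used in this work: the function names `find_dips` and `fragment_lengths`, the fixed intensity threshold, and the example profile are all assumptions made for illustration, and lengths are left in pixels rather than converted to base pairs.

```python
from typing import List, Sequence


def find_dips(profile: Sequence[float], threshold: float) -> List[int]:
    """Return indices of local minima whose intensity is below `threshold`.

    Hypothetical stand-in for a cut-site detector; a fixed threshold is an
    assumption, not the procedure described in the text.
    """
    dips = []
    for i in range(1, len(profile) - 1):
        is_local_min = profile[i] <= profile[i - 1] and profile[i] < profile[i + 1]
        if is_local_min and profile[i] < threshold:
            dips.append(i)
    return dips


def fragment_lengths(cut_positions: Sequence[int], molecule_length: int) -> List[int]:
    """Convert cut positions along the backbone into ordered fragment lengths."""
    boundaries = [0, *sorted(cut_positions), molecule_length]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy profile (made-up values): high intensity with two sharp dips.
profile = [9, 10, 9, 2, 9, 10, 10, 3, 9, 9]
cuts = find_dips(profile, threshold=5)
restriction_map = fragment_lengths(cuts, len(profile))
print(cuts, restriction_map)
```

Keeping the two functions separate mirrors the point made above: the step that locates cuts and the step that works with fragment lengths can be refined independently.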

Cleavage sites: To convert intensity profiles to restriction maps, one has to first identify the cleavage sites or cut sites in the map, indicated by ‘dips’ in the intensity profile. The approaches traditionally used to identify cut sites are largely heuristic, although formal
