29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

On the Analysis of Optical Mapping Data - University of Wisconsin ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

72<br />

Example: MCF-7: A more interesting example is provided by <strong>the</strong> MCF-7 breast cancer<br />

cell line, which is known to exhibit copy number polymorphism (Pollack et al., 2002;<br />

Volik et al., 2003). Figure 4.5 shows results for chromosomes 17 and 20, which have been<br />

shown by o<strong>the</strong>r methods to contain short regions <strong>of</strong> strongly elevated copy number. 10540<br />

MCF-7 optical maps align to <strong>the</strong>se chromosomes. The combined alignments from GM07535<br />

and CHM, totalling 7532, are used as <strong>the</strong> normalizing data set. For comparison, <strong>the</strong> figure<br />

also shows data from <strong>the</strong> same cell line obtained using a completely different technology for<br />

measuring array-based comparative genomic hybridization (CGH), namely <strong>the</strong> Affymetrix<br />

Xba 50k chip (data provided by Pr<strong>of</strong>. Paul Lizardi, Yale). A visual comparison shows that<br />

<strong>the</strong> unsmoo<strong>the</strong>d coverage counts track well with <strong>the</strong> raw CGH data. After smoothing using<br />

<strong>the</strong> HMM for optical map data and <strong>the</strong> CBS algorithm (Olshen and Venkatraman, 2002) for<br />

<strong>the</strong> Affymetrix data, <strong>the</strong> copy number estimates are also similar. Both methods detect <strong>the</strong><br />

short but prominent spikes in copy number.<br />

4.4 Discussion<br />

In this chapter, we have outlined an approach that uses optical map alignments to detect<br />

copy number alterations in a genome. A key step in this process was to summarize alignments<br />

<strong>of</strong> optical maps to an in silico reference by a single number (midpoint) representing location.<br />

These locations were modeled as realizations <strong>of</strong> a non-homogeneous Poisson process. We<br />

account for <strong>the</strong> non-homogeneity using alignment data from a normal genome, which are used<br />

to define random intervals with counts that follow a negative binomial distribution. These<br />

counts were <strong>the</strong>n modeled by a Hidden Markov Model, incorporating spatial dependence in<br />

<strong>the</strong> data and allowing more natural estimation <strong>of</strong> certain parameters.<br />

Model building: In HMM’s, as in many o<strong>the</strong>r statistical techniques, model selection and<br />

in particular <strong>the</strong> choice <strong>of</strong> <strong>the</strong> number <strong>of</strong> states, is more an art than a science. A small<br />

number <strong>of</strong> states may not completely reflect reality, especially if <strong>the</strong> genome being studied is<br />

a mixture, which would not be unusual in tumor samples. <strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, allowing many

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!