On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
On the Analysis of Optical Mapping Data - University of Wisconsin ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
72<br />
Example: MCF-7: A more interesting example is provided by <strong>the</strong> MCF-7 breast cancer<br />
cell line, which is known to exhibit copy number polymorphism (Pollack et al., 2002;<br />
Volik et al., 2003). Figure 4.5 shows results for chromosomes 17 and 20, which have been<br />
shown by o<strong>the</strong>r methods to contain short regions <strong>of</strong> strongly elevated copy number. 10540<br />
MCF-7 optical maps align to <strong>the</strong>se chromosomes. The combined alignments from GM07535<br />
and CHM, totalling 7532, are used as <strong>the</strong> normalizing data set. For comparison, <strong>the</strong> figure<br />
also shows data from <strong>the</strong> same cell line obtained using a completely different technology for<br />
measuring array-based comparative genomic hybridization (CGH), namely <strong>the</strong> Affymetrix<br />
Xba 50k chip (data provided by Pr<strong>of</strong>. Paul Lizardi, Yale). A visual comparison shows that<br />
<strong>the</strong> unsmoo<strong>the</strong>d coverage counts track well with <strong>the</strong> raw CGH data. After smoothing using<br />
<strong>the</strong> HMM for optical map data and <strong>the</strong> CBS algorithm (Olshen and Venkatraman, 2002) for<br />
<strong>the</strong> Affymetrix data, <strong>the</strong> copy number estimates are also similar. Both methods detect <strong>the</strong><br />
short but prominent spikes in copy number.<br />
4.4 Discussion<br />
In this chapter, we have outlined an approach that uses optical map alignments to detect<br />
copy number alterations in a genome. A key step in this process was to summarize alignments<br />
<strong>of</strong> optical maps to an in silico reference by a single number (midpoint) representing location.<br />
These locations were modeled as realizations <strong>of</strong> a non-homogeneous Poisson process. We<br />
account for <strong>the</strong> non-homogeneity using alignment data from a normal genome, which are used<br />
to define random intervals with counts that follow a negative binomial distribution. These<br />
counts were <strong>the</strong>n modeled by a Hidden Markov Model, incorporating spatial dependence in<br />
<strong>the</strong> data and allowing more natural estimation <strong>of</strong> certain parameters.<br />
Model building: In HMM’s, as in many o<strong>the</strong>r statistical techniques, model selection and<br />
in particular <strong>the</strong> choice <strong>of</strong> <strong>the</strong> number <strong>of</strong> states, is more an art than a science. A small<br />
number <strong>of</strong> states may not completely reflect reality, especially if <strong>the</strong> genome being studied is<br />
a mixture, which would not be unusual in tumor samples. <strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, allowing many