On the Analysis of Optical Mapping Data - University of Wisconsin ...
states will usually lead to over-fitting: some estimated states may be close to each other and
rare states may not be identified at all. In practice, diagnostic plots and numerical criteria
such as AIC and BIC may assist in making the choice. The estimated negative binomial size
parameter may also serve as a guide, since too few states in the model will be compensated
for by stronger apparent extra-Poisson variation. External information, when available, may
be used to put further constraints on the parameters, e.g. fixing the mean states up to a
multiplicative constant, which is estimated.
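
As a rough illustration of how the numerical criteria mentioned above could be tabulated, the following sketch computes AIC and BIC for fitted HMMs with different numbers of states. The parameter count is an assumption made for this example (one mean per state, one shared negative binomial size parameter, and a free transition matrix); it is not taken from the text. The function names are hypothetical, and the log-likelihood values are made-up placeholders, not results.

```python
import math


def hmm_param_count(n_states):
    # Assumed parameterisation (not from the source): one mean per state,
    # a single shared negative binomial size parameter, and a transition
    # matrix with (n_states - 1) free entries in each of its rows.
    return n_states + 1 + n_states * (n_states - 1)


def aic(loglik, n_params):
    # Akaike information criterion: smaller is better.
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: smaller is better.
    return n_params * math.log(n_obs) - 2 * loglik


def compare_models(logliks, n_obs):
    """logliks maps number of states -> maximised log-likelihood."""
    rows = []
    for k in sorted(logliks):
        p = hmm_param_count(k)
        rows.append((k, p, aic(logliks[k], p), bic(logliks[k], p, n_obs)))
    return rows


if __name__ == "__main__":
    # Placeholder log-likelihoods, invented purely for illustration.
    example = {2: -5230.0, 3: -5105.0, 4: -5101.0, 5: -5099.5}
    for k, p, a, b in compare_models(example, n_obs=2000):
        print(f"states={k} params={p} AIC={a:.1f} BIC={b:.1f}")
```

Such criteria are a guide only; a small gain in log-likelihood from an extra state that sits close to an existing one is the kind of over-fitting described above.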

Normal state: The HMM does not assign the special label of “normality” to any particular
state. However, the initial intervals are chosen to have a specified mean number
of hits under the hypothesis of no copy number changes. In practice, the mean of the
stationary distribution of the estimated HMM is usually close to this initial choice. The
estimated mean state with the highest (stationary) probability is usually also the one closest
to this initial choice, and can be considered to be the normal state. This should work well
unless there is widespread CNP, in which case the definition of “normal state” is unclear.
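
The identification of a normal state described above can be sketched as follows, assuming the fitted HMM is available as a row-stochastic transition matrix `P` and a list of estimated state means. The function names, the power-iteration approach, and the numbers in the usage example are all assumptions made for illustration; they are not part of the original analysis.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100000):
    # Power iteration: repeatedly apply pi <- pi P until it stops changing.
    # Assumes P is row-stochastic and the chain is irreducible and aperiodic.
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def normal_state(P, means, target_mean):
    # target_mean is the mean number of hits per interval specified when the
    # intervals were chosen (under the hypothesis of no copy number changes).
    pi = stationary_distribution(P)
    stationary_mean = sum(p * m for p, m in zip(pi, means))
    most_probable = max(range(len(pi)), key=lambda k: pi[k])
    closest = min(range(len(means)), key=lambda k: abs(means[k] - target_mean))
    return pi, stationary_mean, most_probable, closest


if __name__ == "__main__":
    # Hypothetical three-state model, invented for illustration.
    P = [[0.90, 0.10, 0.00],
         [0.02, 0.96, 0.02],
         [0.00, 0.10, 0.90]]
    means = [2.5, 5.0, 10.0]
    pi, m, most_probable, closest = normal_state(P, means, target_mean=5.0)
    print(pi, m, most_probable, closest)
```

When `most_probable` and `closest` agree, that state is a natural candidate for the normal state; disagreement between them may indicate the widespread-CNP situation in which the notion is unclear.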

Alignment: Alignment of optical maps to the in silico reference is an important prerequisite
for our analysis. Although we expect and correct for non-uniform sensitivity, the
alignment scheme should nonetheless be as sensitive as possible with a low false positive rate.
In our examples, we have used the SOMA score function (A.1) with default parameters. The
determination of significance thresholds is discussed in Chapter 3. All alignments used were
significant at a nominal specificity of 99.9%.

Symmetry: High and low coverage are not treated symmetrically, in the sense that the
power to detect lowered coverage may be lower than that to detect elevated coverage of the
same relative magnitude. Thus, it may make sense to reverse the roles of the normal data
and the dataset of interest, so that low coverage is identified as high mean states in the
HMM. Of course, the relative sizes of the data sets are also relevant.
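
The asymmetry in power can be illustrated with a deliberately simplified calculation. The sketch below is not the HMM analysis: it assumes a single interval with Poisson-distributed hits (no extra-Poisson variation) and a one-sided exact test at level `alpha`, and compares the power to detect a fold increase with the power to detect a fold decrease of the same size. The function names and the numbers in the usage example are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    # Computed on the log scale for numerical stability.
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_cdf(k, mu):
    if k < 0:
        return 0.0
    return sum(poisson_pmf(i, mu) for i in range(k + 1))


def power_to_detect_gain(mu0, fold, alpha):
    # Reject when X >= c, with c the smallest value such that
    # P(X >= c | mu0) <= alpha; power is evaluated at mean mu0 * fold.
    c = 0
    while 1.0 - poisson_cdf(c - 1, mu0) > alpha:
        c += 1
    return 1.0 - poisson_cdf(c - 1, mu0 * fold)


def power_to_detect_loss(mu0, fold, alpha):
    # Reject when X <= c, with c the largest value such that
    # P(X <= c | mu0) <= alpha (c = -1 means the test never rejects);
    # power is evaluated at mean mu0 / fold.
    c = -1
    while poisson_cdf(c + 1, mu0) <= alpha:
        c += 1
    return poisson_cdf(c, mu0 / fold)


if __name__ == "__main__":
    # Hypothetical setting: 10 expected hits, two-fold change, alpha = 0.001.
    mu0, fold, alpha = 10.0, 2.0, 0.001
    print("power, gain:", power_to_detect_gain(mu0, fold, alpha))
    print("power, loss:", power_to_detect_loss(mu0, fold, alpha))
```

In this toy setting the power to detect the loss is lower than the power to detect the gain, because counts are bounded below by zero and so leave little room in the lower tail. The comparison depends on the baseline mean, the fold change, and the level chosen.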