29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


states will usually lead to over-fitting: some estimated states may be close to each other and rare states may not be identified at all. In practice, diagnostic plots and numerical criteria like AIC and BIC may assist in making the choice. The estimated negative binomial size parameter may also serve as a guide, since too few states in the model will be compensated for by stronger apparent extra-Poisson variation. External information, when available, may be used to put further constraints on the parameters, e.g. fixing the mean states up to a multiplicative constant, which is estimated.
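As a rough illustration of how AIC and BIC could be compared across candidate numbers of states, the sketch below scores a few fitted models. The log-likelihood values and the number of intervals are invented placeholders, and the parameter count assumes one mean per state, a single shared negative binomial size parameter, and freely estimated transition and initial probabilities; the parameterization actually used may differ.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_lik

def hmm_param_count(k):
    """Free parameters of a k-state HMM under the ASSUMED parameterization:
    k state means, 1 shared negative binomial size parameter,
    k*(k-1) free transition probabilities, k-1 free initial probabilities."""
    return k + 1 + k * (k - 1) + (k - 1)

# Hypothetical maximized log-likelihoods, keyed by number of states.
# These numbers are placeholders for illustration, not results.
log_liks = {2: -5230.0, 3: -5150.0, 4: -5146.0}
n_intervals = 2000  # hypothetical number of observed intervals

scores = {
    k: (aic(ll, hmm_param_count(k)), bic(ll, hmm_param_count(k), n_intervals))
    for k, ll in log_liks.items()
}
best_by_aic = min(scores, key=lambda k: scores[k][0])
best_by_bic = min(scores, key=lambda k: scores[k][1])

for k in sorted(scores):
    print(f"K={k}: AIC={scores[k][0]:.1f}, BIC={scores[k][1]:.1f}")
print("preferred K by AIC:", best_by_aic, "| by BIC:", best_by_bic)
```

With these placeholder values the fourth state improves the log-likelihood only slightly, so both criteria penalize it and prefer three states. Such criteria are only a guide and would be read alongside the diagnostic plots and the estimated size parameter.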

Normal state: The HMM does not assign the special label of “normality” to any particular state. However, the initial intervals are chosen to have a specified mean number of hits under the hypothesis of no copy number changes. In practice, the mean of the stationary distribution of the estimated HMM is usually close to this initial choice. The estimated mean state with the highest (stationary) probability is usually also the one closest to this initial choice, and can be considered to be the normal state. This should work well unless there is widespread CNP, in which case the definition of “normal state” is unclear.
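The heuristic for picking a normal state can be sketched as follows. The transition matrix, state means, and target mean below are hypothetical values chosen for illustration, and the stationary distribution is approximated by simple power iteration, which is an assumption of this sketch rather than the method used in the analysis.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a row-stochastic
    transition matrix P (list of lists) by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

def normal_state(P, state_means, target_mean):
    """Return (most probable state, state closest to target, stationary mean)."""
    pi = stationary_distribution(P)
    most_probable = max(range(len(pi)), key=lambda s: pi[s])
    closest = min(range(len(pi)), key=lambda s: abs(state_means[s] - target_mean))
    stationary_mean = sum(p * m for p, m in zip(pi, state_means))
    return most_probable, closest, stationary_mean

# Hypothetical 3-state model: low, normal, and elevated coverage.
P = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]
state_means = [5.0, 10.0, 20.0]  # hypothetical estimated mean hits per interval
target_mean = 10.0               # hypothetical mean used to choose the intervals

most_probable, closest, stationary_mean = normal_state(P, state_means, target_mean)
print(most_probable, closest, round(stationary_mean, 2))
```

In this toy example the most probable state and the state closest to the target coincide (index 1), so it would be labeled normal. If the two indices disagreed, as might happen with widespread CNP, the label would be ambiguous.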

Alignment: Alignment of optical maps to the in silico reference is an important prerequisite for our analysis. Although we expect and correct for non-uniform sensitivity, the alignment scheme should nonetheless be as sensitive as possible with a low false positive rate. In our examples, we have used the SOMA score function (A.1) with default parameters. The determination of significance thresholds is discussed in Chapter 3. All alignments used were significant at a nominal specificity of 99.9%.

Symmetry: High and low coverage are not treated symmetrically, in the sense that the power to detect lowered coverage may be lower than that to detect elevated coverage of the same relative magnitude. Thus, it may make sense to reverse the roles of the normal data and the dataset of interest, so that low coverage is identified as high mean states in the HMM. Of course, the relative sizes of the data sets are also relevant.
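The asymmetry can be illustrated with a deliberately simplified calculation: a single interval with Poisson counts and a one-sided exact test, rather than the negative binomial HMM itself. The baseline mean of 10 hits, the two-fold changes, and the 5% level are all hypothetical choices for this sketch.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def upper_critical(mu0, alpha):
    """Smallest c such that P(X >= c | mu0) <= alpha."""
    c = 0
    while 1.0 - poisson_cdf(c - 1, mu0) > alpha:
        c += 1
    return c

def lower_critical(mu0, alpha):
    """Largest c such that P(X <= c | mu0) <= alpha, or -1 if none exists."""
    c = -1
    while poisson_cdf(c + 1, mu0) <= alpha:
        c += 1
    return c

mu0, alpha = 10.0, 0.05   # hypothetical baseline mean and test level
c_up = upper_critical(mu0, alpha)
c_lo = lower_critical(mu0, alpha)

# Power against a two-fold increase versus a two-fold decrease.
power_high = 1.0 - poisson_cdf(c_up - 1, 2.0 * mu0)
power_low = poisson_cdf(c_lo, 0.5 * mu0)

print(f"reject for X >= {c_up}: power vs doubled mean = {power_high:.3f}")
print(f"reject for X <= {c_lo}: power vs halved mean  = {power_low:.3f}")
```

Under these assumptions the test for a doubling has noticeably higher power than the test for a halving, which is the kind of asymmetry that reversing the roles of the two data sets is meant to address.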
