29.07.2014 Views

On the Analysis of Optical Mapping Data - University of Wisconsin ...


[Figure 3.5: three panels plotting the mean spurious score (y-axis, ticks at 0, −10, −20) against the number of fragments (ticks at 20, 40, 60), the length in Kb (ticks at 400 to 1000), and the regression fit (ticks at −30 to 0). The legend gives bin counts, from 1 to 6630.]

Figure 3.5 Parametric models for µ(M). The average of four spurious scores for each map is plotted against the number of fragments N, the length L, and the fitted values from a linear model with terms N, L and their product NL. The multiple regression model explains more of the variability, and also suggests better symmetry.

3.5 demonstrates the utility of this approach. As before, a generalized least squares model with standard deviation linear in the fitted values is more appropriate, giving standardized test statistics

\[
T_2(M) = \frac{S(\tilde{G} \mid M) - \tilde{\mu}(M)}{\hat{\delta}_2 - \tilde{\mu}(M)}
\]

where $\tilde{\mu}(M)$ are the fitted responses.
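The regression approach can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: it fits the mean model with terms N, L and NL by ordinary least squares rather than the generalized least squares fit described above, and it treats the estimate of δ2 as already available. The function names (`fit_mean_model`, `fitted_mean`, `t2_statistic`) are hypothetical.

```python
import numpy as np

def fit_mean_model(n_frag, length, mean_score):
    """Fit mean spurious score on N, L and N*L.

    Assumption: ordinary least squares is used here for simplicity;
    the text describes a generalized least squares fit instead.
    """
    n_frag = np.asarray(n_frag, dtype=float)
    length = np.asarray(length, dtype=float)
    # Design matrix with columns: intercept, N, L, N*L
    X = np.column_stack([np.ones_like(n_frag), n_frag, length, n_frag * length])
    coef, *_ = np.linalg.lstsq(X, np.asarray(mean_score, dtype=float), rcond=None)
    return coef

def fitted_mean(coef, n_frag, length):
    """Fitted response for maps with the given N and L."""
    n_frag = np.asarray(n_frag, dtype=float)
    length = np.asarray(length, dtype=float)
    return coef[0] + coef[1] * n_frag + coef[2] * length + coef[3] * n_frag * length

def t2_statistic(score, mu_fit, delta2):
    """Standardized statistic: (S - mu) / (delta2 - mu).

    delta2 is assumed to have been estimated separately.
    """
    return (np.asarray(score, dtype=float) - mu_fit) / (delta2 - mu_fit)
```

For a new map, only its N, L and one alignment score are needed: compute `fitted_mean(coef, N, L)` and pass it with the score to `t2_statistic`.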

Comparison: Table 3.1 summarizes the results from both approaches. Specifically, the mean spurious scores for each of the 206796 GM07535 maps were estimated using n = 4 permutations of the reference. A fifth permutation was used for parameter estimation: δ₁ in the direct approach, δ₂ and the regression coefficients in the regression approach. A sixth permutation was used to sample from the null distributions, and 99% and 99.9% cutoffs were determined by the appropriate quantiles of these samples of size 206796. The two approaches largely agree in both cases. For aligning a future map, the regression method is of more practical value, as it would require only one alignment to G̃, whereas the direct method would require additional alignments to several permuted references to estimate µ(M).
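The cutoff step described above amounts to averaging scores over permuted references and taking empirical quantiles of a null sample. The following Python sketch illustrates this with NumPy. The function names are hypothetical, and the decision rule (a statistic larger than the cutoff counts as significant) is an assumption about the direction of the test, not something stated in the text.

```python
import numpy as np

def mean_spurious_score(perm_scores):
    """Average alignment scores over permuted references.

    perm_scores: array of shape (n_maps, n_permutations),
    e.g. n_permutations = 4 as in the comparison above.
    """
    return np.asarray(perm_scores, dtype=float).mean(axis=1)

def null_cutoffs(null_stats, levels=(0.99, 0.999)):
    """Empirical quantiles of test statistics from a null sample."""
    null_stats = np.asarray(null_stats, dtype=float)
    return {level: float(np.quantile(null_stats, level)) for level in levels}

def exceeds_cutoff(stats, cutoff):
    """Assumed decision rule: flag statistics above the cutoff."""
    return np.asarray(stats, dtype=float) > cutoff
```

With a null sample as large as the one described (206796 statistics), the 99.9% quantile still rests on roughly 200 observations in the upper tail.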
