On the Analysis of Optical Mapping Data - University of Wisconsin ...

More documents

Recommendations

Info

78 −2 0 2 Observed Quantiles 4 2 0 −2 −4 Raw errors Average errors −2 0 2 Quantiles of Standard Normal Figure 5.1 Correlation of sizing errors within map. The first panel is a normal Q-Q plot of standardized errors, using one randomly chosen ǫ i for each map. The second panel is a Q-Q plot of the standardized mean errors ¯ǫ. If within-map errors are uncorrelated, the ¯ǫ’s and ǫ i ’s should have the same variance. This is clearly not true. (i = 1, . . .,n) for an optical map with n fragments. Each ǫ i should have mean 0 and unit variance. We can also define the standardized averages ¯ǫ = √ 1 n∑ n If the ǫ i ’s are independent, ¯ǫ should also have mean 0 and variance 1. If on the other hand the errors within a map are positively correlated, then V (¯ǫ) > 1. This would be true if scale errors are spatially correlated, i.e., nearby fragments are undersized or oversized together. Using significant alignments of GM07535 optical maps, Figure 5.1 shows normal Q-Q plots of the raw errors along with the standardized average errors ¯ǫ, suggesting that within-map sizing errors are indeed correlated. i=1 ǫ i Accounting for correlation: Additive scores assume independence of fragments and do not allow for correlation, but it is still possible to indirectly account for it. One approach we describe here is based on the premise that for an optical map with consistently undersized or oversized fragments, the correct alignment will often be among the top scoring alignments even if it does not exceed the threshold for significance. Assume that all fragments in a map share a common estimated scale R. Given a potential alignment, one can estimate R; e.g. if an alignment consists of pairs of aligned chunks (µ i , X i ), i = 1, . . ., m, a possible estimate is ̂R = ∑ X i / ∑ µ i . Other perhaps more robust estimates are also possible. The estimate
79 Change in score 15 10 5 0 0.9 1.0 1.1 1.2 1.3 Estimated scale Counts 1980 1573 1213 900 633 413 240 114 34 1 Figure 5.2 Improvement in optimal alignment score plotted against the scale estimated from preliminary alignment. The scatter plot is summarized using hexagonal binning (Carr et al., 1987), with a LOESS smooth (Cleveland and Grosse, 1991) added. ̂R can be used to rescale the original map and obtain an updated alignment score, and the alignment declared significant if the new score exceeds the significance threshold. Certain dynamic programming algorithms, including the one implemented in SOMA, allow detection of multiple alignments, so this procedure need not be restricted to only the top-scoring alignment. Results: As a proof of concept, this process was applied to ungapped global alignment of the GM07535 optical map data using the SOMA score. 24.36% of the maps had at least one significant alignment. To compensate for correlation, we further considered the 5 best scoring alignments of each map regardless of significance. For each of these, the computed ̂R was used to rescale the map and obtain an updated alignment score. This yielded significant alignments for a further 4.76% of the maps. This is the fraction of the total number of maps; the relative increase is a more substantial 19.54%. Even for alignments declared to be significant without rescaling, the updated score is often larger, assigning more confidence to the alignment (Figure 5.2). Some of the additional alignments are naturally spurious; however, this rate is small and can be controlled by suitably modifying the significance threshold. However, to further explore the practical utility of this approach, it must first be incorporated into the standard alignment software.
Page 1 and 2:
ON THE ANALYSIS OF OPTICAL MAPPING
Page 3 and 4:
To my parents. i
Page 5 and 6:
DISCARD THIS PAGE
Page 7 and 8:
iv Page 3.3 Results . . . . . . . .
Page 9 and 10:
v LIST OF TABLES Table Page 1.1 Sum
Page 11 and 12:
vi LIST OF FIGURES Figure Page 1.1
Page 13 and 14:
ON THE ANALYSIS OF OPTICAL MAPPING
Page 15 and 16:
1 Chapter 1 Overview of Optical Map
Page 17 and 18:
3 hard, do not always have a unique
Page 19 and 20:
5 for microbial and other small gen
Page 21 and 22:
7 Figure 1.2 Close-up of a typical
Page 23 and 24:
9 0.96 0.98 1.00 1.02 1.04 Offset a
Page 25 and 26:
11 direct glimpse at the underlying
Page 27 and 28:
13 which is not surprising since we
Page 29 and 30:
15 Gapped alignments: The above des
Page 31 and 32:
Figure 1.5 A visualization of align
Page 33 and 34:
19 Assembly: For these examples, th
Page 35 and 36:
21 Chapter 2 Modeling Optical Map D
Page 37 and 38:
23 Alternatively, it can be thought
Page 39 and 40:
25 and V (X i ) = E(V (Y i R i |R i
Page 41 and 42: 27 affect inference. If necessary,
Page 43 and 44: 29 Quantiles of fragment lengths (K
Page 45 and 46: 31 as a function of the parameters.
Page 47 and 48: 33 by rejecting maps that do not al
Page 49 and 50: 35 30 0.700 − 0.005 0 50 100 150
Page 51 and 52: 37 Chapter 3 Significance of Optica
Page 53 and 54: 39 using optical mapping data from
Page 55 and 56: 41 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.8
Page 57 and 58: 43 3.3.2 Simplifications Direct app
Page 59 and 60: 45 Mean spurious score 0 −10 −2
Page 61 and 62: 47 3.3.3 Simulation Given a generat
Page 63 and 64: 49 3.4 Discussion 3.4.1 Uses Alignm
Page 65 and 66: 51 The ability to simulate from the
Page 67 and 68: 53 maps, where the separation betwe
Page 69 and 70: Figure 3.10 Schematic representatio
Page 71 and 72: 57 especially a short noisy one, to
Page 73 and 74: 59 Test statistics: Variability due
Page 75 and 76: 61 in sequence assembly and validat
Page 77 and 78: 63 1988). However, due to sampling
Page 79 and 80: 65 and rate parameters Λ i = E(N i
Page 81 and 82: 67 with mean µ k for the k th stat
Page 83 and 84: 69 Estimated Copy Number in simulat
Page 85 and 86: 71 Posterior probabilities 1.0 0.8
Page 87 and 88: 73 (a) Observed counts and decoded
Page 89 and 90: 75 Conclusion: Copy number alterati
Page 91: 77 well in its current form, but th
Page 95 and 96: 81 will rarely be homozygous. It ma
Page 97 and 98: 83 E.T. Dimalanta, A. Lim, R. Runnh
Page 99 and 100: 85 Appendix A: Score functions for
Page 101 and 102: 87 Appendix B: Hidden Markov Model
Page 103 and 104: 89 which can be shown to have highe
show all

On the Analysis of Optical Mapping Data - University of Wisconsin ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?