On the Analysis of Optical Mapping Data - University of Wisconsin ...

[Figure 3.7 plot: spurious score, Replication 1 (horizontal axis) against Replication 2 (vertical axis), with a legend of counts]

Figure 3.7 LR scores for ungapped global alignment, after Valouev et al. (2006). Optimal scores for GM07535 optical maps aligned against two independent permutations of the in silico reference are plotted against each other.

                    Nominal specificity
Score function      99.0%      99.9%
SOMA                34.47      26.01
LR                  26.09      18.84

Table 3.2 Percentage of GM07535 maps declared as significant by the SOMA and LR scores using the direct approach.
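A nominal specificity such as those in Table 3.2 can be read as a quantile of a null (spurious) score distribution: a cutoff with 99.0% nominal specificity is one that about 99% of spurious scores fail to exceed. The sketch below illustrates that reading only. The function names `empirical_cutoff` and `fraction_significant` are hypothetical, the null scores are random stand-ins rather than optical map scores, and the way the null scores are obtained (for example, by aligning against permuted references as in Figure 3.7) is assumed to happen elsewhere.

```python
import math
import random
from typing import Sequence


def empirical_cutoff(null_scores: Sequence[float], specificity: float) -> float:
    """Return the smallest observed null score s such that at least a
    fraction `specificity` of the null scores are <= s."""
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    ordered = sorted(null_scores)
    if not ordered:
        raise ValueError("need at least one null score")
    # Rounding guards against floating-point noise in specificity * n.
    k = math.ceil(round(specificity * len(ordered), 9)) - 1
    return ordered[k]


def fraction_significant(observed: Sequence[float], cutoff: float) -> float:
    """Fraction of observed scores that strictly exceed the cutoff."""
    return sum(score > cutoff for score in observed) / len(observed)


# Stand-in data (an assumption for illustration, not real alignment scores).
random.seed(0)
null_scores = [random.gauss(40.0, 8.0) for _ in range(10_000)]
observed_scores = [random.gauss(50.0, 15.0) for _ in range(1_000)]

for spec in (0.99, 0.999):
    cut = empirical_cutoff(null_scores, spec)
    pct = 100.0 * fraction_significant(observed_scores, cut)
    print(f"specificity {spec:.1%}: cutoff {cut:.2f}, {pct:.2f}% declared significant")
```

Raising the nominal specificity raises the cutoff, so fewer maps are declared significant, which matches the direction of the two columns in Table 3.2.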
The ability to simulate from the null distribution allows us to try out and choose from among various sets of parameters. Note that an appropriate choice may depend on the task; for example, the best score for gapped alignment is always larger than that for ungapped alignment, so the same set of parameters may not be optimal for both. Indeed, it is tempting to try and ‘improve’ the scores we have used in our examples and consider ones other than those reported in Table 3.2; however, we will refrain from doing so since a proper study requires a systematic effort that is beyond the scope of this discussion.

3.4.2 Information measure

Location-specific cutoff: It is empirically known that different cutoffs for the SOMA score seem appropriate for alignments to different parts of a reference map, but a formal approach incorporating this idea has been difficult to formulate. Map-specific cutoffs provide a perfectly natural explanation for this observation, since an optical map is largely determined by its origin. However, this does not guard against spurious alignments at similar (homologous) regions in the genome, which are also a potential concern.

Information measure: A related construct that proves useful in further understanding optical map score functions is the score obtained by aligning a map with itself, which we henceforth denote by ψ(M). Given a score function, this can be thought of as an information measure for the map: if the map had no errors, this would be the score for the correct alignment. Errors normally reduce the correct alignment score from this perfect score. ψ(M) is of course higher for longer maps, but is also affected by the lengths of the component fragments since most score functions reward matches involving longer fragments, which are rarer. Maps with lower information content are naturally harder to align successfully.
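The idea of ψ(M) as a self-alignment score can be illustrated with a toy example. The sketch below is not the SOMA or LR score; `toy_match_score` and its three weights are made up for illustration, and maps are represented simply as lists of fragment lengths aligned one-to-one (ungapped, equal number of fragments). It shows only the two properties described above: ψ(M) is the score of a map aligned with itself, and sizing errors pull the correct-alignment score below that value.

```python
from typing import Sequence


def toy_match_score(x: float, y: float,
                    match_bonus: float = 1.0,
                    length_weight: float = 0.5,
                    error_weight: float = 1.0) -> float:
    """Hypothetical per-fragment score: a fixed bonus per matched fragment,
    a reward that grows with fragment length, and a penalty for the
    sizing discrepancy between the two fragments."""
    return match_bonus + length_weight * min(x, y) - error_weight * abs(x - y)


def ungapped_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Score of the one-to-one alignment of two maps with equal fragment counts."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles maps with equal fragment counts")
    return sum(toy_match_score(x, y) for x, y in zip(a, b))


def information_measure(m: Sequence[float]) -> float:
    """psi(M): the score obtained by aligning a map with itself."""
    return ungapped_score(m, m)


true_map = [10.0, 20.0, 30.0]      # error-free fragment lengths (arbitrary units)
observed_map = [11.0, 18.0, 30.0]  # the same map with sizing errors

psi = information_measure(true_map)
noisy = ungapped_score(observed_map, true_map)
print(f"psi(M) = {psi}, score of noisy map against its origin = {noisy}")
```

Under this toy score, ψ(M) depends on both the number of fragments and their total length; a real score function would weight fragment lengths according to its own model.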
However, Figure 3.8 shows that even for maps with high information content, the distributions of spurious and real SOMA scores are not well separated.

Simulation: In general, any optical map dataset and score function can be summarized by a plot analogous to Figure 3.8. Figure 3.9 shows such a plot for a set of simulated optical