An Improved Genetic Algorithm for DNA Sequencing - Penn State ...
6 EXPERIMENTAL RESULTS

The second set of test data was generated by the spectrum generator algorithm from the three genomes obtained from GenBank [13] and listed in Table 2. Table 3 lists all combinations of positive and negative spectrum errors used to test the algorithm. For each of the three genomes, we tested the GA on 10 different sequences of each length in the set {100, 200, 300, 400, 500, 1000, 2000}; that is, we tested the algorithm on 70 different sequences per genome. From each of these sequences we derived spectra with fragments of length 10 and 20. For each fragment length, 13 different combinations of positive and negative errors were used, ranging from 0 to 20% as shown in Table 3. Hence, across all three genomes we tested a total of 3 × 70 × 2 × 13 = 5,460 spectra.

We also generated a similar set of spectra with fragments of length 50, so the algorithm was tested on spectra with three different fragment lengths: 10, 20, and 50. We observed that longer fragments yield better results. In fact, with a fragment length of 50 our algorithm almost always found the optimal solution, so we do not include the data for fragments of length 50 here. We used different fragment lengths because, in practice, different hybridization techniques may require different fragment lengths. The hybridization rate is normally better for longer fragments; however, in situ hybridization requires a short fragment length [21].

As with the first data set, we used the Smith-Waterman sequence alignment algorithm to assess the quality of the solutions returned by the algorithm, using an implementation of Smith-Waterman provided by Jie Li of Iowa State University [19]. Figures 12 and 13 show the performance and running time of our algorithm on the second data set. Figure 12 plots the match percentage for spectra with fragment length 10.
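This section does not state the scoring scheme of the Smith-Waterman implementation [19] or exactly how the match percentage is derived from an alignment. As a minimal sketch, assuming a linear gap penalty with illustrative scores (match +2, mismatch -1, gap -2) and a hypothetical `match_percentage` helper defined as identical aligned bases divided by the length of the original sequence, the local alignment step could look like this:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, int]:
    """Return (best local score, identical aligned pairs) for strings a and b.

    The scoring parameters are illustrative assumptions, not values from [19].
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached,
    # counting the positions where the aligned bases are identical.
    i, j = best_pos
    matches = 0
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, matches


def match_percentage(original: str, reconstructed: str) -> float:
    """Hypothetical quality measure: identical aligned bases / len(original) * 100."""
    _, matches = smith_waterman(original, reconstructed)
    return 100.0 * matches / len(original)
```

For example, `match_percentage("ACGTTT", "ACGAAA")` returns `50.0`: the best local alignment is the shared prefix `ACG`, which covers 3 of the 6 original bases. Because the alignment is local, unrelated tails of a reconstruction are simply left out of the alignment rather than forced into it.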
The x-axis of Figure 12 shows the various error combinations in the input spectrum; the notation -a+b denotes spectra with a% negative error and b% positive error. Thirteen different error combinations were used to verify the performance of the new algorithm. These combinations were chosen for several reasons. First, previous research in this area used similar values for testing. Second, we wanted error combinations that are uniformly distributed over the range from 0 to 20%. Finally, we found experimentally that these values were adequate to illustrate the strength of our new algorithm compared to existing algorithms. Figure 13 shows the running time for spectra with a fragment length of 10.

Figure 12: Plot of Match Percentage against Error Combinations (l = 10)

Figure 13: Plot of Running Time against Error Combinations (l = 10)
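The spectrum generator itself is not reproduced in this section. As a hypothetical sketch of what an -a+b spectrum means, assuming that percentages are taken relative to the size of the error-free spectrum, that negative errors delete randomly chosen true fragments, and that positive errors add random fragments that do not occur in the sequence, a noisy spectrum could be produced as follows. The names `ideal_spectrum`, `parse_error_combo`, and `noisy_spectrum` are illustrative only.

```python
import random
import re


def ideal_spectrum(seq: str, l: int) -> set[str]:
    """All distinct length-l fragments (l-mers) of seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def parse_error_combo(label: str) -> tuple[int, int]:
    """Parse the '-a+b' notation into (negative %, positive %)."""
    m = re.fullmatch(r"-(\d+)\+(\d+)", label)
    if m is None:
        raise ValueError(f"not an error combination: {label!r}")
    return int(m.group(1)), int(m.group(2))


def noisy_spectrum(seq: str, l: int, neg_pct: int, pos_pct: int,
                   rng: random.Random) -> set[str]:
    """Remove neg_pct% of the true l-mers and add pos_pct% spurious ones.

    Assumption: both percentages are relative to the error-free spectrum size.
    """
    ideal = sorted(ideal_spectrum(seq, l))  # sorted so sampling is reproducible
    n = len(ideal)
    n_neg = round(n * neg_pct / 100)
    n_pos = round(n * pos_pct / 100)
    # Negative errors: keep a random subset of the true fragments.
    kept = set(rng.sample(ideal, n - n_neg))
    # Positive errors: random l-mers absent from the sequence. For the fragment
    # lengths used here (10, 20, 50) the pool of absent l-mers is vast.
    true_set = set(ideal)
    spurious: set[str] = set()
    while len(spurious) < n_pos:
        cand = "".join(rng.choice("ACGT") for _ in range(l))
        if cand not in true_set:
            spurious.add(cand)
    return kept | spurious
```

A call such as `noisy_spectrum(seq, 10, *parse_error_combo("-5+10"), random.Random(0))` yields a spectrum with 5% negative and 10% positive error. Because the spurious fragments are disjoint from the true ones, the result has exactly n - round(n·a/100) + round(n·b/100) fragments, where n is the size of the error-free spectrum.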