An Improved Genetic Algorithm for DNA Sequencing - Penn State ...


cs.hbg.psu.edu
01.06.2015

6 EXPERIMENTAL RESULTS

The second set of test data was generated by the spectrum generator algorithm from the three genomes obtained from GenBank [13] and referenced in Table 2. Table 3 lists all combinations of positive and negative spectrum errors that were used to test the algorithm. For each of the three genomes we tested the GA using 10 different sequences of each length in the set {100, 200, 300, 400, 500, 1000, 2000}. That is, for each genome we tested the algorithm using 70 different sequences. From each of these sequences we used spectra with fragments of length 10 and 20. For each fragment length, 13 different combinations of positive and negative errors were used, ranging from 0 to 20% errors as shown in Table 3. Hence, we tested all three genomes using a total of 3 × 70 × 2 × 13 = 5,460 spectra.

We also generated a similar set of spectra with fragments of length 50, so the algorithm was tested on spectra of three different fragment lengths: 10, 20, and 50. We observed that the longer the fragments are, the better the results are. In fact, with a fragment length of 50, our algorithm almost always found the optimal answers, and thus we do not include the data for fragments of length 50 here. We used different fragment lengths because in practice different hybridization techniques may require different fragment lengths. Normally, the hybridization rate is better if the fragment length is longer. However, for in situ hybridization, a small fragment length is required [21].

As in the case of the first data set, we use the Smith-Waterman sequence alignment algorithm to determine the quality of the solutions returned by the algorithm. We used an implementation of the Smith-Waterman algorithm provided by Jie Li of Iowa State University [19]. Figures 12 and 13 show the performance and running time of our algorithm on the second set of data. In Figure 12, the graph shows the match percentage for spectra with fragment length 10. The x-axis shows the various error combinations in the input spectrum. The notation -a+b indicates spectra with a% negative error and b% positive error. Thirteen different error combinations were used to verify the performance of the new algorithm. These error combinations were selected for several reasons. Previous research in the same area used similar values for testing. We also wanted to provide error combinations that are uniformly distributed over the range from 0 to 20. We found experimentally that these values are adequate to illustrate the strength of our new algorithm compared with other existing algorithms. Figure 13 shows the running time for spectra with a fragment length of 10.

Figure 12: Plot of Match Percentage against Error Combinations (l=10)

Figure 13: Plot of Running Time against Error Combinations (l=10)
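To make the test-set construction above concrete, the following is a minimal Python sketch of how a spectrum with a% negative and b% positive errors might be produced, together with a check of the 5,460 count. It is not the paper's spectrum generator, which is not reproduced in this section. The function names (`ideal_spectrum`, `noisy_spectrum`, `parse_label`, `total_spectra`) are hypothetical, and the sketch assumes that both error percentages are taken relative to the size of the error-free spectrum and that positive errors are random fragments absent from the sequence.

```python
import random


def ideal_spectrum(sequence, l):
    """Return the set of all length-l fragments (substrings) of sequence."""
    return {sequence[i:i + l] for i in range(len(sequence) - l + 1)}


def noisy_spectrum(sequence, l, neg_pct, pos_pct, rng):
    """Build a spectrum with neg_pct% negative and pos_pct% positive errors.

    Assumption: both percentages are relative to the ideal spectrum size.
    """
    ideal = ideal_spectrum(sequence, l)
    n_neg = round(len(ideal) * neg_pct / 100)
    n_pos = round(len(ideal) * pos_pct / 100)
    # Negative errors: fragments of the sequence missing from the spectrum.
    kept = set(rng.sample(sorted(ideal), len(ideal) - n_neg))
    # Positive errors: fragments in the spectrum that do not occur in the
    # sequence. Rejection sampling is fine here because 4**l is far larger
    # than the spectrum for l >= 10.
    spurious = set()
    while len(spurious) < n_pos:
        frag = "".join(rng.choice("ACGT") for _ in range(l))
        if frag not in ideal:
            spurious.add(frag)
    return kept | spurious


def parse_label(label):
    """Parse an error label in -a+b notation into (negative %, positive %)."""
    neg, pos = label[1:].split("+")
    return int(neg), int(pos)


def total_spectra():
    """Count the spectra in the test grid: 3 x 70 x 2 x 13."""
    genomes = 3
    lengths = [100, 200, 300, 400, 500, 1000, 2000]
    sequences_per_length = 10
    fragment_lengths = [10, 20]
    error_combinations = 13
    sequences_per_genome = len(lengths) * sequences_per_length  # 70
    return (genomes * sequences_per_genome
            * len(fragment_lengths) * error_combinations)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(200))
    neg, pos = parse_label("-10+5")
    spectrum = noisy_spectrum(seq, 10, neg, pos, rng)
    print(len(ideal_spectrum(seq, 10)), len(spectrum), total_spectra())
```

Passing an explicit `random.Random` instance keeps each generated spectrum reproducible, which matters when the same spectra are reused to compare algorithms.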
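The quality measure described above relies on Smith-Waterman local alignment. The sketch below is a generic textbook version of that algorithm, not the implementation by Jie Li that the authors used. The scoring values (match +1, mismatch -1, linear gap -1) and the `match_percentage` normalization by the length of the original sequence are assumptions made for illustration; this section does not state the exact scoring or how the match percentage in Figure 12 is computed.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty).

    Only two rows of the dynamic programming table are kept, so memory use
    is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def match_percentage(original, reconstructed):
    """Alignment score as a percentage of the best achievable score.

    Assumption: with match=+1, a perfect reconstruction scores
    len(original), so that value is used as the denominator.
    """
    if not original:
        return 0.0
    score = smith_waterman_score(original, reconstructed)
    return 100.0 * score / len(original)


if __name__ == "__main__":
    print(match_percentage("ACGTACGT", "ACGTTCGT"))
```

Under these assumed scores, a reconstruction identical to the original yields 100, and each mismatch or gap lowers the value.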

