18.04.2013 Views

The.Algorithm.Design.Manual.Springer-Verlag.1998


Approximate String Matching


Input description: A text string t and a pattern string p. An edit cost bound k.

Problem description: Can we transform t to p using at most k insertions, deletions, and substitutions?
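The question above can be sketched with the standard dynamic-programming recurrence for edit distance. This is a minimal illustration that assumes every insertion, deletion, and substitution costs 1; the function names `edit_distance` and `within_k_edits` are hypothetical and chosen only for this example.

```python
def edit_distance(t: str, p: str) -> int:
    """Unit-cost edit distance between t and p (assumed cost model)."""
    # prev[j] = cost of transforming the first i-1 characters of t
    # into the first j characters of p.
    prev = list(range(len(p) + 1))
    for i, tc in enumerate(t, start=1):
        # Turning i characters of t into the empty string takes i deletions.
        curr = [i]
        for j, pc in enumerate(p, start=1):
            curr.append(min(
                prev[j] + 1,               # delete tc from t
                curr[j - 1] + 1,           # insert pc
                prev[j - 1] + (tc != pc),  # match (free) or substitute
            ))
        prev = curr
    return prev[-1]


def within_k_edits(t: str, p: str, k: int) -> bool:
    """Answer the decision problem: can t become p with at most k edits?"""
    return edit_distance(t, p) <= k
```

For instance, `within_k_edits("kitten", "sitting", 3)` holds because three edits suffice, while a bound of 2 does not. The sketch keeps only two rows of the table, so it uses space proportional to the length of p.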

Discussion: Approximate string matching is fundamental to text processing, because we live in an error-prone world. Any spelling correction program must be able to identify the closest match for any text string not found in a dictionary. GenBank has become a fundamental tool for molecular biology by supporting homology (similarity) searches on DNA sequences. Suppose you were to sequence a new gene in man, and you discovered that it is similar to the hemoglobin gene in rats. It is likely that this new gene also produces hemoglobin, and any differences are the result of genetic mutations during evolution.

I once encountered approximate string matching in evaluating the performance of an optical character recognition system that we built. After scanning and recognizing a test document, we needed to compare the correct answers with those produced by our system. To improve our system, it was important to count how often each pair of letters was getting confused and to identify gibberish when the program was trying to make out letters where none existed. The solution was to do an alignment between the two texts. Insertions and deletions corresponded to gibberish, while substitutions signaled errors in our recognizers.
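One way such an alignment might be computed is sketched here. It is not the system described above, only an illustration under stated assumptions: unit costs for all three operations, a full cost table with a traceback, and hypothetical names `align` and `confusion_report`. Substitutions are tallied as confused letter pairs, and insertions and deletions are counted together as the gibberish signal.

```python
from collections import Counter


def align(truth: str, output: str):
    """Return a list of (operation, truth_char, output_char) along one optimal alignment."""
    n, m = len(truth), len(output)
    # cost[i][j] = unit-cost edit distance between truth[:i] and output[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + (truth[i - 1] != output[j - 1]),
                cost[i - 1][j] + 1,
                cost[i][j - 1] + 1,
            )

    # Walk back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (truth[i - 1] != output[j - 1])):
            kind = "match" if truth[i - 1] == output[j - 1] else "substitute"
            ops.append((kind, truth[i - 1], output[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", truth[i - 1], ""))   # character missing from output
            i -= 1
        else:
            ops.append(("insert", "", output[j - 1]))  # extra character in output
            j -= 1
    ops.reverse()
    return ops


def confusion_report(truth: str, output: str):
    """Count confused letter pairs (substitutions) and total insertions plus deletions."""
    ops = align(truth, output)
    confusions = Counter((a, b) for kind, a, b in ops if kind == "substitute")
    indels = sum(1 for kind, _, _ in ops if kind in ("insert", "delete"))
    return confusions, indels
```

Calling `confusion_report("hello", "he1lo")` reports the pair `("l", "1")` once with no insertions or deletions. When several optimal alignments exist, the traceback here prefers the diagonal step, which is an arbitrary tie-breaking choice.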
