
The.Algorithm.Design.Manual.Springer-Verlag.1998


Suffix Trees and Arrays

Figure: A trie on strings the, their, there, was, and when

Tries are useful for testing whether a given query string q is in the set. Starting with the first character
of q, we traverse the trie along the branch defined by the next character of q. If this branch does not exist
in the trie, then q cannot be one of the strings in the set. Otherwise we find q in |q| character comparisons,
regardless of how many strings are in the trie. Tries are very simple to build (repeatedly insert new
strings) and very fast to search, although they can be expensive in terms of memory.
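
The search described above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the names `TrieNode`, `trie_insert`, and `trie_contains` are hypothetical, and each node stores its branches in a dictionary, which is one of several reasonable representations.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child on that branch
        self.is_end = False   # True if an inserted string ends at this node


def trie_insert(root, word):
    """Insert word by walking down the trie, creating missing branches."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def trie_contains(root, q):
    """Test membership of q using at most len(q) branch lookups."""
    node = root
    for ch in q:
        node = node.children.get(ch)
        if node is None:      # the branch does not exist, so q is not in the set
            return False
    return node.is_end        # q must end exactly where some string ends


# The strings from the figure.
root = TrieNode()
for w in ["the", "their", "there", "was", "when"]:
    trie_insert(root, w)
```

Note that the cost of `trie_contains` depends only on the length of q, not on how many strings were inserted. The `is_end` flag is what distinguishes a stored string such as "the" from a mere prefix such as "th".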

A suffix tree is simply a trie of all the proper suffixes of S. The suffix tree enables you to quickly test
whether q is a substring of S, because any substring of S is the prefix of some suffix (got it?). The search
time is again linear in the length of q.
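
A minimal sketch of this idea in Python follows. It assumes the naive construction (insert every suffix into an ordinary trie) and uses nested dictionaries as nodes; the function names are hypothetical. Because every substring is a prefix of some suffix, no end-of-string flag is needed: q is a substring exactly when the walk never falls off the trie.

```python
def build_suffix_trie(s):
    """Naively insert every suffix s[i:] into a trie of nested dicts."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, q):
    """q occurs in s if and only if q is a prefix of some suffix of s."""
    node = trie
    for ch in q:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(trie):
    """Count the trie nodes, including the root."""
    return 1 + sum(count_nodes(child) for child in trie.values())


trie = build_suffix_trie("banana")
print(is_substring(trie, "nan"))   # "nan" is a prefix of the suffix "nana"
print(is_substring(trie, "nab"))   # no suffix starts with "nab"
```

The helper `count_nodes` is included only to make the memory cost visible: for a string of n distinct characters, no two suffixes share a prefix, so the trie holds n(n+1)/2 nodes plus the root.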

The catch is that constructing a full suffix tree in this manner can require O(n^2) time and, even worse,
O(n^2) space, since the average length of the n suffixes is n/2 and there is likely to be relatively little
overlap representing shared prefixes. However, with a little cleverness, linear space suffices to represent
a full suffix tree. Observe that most of the nodes in a trie-based suffix tree occur on simple paths between
branch nodes in the tree. Each of these simple paths corresponds to a substring of the original string. By
storing the original string in an array and collapsing each such path into a single node described by the
starting and ending array indices representing the substring, we have all the information of the full suffix
tree in only O(n) space. The output figure for this section displays a collapsed suffix tree in all its glory.
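
To make the collapsed representation concrete, here is a rough Python sketch in which every node stores only a pair of indices into the original string. It is an assumption-laden illustration, not the book's method: the names are hypothetical, and the tree is built by naive suffix-by-suffix insertion with edge splitting, which still takes quadratic time in the worst case. Only the space is linear.

```python
class Node:
    def __init__(self, start, end):
        self.start = start    # the path into this node spells text[start:end]
        self.end = end
        self.children = {}    # first character of a child's path -> child


def build_collapsed_suffix_tree(text):
    """Naive O(n^2)-time construction; nodes store indices, not substrings."""
    n = len(text)
    root = Node(0, 0)
    for i in range(n):
        node, pos = root, i
        while pos < n:
            child = node.children.get(text[pos])
            if child is None:                  # no branch: add a leaf for the rest
                node.children[text[pos]] = Node(pos, n)
                break
            k = child.start
            while k < child.end and pos < n and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:                 # whole path matched, keep descending
                node = child
                continue
            if pos == n:                       # suffix ends inside an existing path
                break
            mid = Node(child.start, k)         # mismatch: split the path at index k
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[pos]] = Node(pos, n)
            break
    return root


def contains(root, text, q):
    """Substring test that compares q against text along the collapsed paths."""
    node, pos = root, 0
    while pos < len(q):
        child = node.children.get(q[pos])
        if child is None:
            return False
        k = child.start
        while k < child.end and pos < len(q):
            if text[k] != q[pos]:
                return False
            k += 1
            pos += 1
        node = child
    return True


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())
```

Each suffix insertion creates at most two nodes (one split node and one leaf), so the tree never has more than 2n nodes besides the root, and each node holds just two integers. For a string of six distinct characters such as "abcdef", this gives seven nodes where the uncollapsed trie needs 22.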

Even better, there exist linear-time algorithms to construct this collapsed tree that make clever use of
pointers to minimize construction time. The additional pointers used to facilitate construction can also be
used to speed up many applications of suffix trees.

But what can you do with suffix trees? Consider the following applications. For more details see the
books by Gusfield [Gus97] or Crochemore and Rytter [CR94]:
