The.Algorithm.Design.Manual.Springer-Verlag.1998
The.Algorithm.Design.Manual.Springer-Verlag.1998 The.Algorithm.Design.Manual.Springer-Verlag.1998
Suffix Trees and Arrays Next: Graph Data Structures Up: Data Structures Previous: Priority Queues Suffix Trees and Arrays Input description: A reference string S. Problem description: Build a data structure for quickly finding all places where an arbitrary query string q is a substring of S. Discussion: Suffix trees and arrays are phenomenally useful data structures for solving string problems efficiently and with elegance. If you need to speed up a string processing algorithm from to linear time, proper use of suffix trees is quite likely the answer. Indeed, suffix trees are the hero of the war story reported in Section . In its simplest instantiation, a suffix tree is simply a trie of the n strings that are suffixes of an n-character string S. A trie is a tree structure, where each node represents one character, and the root represents the null string. Thus each path from the root represents a string, described by the characters labeling the nodes traversed. Any finite set of words defines a trie, and two words with common prefixes will branch off from each other at the first distinguishing character. Each leaf represents the end of a string. Figure illustrates a simple trie. file:///E|/BOOK/BOOK3/NODE131.HTM (1 of 4) [19/1/2003 1:30:07]
Suffix Trees and Arrays Figure: A trie on strings the, their, there, was, and when Tries are useful for testing whether a given query string q is in the set. Starting with the first character of q, we traverse the trie along the branch defined by the next character of q. If this branch does not exist in the trie, then q cannot be one of the set of strings. Otherwise we find q in |q| character comparisons regardless of how many strings are in the trie. Tries are very simple to build (repeatedly insert new strings) and very fast to search, although they can be expensive in terms of memory. A suffix tree is simply a trie of all the proper suffixes of S. The suffix tree enables you to quickly test whether q is a substring of S, because any substring of S is the prefix of some suffix (got it?). The search time is again linear in the length of q. The catch is that constructing a full suffix tree in this manner can require time and, even worse, space, since the average length of the n suffices is n/2 and there is likely to be relatively little overlap representing shared prefixes. However, linear space suffices to represent a full suffix tree by being clever. Observe that most of the nodes in a trie-based suffix tree occur on simple paths between branch nodes in the tree. Each of these simple paths corresponds to a substring of the original string. By storing the original string in an array and collapsing each such path into a single node described by the starting and ending array indices representing the substring, we have all the information of the full suffix tree in only O(n) space. The output figure for this section displays a collapsed suffix tree in all its glory. Even better, there exist linear-time algorithms to construct this collapsed tree that make clever use of pointers to minimize construction time. The additional pointers used to facilitate construction can also be used to speed up many applications of suffix trees. But what can you do with suffix trees? Consider the following applications. For more details see the books by Gusfield [Gus97] or Crochemore and Rytter [CR94]: file:///E|/BOOK/BOOK3/NODE131.HTM (2 of 4) [19/1/2003 1:30:07]
- Page 305 and 306: Problems and Reductions Next: Simpl
- Page 307 and 308: Simple Reductions Next: Hamiltonian
- Page 309 and 310: Hamiltonian Cycles Next: Independen
- Page 311 and 312: Independent Set and Vertex Cover pr
- Page 313 and 314: Clique and Independent Set These la
- Page 315 and 316: Satisfiability Mon Jun 2 23:33:50 E
- Page 317 and 318: The Theory of NP-Completeness Next:
- Page 319 and 320: 3-Satisfiability where for , , , an
- Page 321 and 322: Integer Programming Next: Vertex Co
- Page 323 and 324: Integer Programming possible IP ins
- Page 325 and 326: Vertex Cover reduction for the 3-SA
- Page 327 and 328: Other NP-Complete Problems hard. Th
- Page 329 and 330: The Art of Proving Hardness easiest
- Page 331 and 332: War Story: Hard Against the Clock N
- Page 333 and 334: War Story: Hard Against the Clock I
- Page 335 and 336: Approximation Algorithms Next: Appr
- Page 337 and 338: Approximating Vertex Cover Next: Th
- Page 339 and 340: The Euclidean Traveling Salesman Ne
- Page 341 and 342: The Euclidean Traveling Salesman Ne
- Page 343 and 344: Exercises 1. Prove that the low deg
- Page 345 and 346: Data Structures Next: Dictionaries
- Page 347 and 348: Dictionaries Next: Priority Queues
- Page 349 and 350: Dictionaries use a function that ma
- Page 351 and 352: Dictionaries Implementation-oriente
- Page 353 and 354: Priority Queues ● Besides access
- Page 355: Priority Queues Fibonacci heaps [FT
- Page 359 and 360: Suffix Trees and Arrays [GBY91]. Se
- Page 361 and 362: Graph Data Structures algorithms).
- Page 363 and 364: Graph Data Structures including the
- Page 365 and 366: Set Data Structures Next: Kd-Trees
- Page 367 and 368: Set Data Structures parent pointers
- Page 369 and 370: Kd-Trees Next: Numerical Problems U
- Page 371 and 372: Kd-Trees about p. Say we are lookin
- Page 373 and 374: Numerical Problems Next: Solving Li
- Page 375 and 376: Numerical Problems Mon Jun 2 23:33:
- Page 377 and 378: Solving Linear Equations algorithm
- Page 379 and 380: Solving Linear Equations Matrix inv
- Page 381 and 382: Bandwidth Reduction images near eac
- Page 383 and 384: Matrix Multiplication Next: Determi
- Page 385 and 386: Matrix Multiplication The linear al
- Page 387 and 388: Determinants and Permanents Next: C
- Page 389 and 390: Determinants and Permanents exposit
- Page 391 and 392: Constrained and Unconstrained Optim
- Page 393 and 394: Constrained and Unconstrained Optim
- Page 395 and 396: Linear Programming variable assignm
- Page 397 and 398: Linear Programming The book [MW93]
- Page 399 and 400: Random Number Generation Next: Fact
- Page 401 and 402: Random Number Generation is largely
- Page 403 and 404: Random Number Generation Related Pr
- Page 405 and 406: Factoring and Primality Testing The
Suffix Trees and Arrays<br />
Next: Graph Data Structures Up: Data Structures Previous: Priority Queues<br />
Suffix Trees and Arrays<br />
Input description: A reference string S.<br />
Problem description: Build a data structure for quickly finding all places where an arbitrary query string<br />
q is a substring of S.<br />
Discussion: Suffix trees and arrays are phenomenally useful data structures for solving string problems<br />
efficiently and with elegance. If you need to speed up a string processing algorithm from to linear<br />
time, proper use of suffix trees is quite likely the answer. Indeed, suffix trees are the hero of the war story<br />
reported in Section . In its simplest instantiation, a suffix tree is simply a trie of the n strings that<br />
are suffixes of an n-character string S. A trie is a tree structure, where each node represents one<br />
character, and the root represents the null string. Thus each path from the root represents a string,<br />
described by the characters labeling the nodes traversed. Any finite set of words defines a trie, and two<br />
words with common prefixes will branch off from each other at the first distinguishing character. Each<br />
leaf represents the end of a string. Figure illustrates a simple trie.<br />
file:///E|/BOOK/BOOK3/NODE131.HTM (1 of 4) [19/1/2003 1:30:07]