The.Algorithm.Design.Manual.Springer-Verlag.1998
Longest Common Substring

, maintain the minimum y-coordinate of any path going through exactly k points. Inserting a new point will change exactly one of these paths by reducing the y-coordinate of the path whose last point is barely greater than the new point.

● What if the strings are permutations? - If the strings are permutations, then there are exactly n pairs of matching characters, and the above algorithm runs in $O(n \log n)$ time. A particularly important case of this occurs in finding the longest increasing subsequence of a sequence of numbers. Sorting this sequence and then replacing each number by its rank in the total order gives us a permutation p. The longest common subsequence of p and $\{1, 2, 3, \ldots, n\}$ gives the longest increasing subsequence.

● What if we have more than two strings to align? - The basic dynamic programming algorithm can be generalized to k strings, taking $O(2^k n^k)$ time, where n is the length of the longest string. This algorithm is exponential in the number of strings k, and so it will likely be too expensive for more than 3 to 4 strings. Further, the problem is NP-complete, so no substantially better exact algorithm is likely to come along.

This problem of multiple sequence alignment has received considerable attention, and numerous heuristics have been proposed. Many heuristics begin by computing the pairwise alignment between each of the pairs of strings, and then work to merge these alignments. One approach is to build a graph with a vertex for each character of each string. There will be an edge between two vertices if the corresponding characters are matched in the alignment between their strings S and T. Any k-clique (see the section on cliques) in this graph describes a commonly aligned character, and all such cliques can be found efficiently because of the sparse structure of this graph.

Although these cliques will define a common subsequence, there is no reason to believe that it will be the longest such subsequence.
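To make the cost of the k-string generalization concrete, here is a minimal Python sketch for k = 3. The function name `lcs3_length` is hypothetical and not taken from the book. It fills a three-dimensional table with one cell per triple of prefixes, so time and space grow as the product of the three lengths. Because it only computes subsequence length, each cell looks at three single-step predecessors; a general alignment with scoring would, as an assumption about how the $2^k$ factor arises, examine every nonempty subset of strings to advance.

```python
def lcs3_length(a: str, b: str, c: str) -> int:
    """Length of the longest common subsequence of three strings (sketch)."""
    la, lb, lc = len(a), len(b), len(c)
    # dp[i][j][k] = LCS length of the prefixes a[:i], b[:j], c[:k]
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    # All three strings end in the same character: extend.
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last character of one string.
                    dp[i][j][k] = max(
                        dp[i - 1][j][k],
                        dp[i][j - 1][k],
                        dp[i][j][k - 1],
                    )
    return dp[la][lb][lc]


if __name__ == "__main__":
    print(lcs3_length("abcd1e2", "bc12ea", "bd1ea"))  # → 3 ("b1e")
```

Each additional string adds another nested loop and another table dimension, which is why the approach stops being practical beyond a handful of strings.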
Appropriately weakening the clique requirement provides a way to lengthen the common subsequence that these cliques define, but still there can be no promises.

Implementations: MAP (Multiple Alignment Program) [Hua94] by Xiaoqiu Huang is a C language program that computes a global multiple alignment of sequences using an iterative pairwise method. Certain parameters will need to be tweaked to make it accommodate non-DNA data. It is available by anonymous ftp from cs.mtu.edu in the pub/huang directory.

Combinatorica [Ski90] provides a Mathematica implementation of an algorithm to construct the longest increasing subsequence of a permutation, which is a special case of longest common subsequence. This algorithm is based on Young tableaux rather than dynamic programming. See the section on Combinatorica.

Notes: Good expositions on longest common subsequence include [AHU83, CLR90]. A survey of algorithmic results appears in [GBY91]. The algorithm for the case where all the characters in each sequence are distinct or infrequent is due to Hunt and Szymanski [HS77]. Expositions of this algorithm include [Aho90, Man89]. Multiple sequence alignment for computational biology is treated in [Wat95].

Certain problems on strings become easier when we assume a constant-sized alphabet. Masek and Paterson [MP80] solve longest common subsequence in $O(mn / \log(\min\{m, n\}))$ time for constant-sized alphabets, using the four Russians technique.

Related Problems: Approximate string matching, shortest common superstring.
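The longest increasing subsequence case discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Combinatorica algorithm (which the text says is based on Young tableaux): it keeps, for each length k, the smallest value that can end an increasing run of k elements, and updates exactly one such entry per new element by binary search. The names `longest_increasing_subsequence` and `lcs_length` are hypothetical; the quadratic `lcs_length` is included only to cross-check the claim that, for distinct values, the answer matches the longest common subsequence of the sequence and its sorted order.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tail_vals = []           # tail_vals[k]: smallest last value of a run of length k+1
    tail_idx = []            # index in seq of that last value
    prev = [-1] * len(seq)   # back-pointers used to rebuild the answer
    for i, x in enumerate(seq):
        k = bisect_left(tail_vals, x)      # the one run whose endpoint x lowers
        if k == len(tail_vals):
            tail_vals.append(x)
            tail_idx.append(i)
        else:
            tail_vals[k] = x
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk the back-pointers from the end of the longest run.
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]


def lcs_length(a, b):
    """Quadratic dynamic program for LCS length, used only as a cross-check."""
    prev_row = [0] * (len(b) + 1)
    for x in a:
        row = [0]
        for j, y in enumerate(b, 1):
            row.append(prev_row[j - 1] + 1 if x == y else max(prev_row[j], row[j - 1]))
        prev_row = row
    return prev_row[-1]


if __name__ == "__main__":
    print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 4, 5, 6]
    p = [2, 5, 1, 4, 3]
    print(len(longest_increasing_subsequence(p)) == lcs_length(p, sorted(p)))  # → True
```

The binary search performs one update per element, which gives the $O(n \log n)$ behavior; the cross-check against `sorted(p)` assumes the values are distinct, as they are in a permutation.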