Revision 236105 of "AxelBoldt/Test" on enwiki

The '''longest-common subsquence problem''' is the search for an efficient method of finding the longest common [[subsequence]] (LCS).  This [[computer science]] problem has gained prominence thanks in part to the [[bioinformatics]] field.

An old method of searching for LCS was to employ a [[brute force]] policy:
Given a sequence '''X''', determine all possible of subsequences of '''X''', and check to see if each subsequence was a subsequence of '''Y''', keeping track of the longest subsequence found.  Each subsequence of '''X''' would be in the set of {1,2,3,4,....,k}.  Using [[number theory]] proofs, we find that there would be 2<sup>k</sup> subsequences of '''X'''.  This would be in [[Big O notation|exponential time]], making this search extremely ineffective for long sequences, such as [[human]] [[DNA]] strands.

== Four Steps to LCS, Polynomial Time Edition ==

1. ''Analyze LCS properties''<BR>
Many [[computer scientist]]s have written papers on LCS properties, including one where LCS has an [[optimal-substructure]] property.

The '''''Optimal-Substructure of an LCS Theorem''''' is

Let '''X''' = < x<sub>1</sub>,...,x<sub>m</sub> > and '''Y''' = < y<sub>1</sub>,...,y<sub>n</sub> > be sequences, and let '''Z''' = < z<sub>1</sub>,...,z<sub>k</sub> > be any LCS of '''X''' and '''Y'''. 


# If x<sub>m</sub> = y<sub>n</sub>, then z<sub>k</sub> = x<sub>m</sub> = y<sub>n</sub> and Z<sub>k-1</sub> is an LCS of X<sub>m-1</sub> and Y<sub>n-1</sub>. 
# If x<sub>m</sub> &#8800; y<sub>n</sub>, then z<sub>k</sub> &#8800; x<sub>m</sub>, implies that Z is an LCS of X<sub>m-1</sub> and Y. 
# If x<sub>m</sub> &#8800; y<sub>n</sub>, then z<sub>k</sub> &#8800; y<sub>n</sub> implies that Z is an LCS of X and Y<sub>n-1</sub>. 


2. ''Devise a [[recursive]] solution''<BR>
[[Computer scientist]]s have agreed on the formula:
:<math> c[i,j] = \left\{\begin{matrix} 0, & \mbox{if i = 0 or j = 0}  \\ c[i-1,j-1]+1, & \mbox{if i,j}>\mbox{0 and }x_i = y_j  \\ max(c[i,j-1],c[i-1,j]) & \mbox{if i,j}>\mbox{0 and }x_i \ne y_j \end{matrix}\right. </math>

The recursive formula relies on the Optimal Substructure Theorem in Part 1.  It involves establishing a recurrence for the optimal value.  ''c[i,j]'' is the length of the LCS in sequences X<sub>i</sub> and Y<sub>j</sub>.  The first part of the formula says if either one of the sequences has length 0, then there can be no proper subsequence of a [[null]] sequence.  The other two parts breaks the LCS apart into smaller subproblems until we reach the null sequence.

3. ''Compute the LCS''
* There are several algorithms available to computer scientists; several involve exponential time functions, others are in polynomial time.  One such polynomial function is the ''Generic LCS Delta'' algorithm.

 LCS-Delta(X,Y)
 m <- LENGTH[X];
 n <- LENGTH[Y];
 for i <- 1 to m, do c[i,0] <-0;
 for j <- 0 to n, do c[0,j] <-0;
 b <- c;
 for i <- 1 to m, do {
     for j <- 1 to n do {
          if (x<sub>i</sub> = y<sub>j</sub>) {
              c[i,j] <- c[i-1,j-1]+1;
              b[i,j] <- "UP-LEFT";
          }
          else if (c[i-1,j] >= c[i,j-1]) {
              c[i,j] <- c[i-1,j];
              b[i,j] <- "UP";
          }
          else
              c[i,j] <- c[i,j-1];
              b[i,j] <- "LEFT";
          }
     }
  return c and b


* LCS Delta is an algorithm that takes two sequences '''X''' and '''Y''' as inputs.  The algorithm creates a table ''c'' and with the lengths of X and Y.  The algorithm also has a table ''b'', which is a copy of ''c'', which is used to store the optimal solution of the LCS.  The rest of the algorithm uses the formula in Step 2 to compute the LCS, and populate the table.  The algorithm runs in O(mn) time, where m and n are the length of '''X''' and '''Y''' respectively.  There are more efficient algorithms, namely LCS Alpha and LCS Beta, but this algorithm is the most intuitive.

4. ''Construct the LCS''

* The b-table returned by the algorithm is used to construct the LCS.  The b-table could look something like this:

:<math>\begin{bmatrix} ! & 0 & 0 & 0 & 0 & 0\\  
 0 & UP-LEFT & LEFT & 0 & 0 & 0 \\  
 0 & 0 & 0 & UP-LEFT & 0  & 0\\  
 0 & 0 & 0 & 0 & UP-LEFT & 0 \\  
 0 & 0 & 0 & 0 & UP & LEFT \\
\end{bmatrix}
</math>

Starting from the (m,n) position, we follow the direction of the arrows, until we reach the end of the LCS, denoted by !. The values in c-table corresponding to the "UP-LEFT" positions in the b-table would be an element of LCS.  The LCS is constructed in reverse order using this method.  The construction method takes O(m + n) time to complete, therefore it runs in linear time.

Modern LCS search methods have yielded algorithms which have cut down the exponential time to polynomial time.  The LCS continues to be examined by [[computer scientist]]s, trying to find an even faster time, perhaps one in logarithmic/linear time.

[[Category:Algorithms on strings]]
[[Category:Combinatorics]]
[[Category:Formal languages]]
All content in the above text box is licensed under the Creative Commons Attribution-ShareAlike license Version 4 and was originally sourced from https://en.wikipedia.org/w/index.php?oldid=236105.
This site is not affiliated with or endorsed in any way by the Wikimedia Foundation or any of its affiliates. In fact, we fucking despise them.