Problem definition: Lintcode.
Given two strings $A = [a_{0}, a_{1}, …, a_{n}]$ and $B = [b_{0}, b_{1}, …, b_{n}]$, find the longest common substring and return its length.
Define the structure of an optimal solution
An optimal solution is represented by the longest common suffix for all pairs of prefixes of the strings. Defining
- $A_{i} = [a_{0}, a_{1}, …, a_{i}]$ a prefix of $A$;
- $B_{j} = [b_{0}, b_{1}, …, b_{j}]$ a prefix of $B$;
- $LCSuffix(A_{i},B_{j})$ a function retrieving the longest common suffix for the given prefixes;
the longest common substring of A and B is $\max_{0 \leq i \leq m, 0 \leq j \leq n}LCSuffix(A_{i},B_{j})$.
Recursively define the value of an optimal solution
The longest common suffix can be recursively computed according to the following formula:
$LCSuffix(A_{i},B_{j})$ = \begin{cases} LCSuffix(A_{i+1},B_{j+1}) + 1, & \text{if $a_{i} = b_{j}$} \\ 0, & \text{if $a_{i} \neq b_{j}$} \end{cases}
Since the longest common suffix for a couple of prefixes ($A_{i}, B_{j}$) contains within the longest common suffix for longer prefixes $A_{i+1}, B_{j+1}$, the problem exhibits optimal substructure. Moreover the longest common substring computation has also overlapping subproblems, because the longest common suffix shall be computes for all pairs of prefixes. So a dynamic programming approach can be applied.
Compute the value of an optimal solution
The following c++ recursive program implements a top-down with memoization solution:
The following c++ program implements an iterative bottom-up approach:
Reconstructing an optimal solution
A longest common substring can be simply reconstructed by storing the index of the last of the substring in one of the two input string and using the longest common substring length to retrieve the index of the first character of the substring.
Comments