Before starting with algorithms, let us understand difference between a substring of a string and a susbsequence of a string. Substring and subsequence are confusing terms.
Example:
Input string : "HELLO WELCOME EVERYONE"
Sub-string : "EVERY" or "COME"
Subsequence : "LWCME"
Example:
String 1 : "HELLO WELCOME EVERYONE"
String 2 : "NO TWEET NO NO WELEVERYONEWELHELL"
LCString : "EVERYONE"
One approach of solving this would be using Dynamic Programming.
a. Allocate a 2D array of size [str1.Length, str2.Length]
b. Each value of this matrix saves the information about length of longest substring that's ending at that position. e.g. L[3,4] gives length of longest common substring of first three characters of first input string and first four characters of second input string.
c. If we start with 1st character of string 1 and 1st character of string 2, if they match we can say L[0, 0] = 1
d. We then can build this array by seeing if LCS value at any postion is greater than cached LCS value.
e. We will also save the ending index of the longest common substring till every point when character do not match. This can be used to backtrack and get the actual LCS from given input string.
f. L[str1.Length, str2.Length] will have the final LCS length saved in it.
g. Each step uses LCS value calculated in previous steps and adds one to it if characters match at new indexes in both given strings.
In below C# code I'm assuming array of characters are passed as input instead of string.
NOTE: I hope comments in the above code helps in understanding the code block.
I will continue this article and explain about finding Longest Common Subsequence in next article.
Algorithm and References: wikipedia (link provided above)
Longest Common substring
Longest Common substring of two given strings is the string which is common substring of both given strings and longest of all such common substrings.Example:
String 1 : "HELLO WELCOME EVERYONE"
String 2 : "NO TWEET NO NO WELEVERYONEWELHELL"
LCString : "EVERYONE"
One approach of solving this would be using Dynamic Programming.
a. Allocate a 2D array of size [str1.Length, str2.Length]
b. Each value of this matrix saves the information about length of longest substring that's ending at that position. e.g. L[3,4] gives length of longest common substring of first three characters of first input string and first four characters of second input string.
c. If we start with 1st character of string 1 and 1st character of string 2, if they match we can say L[0, 0] = 1
d. We then can build this array by seeing if LCS value at any postion is greater than cached LCS value.
e. We will also save the ending index of the longest common substring till every point when character do not match. This can be used to backtrack and get the actual LCS from given input string.
f. L[str1.Length, str2.Length] will have the final LCS length saved in it.
g. Each step uses LCS value calculated in previous steps and adds one to it if characters match at new indexes in both given strings.
In below C# code I'm assuming array of characters are passed as input instead of string.
public
static void
PrintLongestCommonSubstring(char[] str1, char[] str2)
{
int[,]
l = new int[str1.Length,
str2.Length];
int
lcs = -1;
string
substr = string.Empty;
int
end = -1;
for
(int i = 0; i < str1.Length; i++)
{
for
(int j = 0; j < str2.Length; j++)
{
//if
character at 'i' in string 1 and character at 'j' in string 2
matches then increment
the length of lcs value for that position
if
(str1[i] == str2[j])
{
//taking care of array indexes
if (i == 0 || j == 0)
{
l[i, j] = 1;
}
else
{
//characters match , lets update new lcs value
l[i, j] = l[i -
1, j - 1] + 1;
}
//This is to
get the longest common substring
// use end index in the first string for longest common
substring
if (l[i, j] > lcs)
{
lcs = l[i, j];
end = i;
}
}
else
{
//characters do not match use set length of LCS at that
position
as 0
l[i, j] = 0;
}
}
}
for
(int i = end - lcs + 1; i <= end; i++)
{
substr += str1[i];
}
Console.WriteLine("Longest Common SubString Length = {0}, Longest
Common
Substring = {1}", lcs, substr);
}
|
NOTE: I hope comments in the above code helps in understanding the code block.
I will continue this article and explain about finding Longest Common Subsequence in next article.
Algorithm and References: wikipedia (link provided above)
No comments:
Post a Comment