I have modified a Z-Algorithm for string similarity in order to match strings which differ in 0 or 1 char.
However, it seems much slower than original Z-Algorithm although from my point of view there is only a minor different and it also has a O(n) time complexity.
Input is a string in which a pattern is going to be found. So the goal is to find parts in string where pattern is present. The goal is also to find parts in string where the pattern is 'almost present'. So peep, leek and peek are all 'present' in string = "blahpeekblahpeepblahleek" for pattern = "peek". The output are indices in which matches are found. So for my example the output is "4 12 20".
modified Z-Algorithm:
static String stringSimilarity(String string, String pattern ) {
String s = pattern + "$" + string;
int len = s.length();
int patternLen = pattern .length();
int[] z = new int[len];
int l, r, i;
StringBuilder result = new StringBuilder();
l = r = patternLen;
i = patternLen + 1;
while (i < len) {
//O(1)
if (i <= r)
z[i] = Math.min(z[i - l], r - i + 1);
while (i + z[i] < len && z[i] < patternLen && s.charAt(z[i]) == s.charAt(i + z[i]))
z[i] += 1;
// if the previous while encountered mismatch we accept the mismatch as match and seek for matches once more
if (i + z[i] < len && z[i] < patternLen && s.charAt(z[i]) != s.charAt(i + z[i])) {
z[i] += 1;
while (i + z[i] < len && z[i] < patternLen && s.charAt(z[i]) == s.charAt(i + z[i]))
z[i] += 1;
}
if (z[i] >= patternLen)
result.append(i - patternLen - 1).append(" ");
if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
i += 1;
}
return result.toString();
}
Z-Algorithm body (code is not entirely mine but I adopted it while mine was not that readable):
while (i < len) { //O(n + n) = O(n)
//O(1)
if (i <= r)
z[i] = Math.min(z[i - l], r - i + 1);
//loops only n times at maximum - O(n)
while (i + z[i] < len && s.charAt(z[i]) == s.charAt(i + z[i]))
z[i] += 1;
if (z[i] >= patternLen)
result.append(i - patternLen- 1).append(" ");
//O(1)
if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
i += 1; //O(1)
}
Can you please check the complexity of my modified Z-Algorithm?
From my point of view it is the same, however, tests on Hackerank show timeout.
res.toString()which should beresult.toString()\$\endgroup\$peekmatchespeekbut also matchespeeporleekorpeck... \$\endgroup\$Stringrather than (as might be expected) aboolean. \$\endgroup\$