Skip to main content
minor typo and formatting fixes
Source Link
Edward
  • 67.2k
  • 4
  • 120
  • 284

In answer to your second question: How about, instead of your historyBuffer.find() method returning the first position of a match, you return an ArrayList of Triplets which match? This is because if you find a match, you know you will preformperform another iteration of the loop, (provided ii is not at its maximum value which is unlikely). Next time you perform the iteration, instead of going through your entire sliding window looking for a match, simply check whether or not any of the phrases in your ArrayListArrayList of TripletsTriplets will still match when the string ss has that additional character appended. This is because a match longer than the current match must build upon either that current match, or some other equally long match. This way, you don't redo work that has already been done. This approach means you can get rid of the lines

One final note, however, is that incorporating my suggestion, where you look for multiple matches rather than one match, will influence which string searching algorithm would be optimal to implement in your historyBuffer.find()historyBuffer.find() method. The Rabin-Karp substring matching algorithm, is generally best for finding multiple matches. This algorithm uses hashing to discard parts of the historyBufferhistoryBuffer which will definitely not match the substring, leaving you to easily check the parts of the historyBufferhistoryBuffer which are likely to match. However, if you are simply finding one match, as your current implementation does, then the Boyer-Moore algorithm is your best choice.

In answer to your second question: How about, instead of your historyBuffer.find() method returning the first position of a match, you return an ArrayList of Triplets which match? This is because if you find a match, you know you will preform another iteration of the loop, (provided i is not at its maximum value which is unlikely). Next time you perform the iteration, instead of going through your entire sliding window looking for a match, simply check whether or not any of the phrases in your ArrayList of Triplets will still match when the string s has that additional character appended. This is because a match longer than the current match must build upon either that current match, or some other equally long match. This way, you don't redo work that has already been done. This approach means you can get rid of the lines

One final note, however, is that incorporating my suggestion, where you look for multiple matches rather than one match, will influence which string searching algorithm would be optimal to implement in your historyBuffer.find() method. The Rabin-Karp substring matching algorithm, is generally best for finding multiple matches. This algorithm uses hashing to discard parts of the historyBuffer which will definitely not match the substring, leaving you to easily check the parts of the historyBuffer which are likely to match. However, if you are simply finding one match, as your current implementation does, then the Boyer-Moore algorithm is your best choice.

In answer to your second question: How about, instead of your historyBuffer.find() method returning the first position of a match, you return an ArrayList of Triplets which match? This is because if you find a match, you know you will perform another iteration of the loop, (provided i is not at its maximum value which is unlikely). Next time you perform the iteration, instead of going through your entire sliding window looking for a match, simply check whether or not any of the phrases in your ArrayList of Triplets will still match when the string s has that additional character appended. This is because a match longer than the current match must build upon either that current match, or some other equally long match. This way, you don't redo work that has already been done. This approach means you can get rid of the lines

One final note, however, is that incorporating my suggestion, where you look for multiple matches rather than one match, will influence which string searching algorithm would be optimal to implement in your historyBuffer.find() method. The Rabin-Karp substring matching algorithm, is generally best for finding multiple matches. This algorithm uses hashing to discard parts of the historyBuffer which will definitely not match the substring, leaving you to easily check the parts of the historyBuffer which are likely to match. However, if you are simply finding one match, as your current implementation does, then the Boyer-Moore algorithm is your best choice.

Source Link

In answer to your first question, it may depend on whether you wish to optimise for speed or compression ratio. If optimising for speed, it would seem that using bytes is best, as that is what the LZ4 algorithm does. LZ4 is a variant of LZ77, highly optimised for speed. If your optimisation is for the compression ratio, I am unsure which would be better, as I have never run a bitwise LZ77 compressor.

In answer to your second question: How about, instead of your historyBuffer.find() method returning the first position of a match, you return an ArrayList of Triplets which match? This is because if you find a match, you know you will preform another iteration of the loop, (provided i is not at its maximum value which is unlikely). Next time you perform the iteration, instead of going through your entire sliding window looking for a match, simply check whether or not any of the phrases in your ArrayList of Triplets will still match when the string s has that additional character appended. This is because a match longer than the current match must build upon either that current match, or some other equally long match. This way, you don't redo work that has already been done. This approach means you can get rid of the lines

if ((historyBuffer.compare(histCurrLen - i, i, s) == 0) && (lookheadBuffer[0] == lookheadBuffer[i]))
    pos = histCurrLen - i;

// If the longest match is found, check if there are any repeats
// following the of current longest substring in lookheadBuffer
int extra = 0;
if (histCurrLen == pos + i)
{
    // Check for full following repeats
    while ((lookCurrLen >= i + extra + i) && (lookheadBuffer.compare(i + extra, i, s) == 0))
        extra += i;

    // Check for partial following repeats
    int extraextra = i - 1;
    while (extraextra > 0)
    {
        if ((lookCurrLen >= i + extra + extraextra) && (lookheadBuffer.compare(i + extra, extraextra, s, 0, extraextra) == 0))
            break;
        extraextra--;
    }

    extra += extraextra;
}

without losing any performance.

With regards to your question about the complexity of searching left-to-right or vice versa, the time complexity of the two will be identical.

One final note, however, is that incorporating my suggestion, where you look for multiple matches rather than one match, will influence which string searching algorithm would be optimal to implement in your historyBuffer.find() method. The Rabin-Karp substring matching algorithm, is generally best for finding multiple matches. This algorithm uses hashing to discard parts of the historyBuffer which will definitely not match the substring, leaving you to easily check the parts of the historyBuffer which are likely to match. However, if you are simply finding one match, as your current implementation does, then the Boyer-Moore algorithm is your best choice.