Skip to main content
added 3243 characters in body
Source Link
IAbstract
  • 616
  • 1
  • 4
  • 15

Why my own custom iterator interfaces instead of implementing IEnumerable<T>
I felt this should be addressed in depth -- added as a matter of interest
The problem is iterating two sequences (seqA, a.k.a. Source and seqB, a.k.a. Mask)

  • The Mask must be satisfied in whole (byte for byte in sequence)
  • Reset Mask stack when matching fails on a sequence
  • Increment the Mask stack as Source sequences match
  • Stop matches if Source stack is at the end

This boils down to needing 2 state machines to handle the logic of iterating an inner set of conditions (a.k.a. maskTokens) for evaluation. A maskToken is then used to match against a specified Source sequence - which is iterated by the outer state machine.

I used interfaces to build two rudimentary state-machines implementing IEnumerator<T>. For the record, I am weighing the implementation of IEnumerable<T>; at the time of design I was not convinced of how intuitive the interface would be when considering the assumptions made while using an IEnumerable<T>.

The inner state machine - MaskIterator - doesn't really have any special logic but exposes some additional members:

//   `ControlChar` is a custom value type representing a byte[4]
interface IMaskSequenceIterator : IEnumerator<ControlChar> {

    #region properties

    ControlChar Current { get; }

    int Index { get; }

    IMaskToken MaskToken { get; }

    #endregion


    #region members

    void Dispose();

    bool IsFirst();

    bool IsLast();

    long Length();

    void MoveFirst();

    bool MoveNext();

    void Reset();

    #endregion

}

The outer state machine - MatchIterator - has quite a bit more logic:

A. Manipulate and Interpret MaskIterator properties
B. Find Match Sequence: logic black box

//   IMetaToken holds [start, length] info of sequence matches
interface IMatchSequenceIterator : IEnumerator<IMetaToken> {

    #region properties

    IMetaToken Current { get; }

    bool EndOfSequence { get; set; }

    bool IsSequenceMatch { get; }

    IMaskSequenceIterator MaskIterator { get; }

    #endregion


    #region members

    void Dispose();

    bool MoveNext();

    void Reset();

    #endregion
}

Basically, I am taking advantage of the state-machine characteristics of the IEnumerator (as a pattern, so to speak). The idea of this project is that I can build a mask, or sequence of bytes - say, 91-00-00-00; 16-00-00-00; 58-00-00-00 - and match a source byte stream looking for every occurrence of 91-00-00-00-16-00-00-00-58-00-00-00.

The mask formatter can use wildcards specifying a length so I can format something like: 91-00-00-00 42-42 42-42-16-00 58-00-00-00

... where 42 is *. The MatchIterator will return meta data for anything that matches the pattern 91-00-00-00-42-42-42-42-16-00-58-00-00-00 and 42 can be replaced by any byte.

Feedback is welcome!

Why my own custom iterator interfaces instead of implementing IEnumerable<T>
I felt this should be addressed in depth -- added as a matter of interest
The problem is iterating two sequences (seqA, a.k.a. Source and seqB, a.k.a. Mask)

  • The Mask must be satisfied in whole (byte for byte in sequence)
  • Reset Mask stack when matching fails on a sequence
  • Increment the Mask stack as Source sequences match
  • Stop matches if Source stack is at the end

This boils down to needing 2 state machines to handle the logic of iterating an inner set of conditions (a.k.a. maskTokens) for evaluation. A maskToken is then used to match against a specified Source sequence - which is iterated by the outer state machine.

I used interfaces to build two rudimentary state-machines implementing IEnumerator<T>. For the record, I am weighing the implementation of IEnumerable<T>; at the time of design I was not convinced of how intuitive the interface would be when considering the assumptions made while using an IEnumerable<T>.

The inner state machine - MaskIterator - doesn't really have any special logic but exposes some additional members:

//   `ControlChar` is a custom value type representing a byte[4]
interface IMaskSequenceIterator : IEnumerator<ControlChar> {

    #region properties

    ControlChar Current { get; }

    int Index { get; }

    IMaskToken MaskToken { get; }

    #endregion


    #region members

    void Dispose();

    bool IsFirst();

    bool IsLast();

    long Length();

    void MoveFirst();

    bool MoveNext();

    void Reset();

    #endregion

}

The outer state machine - MatchIterator - has quite a bit more logic:

A. Manipulate and Interpret MaskIterator properties
B. Find Match Sequence: logic black box

//   IMetaToken holds [start, length] info of sequence matches
interface IMatchSequenceIterator : IEnumerator<IMetaToken> {

    #region properties

    IMetaToken Current { get; }

    bool EndOfSequence { get; set; }

    bool IsSequenceMatch { get; }

    IMaskSequenceIterator MaskIterator { get; }

    #endregion


    #region members

    void Dispose();

    bool MoveNext();

    void Reset();

    #endregion
}

Basically, I am taking advantage of the state-machine characteristics of the IEnumerator (as a pattern, so to speak). The idea of this project is that I can build a mask, or sequence of bytes - say, 91-00-00-00; 16-00-00-00; 58-00-00-00 - and match a source byte stream looking for every occurrence of 91-00-00-00-16-00-00-00-58-00-00-00.

The mask formatter can use wildcards specifying a length so I can format something like: 91-00-00-00 42-42 42-42-16-00 58-00-00-00

... where 42 is *. The MatchIterator will return meta data for anything that matches the pattern 91-00-00-00-42-42-42-42-16-00-58-00-00-00 and 42 can be replaced by any byte.

Feedback is welcome!

added 33 characters in body
Source Link
IAbstract
  • 616
  • 1
  • 4
  • 15
        IMatchSequenceIterator matchIterator = new MatchSequenceIterator(this, maskToken, position);
        var sumLength = 0L;
        var isComplete = false;

        // Match
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            // get the sum length
            sumLength += matchIterator.Current.Length;
            
            // MaskIterator is constructed by the MatchIterator
            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            // debug points
            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = isLast || matchIterator.EndOfSequence;

        }

        //  this comparison tells us if we matched all sequences
        if (isComplete && matchIterator.MaskIterator.IsLast()) {
            return new MetaToken(position, sumLength);
        }

        // Matches
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            sumLength += matchIterator.Current.Length;

            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = matchIterator.EndOfSequence;

            if (!isComplete && isLast) {
                //  yield the current match meta data
                yield return new MetaToken(position, sumLength);
                //  reset counters;
                position += sumLength;
                sumLength = 0;
                //  we want to keep finding matches so reset the internal mask iterator
                matchIterator.MaskIterator.Reset();
                //  reset has match flag
                hasMatch = false;
            }

        }
        IMatchSequenceIterator matchIterator = new MatchSequenceIterator(this, maskToken, position);
        var sumLength = 0L;
        var isComplete = false;

        // Match
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            // get the sum length
            sumLength += matchIterator.Current.Length;
            
            // MaskIterator is constructed by the MatchIterator
            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = isLast || matchIterator.EndOfSequence;

        }

        //  this comparison tells us if we matched all sequences
        if (isComplete && matchIterator.MaskIterator.IsLast()) {
            return new MetaToken(position, sumLength);
        }

        // Matches
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            sumLength += matchIterator.Current.Length;

            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = matchIterator.EndOfSequence;

            if (!isComplete && isLast) {
                //  yield the current match meta data
                yield return new MetaToken(position, sumLength);
                //  reset counters;
                position += sumLength;
                sumLength = 0;
                //  we want to keep finding matches so reset the internal mask iterator
                matchIterator.MaskIterator.Reset();
                //  reset has match flag
                hasMatch = false;
            }

        }
        IMatchSequenceIterator matchIterator = new MatchSequenceIterator(this, maskToken, position);
        var sumLength = 0L;
        var isComplete = false;

        // Match
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            // get the sum length
            sumLength += matchIterator.Current.Length;
            
            // MaskIterator is constructed by the MatchIterator
            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            // debug points
            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = isLast || matchIterator.EndOfSequence;

        }

        //  this comparison tells us if we matched all sequences
        if (isComplete && matchIterator.MaskIterator.IsLast()) {
            return new MetaToken(position, sumLength);
        }

        // Matches
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            sumLength += matchIterator.Current.Length;

            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = matchIterator.EndOfSequence;

            if (!isComplete && isLast) {
                //  yield the current match meta data
                yield return new MetaToken(position, sumLength);
                //  reset counters;
                position += sumLength;
                sumLength = 0;
                //  we want to keep finding matches so reset the internal mask iterator
                matchIterator.MaskIterator.Reset();
                //  reset has match flag
                hasMatch = false;
            }

        }
Source Link
IAbstract
  • 616
  • 1
  • 4
  • 15

@svick is right in suggesting separate classes/objects. These objects come in the form of custom iterator objects. There are two stacks that must be iterated:

  1. a collection of mask tokens (each token really nothing more than byte[])
  2. full sequence matches (i.e. a start/length of match)

For the mask tokens:

private struct MaskSequenceIterator : IMaskSequenceIterator

For the matches:

private struct MatchSequenceIterator : IMatchSequenceIterator

IMatchSequenceIterator does all the real work and quite easily. The magic is all in the MoveNext() method:

        public bool MoveNext() {
            // gets the next mask token
            var continueMatching = mMaskIterator.MoveNext();
            // determine if there is enough stream to keep matching
            if (continueMatching) {
                continueMatching &= mCurrentPosition + mMaskToken[mMaskIterator.Index].Length <= mRegex.mByteStream.Length;
            }

            // check to see if the sequence matches - nvm the out var atm
            var sequenceMatches = TryGetNextMatch(continueMatching, out mCurrent);
            
            // set iterator properties
            mIsSequenceMatch = sequenceMatches;
            mEndOfStream = mCurrentPosition >= mRegex.mByteStream.Length;

            // true if matches
            return sequenceMatches;
        }

This greatly simplified my outer logic in both a single Match method as well as the multiple Matches method:

        IMatchSequenceIterator matchIterator = new MatchSequenceIterator(this, maskToken, position);
        var sumLength = 0L;
        var isComplete = false;

        // Match
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            // get the sum length
            sumLength += matchIterator.Current.Length;
            
            // MaskIterator is constructed by the MatchIterator
            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = isLast || matchIterator.EndOfSequence;

        }

        //  this comparison tells us if we matched all sequences
        if (isComplete && matchIterator.MaskIterator.IsLast()) {
            return new MetaToken(position, sumLength);
        }

        // Matches
        while ((!isComplete) &&
               (hasMatch = matchIterator.MoveNext())) {
            sumLength += matchIterator.Current.Length;

            if (matchIterator.MaskIterator.IsFirst()) {
                position = matchIterator.Current.Start;
            }

            var index = matchIterator.MaskIterator.Index;
            var isLast = matchIterator.MaskIterator.IsLast();

            isComplete = matchIterator.EndOfSequence;

            if (!isComplete && isLast) {
                //  yield the current match meta data
                yield return new MetaToken(position, sumLength);
                //  reset counters;
                position += sumLength;
                sumLength = 0;
                //  we want to keep finding matches so reset the internal mask iterator
                matchIterator.MaskIterator.Reset();
                //  reset has match flag
                hasMatch = false;
            }

        }