Revisions to Counting Occurrences of a Specific 3-Bit Pattern in a Byte Array

Add "interview" tip block to the end

Source Link

edited Aug 3, 2024 at 1:53

user272752

The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. In fact, with some extra tweaks, thisThis version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.

Interview (and career) tip

Here's another version to ponder:

int occurs_101c( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0; // bits register

    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i] ) & 0x3FF;
        cnt += ((b>>0 & 7) == 5) // 0000000101
            +  ((b>>1 & 7) == 5) // 0000001010
            +  ((b>>2 & 7) == 5) // 0000010100
            +  ((b>>3 & 7) == 5) // 0000101000
            +  ((b>>4 & 7) == 5) // 0001010000
            +  ((b>>5 & 7) == 5) // 0010100000
            +  ((b>>6 & 7) == 5) // 0101000000
            +  ((b>>7 & 7) == 5);// 1010000000
    }

    return cnt;
}

The above contains quite a few operations, but is "branchless" (apart from the required outer loop). Further, it can be easily "reasoned about" by anyone with a modicum of programming experience. In other words, it is simple to write, simple to read, and its correctness can be verified visually. Although not efficient or fast, its results will always be correct (for the problem as stated.) It's a good starting point! The worst thing a coder can produce is code that works most of the time.

Most of the answers, here, refer to the speed of using a LUT. One of those provides a function, containing some intricate but working code, that populates a LUT suitable for its implementation of the solution. The simple and verifiable function above could be called 1024 times (2¹⁰) with 1024 bit patterns by a caller that, likewise, populates an appropriate LUT (or outputs text suitable for copy/paste into a compile-time array.)

Consider the hypothetical that there is very limited time to write code to solve a problem. Now there are two priorities: Correctness (always), and a Deadline. If the interviewer got nothing better than the code above, the interviewee can highlight its simplicity and its correctness. Incorrect or incomplete solutions both score very low marks. If judged to be "inefficient. Do over.", the code above can be used to populate that efficient LUT with correct values. Or, as demonstrated by the offered "test harness", this simple version can be used to verify the results of a "more efficient" version that uses bit manipulation only.

Aside: Storing 1024 values that vary from 0 to 5, each in a byte, can be improved upon.
Exercise: cut that memory footprint in half by storing (and retrieving) 4bit values from an array of 512 bytes (each being a pair of "nibbles".) How does this affect performance? Can it be written "branchless"?

Recommend not walking away from this problem because it seems to be solved. It is by exploring alternatives that one adds-to or strengthens one's intuition and storehouse of "on-hand" solutions.

The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. In fact, with some extra tweaks, this version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.

The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. This version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.

Interview (and career) tip

Here's another version to ponder:

int occurs_101c( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0; // bits register

    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i] ) & 0x3FF;
        cnt += ((b>>0 & 7) == 5) // 0000000101
            +  ((b>>1 & 7) == 5) // 0000001010
            +  ((b>>2 & 7) == 5) // 0000010100
            +  ((b>>3 & 7) == 5) // 0000101000
            +  ((b>>4 & 7) == 5) // 0001010000
            +  ((b>>5 & 7) == 5) // 0010100000
            +  ((b>>6 & 7) == 5) // 0101000000
            +  ((b>>7 & 7) == 5);// 1010000000
    }

    return cnt;
}

The above contains quite a few operations, but is "branchless" (apart from the required outer loop). Further, it can be easily "reasoned about" by anyone with a modicum of programming experience. In other words, it is simple to write, simple to read, and its correctness can be verified visually. Although not efficient or fast, its results will always be correct (for the problem as stated.) It's a good starting point! The worst thing a coder can produce is code that works most of the time.

Most of the answers, here, refer to the speed of using a LUT. One of those provides a function, containing some intricate but working code, that populates a LUT suitable for its implementation of the solution. The simple and verifiable function above could be called 1024 times (2¹⁰) with 1024 bit patterns by a caller that, likewise, populates an appropriate LUT (or outputs text suitable for copy/paste into a compile-time array.)

Consider the hypothetical that there is very limited time to write code to solve a problem. Now there are two priorities: Correctness (always), and a Deadline. If the interviewer got nothing better than the code above, the interviewee can highlight its simplicity and its correctness. Incorrect or incomplete solutions both score very low marks. If judged to be "inefficient. Do over.", the code above can be used to populate that efficient LUT with correct values. Or, as demonstrated by the offered "test harness", this simple version can be used to verify the results of a "more efficient" version that uses bit manipulation only.

Aside: Storing 1024 values that vary from 0 to 5, each in a byte, can be improved upon.
Exercise: cut that memory footprint in half by storing (and retrieving) 4bit values from an array of 512 bytes (each being a pair of "nibbles".) How does this affect performance? Can it be written "branchless"?

Recommend not walking away from this problem because it seems to be solved. It is by exploring alternatives that one adds-to or strengthens one's intuition and storehouse of "on-hand" solutions.

Add "Overboard" section...

Source Link

edited Jul 30, 2024 at 12:03

user272752

Overboard?

int occurs_101b( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0, w; // bits register and working register

    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i]) & 0x3FF;
        w = ((((b >> 1) & ~b) >> 1) & b) & 0xFF; // 0b101
        for( ; w; cnt++ ) w &= w - 1; // Brian K's algo
    }

    return cnt;
}

The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. In fact, with some extra tweaks, this version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.

Always have some fun with whatever you do.

Overboard?

int occurs_101b( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0, w; // bits register and working register

    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i]) & 0x3FF;
        w = ((((b >> 1) & ~b) >> 1) & b) & 0xFF; // 0b101
        for( ; w; cnt++ ) w &= w - 1; // Brian K's algo
    }

    return cnt;
}

The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. In fact, with some extra tweaks, this version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.

Always have some fun with whatever you do.

deleted 1 character in body

Source Link

edited Jul 30, 2024 at 11:14

user272752

It doesn't matter at which end of any 10bit sample register the tabulation begins, so the previous "mask" does not need to be manipulated (shifted). The above also has the possibility of "early determination of continguouscontiguous high-order clear bits" that would preclude some fruitless looping.

Add "Worthwhile" improved version...

Source Link

edited Jul 30, 2024 at 11:04

user272752

Loading

Tweak my function a bit to reduce line count.

Source Link

edited Jul 30, 2024 at 5:14

user272752

Loading

Source Link

answered Jul 30, 2024 at 5:00

user272752

Loading

Stack Exchange Network

Return to Answer

Interview (and career) tip

Interview (and career) tip

Overboard?

Overboard?