The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. In fact, with some extra tweaks, thisThis version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.
Interview (and career) tip
Here's another version to ponder:
int occurs_101c( uint8_t arr[], size_t size ) {
int cnt = 0;
uint32_t b = 0; // bits register
for( size_t i = 0; i < size; ++i ) {
b = ((b << 8) | arr[i] ) & 0x3FF;
cnt += ((b>>0 & 7) == 5) // 0000000101
+ ((b>>1 & 7) == 5) // 0000001010
+ ((b>>2 & 7) == 5) // 0000010100
+ ((b>>3 & 7) == 5) // 0000101000
+ ((b>>4 & 7) == 5) // 0001010000
+ ((b>>5 & 7) == 5) // 0010100000
+ ((b>>6 & 7) == 5) // 0101000000
+ ((b>>7 & 7) == 5);// 1010000000
}
return cnt;
}
The above contains quite a few operations, but is "branchless" (apart from the required outer loop). Further, it can be easily "reasoned about" by anyone with a modicum of programming experience. In other words, it is simple to write, simple to read, and its correctness can be verified visually. Although not efficient or fast, its results will always be correct (for the problem as stated.) It's a good starting point! The worst thing a coder can produce is code that works most of the time.
Most of the answers, here, refer to the speed of using a LUT. One of those provides a function, containing some intricate but working code, that populates a LUT suitable for its implementation of the solution. The simple and verifiable function above could be called 1024 times (210) with 1024 bit patterns by a caller that, likewise, populates an appropriate LUT (or outputs text suitable for copy/paste into a compile-time array.)
Consider the hypothetical that there is very limited time to write code to solve a problem. Now there are two priorities: Correctness (always), and a Deadline. If the interviewer got nothing better than the code above, the interviewee can highlight its simplicity and its correctness. Incorrect or incomplete solutions both score very low marks. If judged to be "inefficient. Do over.", the code above can be used to populate that efficient LUT with correct values. Or, as demonstrated by the offered "test harness", this simple version can be used to verify the results of a "more efficient" version that uses bit manipulation only.
Aside: Storing 1024 values that vary from 0 to 5, each in a byte, can be improved upon.
Exercise: cut that memory footprint in half by storing (and retrieving) 4bit values from an array of 512 bytes (each being a pair of "nibbles".) How does this affect performance? Can it be written "branchless"?
Recommend not walking away from this problem because it seems to be solved. It is by exploring alternatives that one adds-to or strengthens one's intuition and storehouse of "on-hand" solutions.