ikegami

I think a precomputed 10-bit lookup table is the way to go. It cuts the per-byte work to a mask, a shift, an OR, one table load, and an add, so it is likely much faster. The 1024-entry table does compete for cache space, which might cost some of that gain. As always, only benchmarking in the appropriate environment will tell how it compares in practice.

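#include <stddef.h>
#include <stdint.h>

// count_lookup has 1024 entries, indexed by a 10-bit window: the low
// 2 bits of the previous byte followed by the 8 bits of the current byte.
// buf must hold unsigned bytes (e.g. uint8_t) so the index stays in range.
unsigned count = 0;
uint16_t x = 0;
for ( size_t i=0; i<n; ++i ) {
   x = ( ( x & 0x3 ) << 8 ) | buf[ i ];
   count += count_lookup[ x ];
}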

Note: the code is pure C, not C++.
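
The loop above relies on a table that is not shown. As a minimal sketch of how it might be filled, assume the goal is to count occurrences of a 3-bit pattern in the bit stream, read most significant bit first. The pattern value `101` and the names `PATTERN`, `build_count_lookup`, and `count_pattern` are hypothetical, chosen only for illustration. Carrying 2 bits across bytes fits a 3-bit pattern (pattern length minus one), so every match is seen in exactly one window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern for illustration: the 3 bits 1,0,1 (MSB first). */
#define PATTERN      0x5u
#define PATTERN_BITS 3u
#define WINDOW_BITS  10u   /* 2 carried bits + 8 bits of the current byte */

static uint8_t count_lookup[1u << WINDOW_BITS];

/* Fill count_lookup: entry w holds how many times PATTERN occurs in the
   10-bit window w. Every 3-bit slice of the window ends inside the current
   byte, so each occurrence in the stream is counted exactly once. */
static void build_count_lookup(void) {
    const unsigned mask = (1u << PATTERN_BITS) - 1u;
    for (unsigned w = 0; w < (1u << WINDOW_BITS); ++w) {
        uint8_t c = 0;
        for (unsigned shift = 0; shift + PATTERN_BITS <= WINDOW_BITS; ++shift) {
            if (((w >> shift) & mask) == PATTERN) {
                ++c;
            }
        }
        count_lookup[w] = c;
    }
}

/* Same loop as in the answer, wrapped in a function. The window starts as
   two zero bits, which cannot create a false match for a pattern that
   begins with 1; a pattern beginning with 0 would need extra care. */
static unsigned count_pattern(const uint8_t *buf, size_t n) {
    unsigned count = 0;
    uint16_t x = 0;
    for (size_t i = 0; i < n; ++i) {
        x = (uint16_t)(((x & 0x3u) << 8) | buf[i]);
        count += count_lookup[x];
    }
    return count;
}
```

Call `build_count_lookup()` once, then `count_pattern(buf, n)` for each buffer. Under these assumptions, the two bytes `0xA5 0x40` should give 3, with one of the matches straddling the byte boundary.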
