Commit 067ebcd

Alfredo Altamirano authored and meta-codesync[bot] committed
ARM: branchless binary search for smallPrefixes_
Summary:
Extract a branchless upper_bound into a function and use it on ARM for the smallPrefixes_ lookup. Conditional index updates compile to CSEL instructions, avoiding branch mispredictions that hurt sorted_vector_map::upper_bound() which goes through iterator/comparator wrappers.

On x86, the hardware branch predictor handles upper_bound's branches well, so this optimization is gated with `#ifdef __aarch64__`. Common code (fullPrefixes_ upper_bound and previousPrefix_ walk) is shared between both paths.

Benchmark command:
```
buck run fbcode/mode/opt fbcode//mcrouter/lib/fbi/cpp/test/facebook:string_prefix_map_benchmark -- --bm_regex _Lb --bm_mode=adaptive --bm_min_secs=5 --bm_max_secs=30
```

# ARM

Before:
```
============================================================================
[...]facebook/StringPrefixMapBenchmark.cpp     relative  time/iter   iters/s
============================================================================
PrefixMapInCache_Lb                                       197.37ns     5.07M
PrefixMapNotInCache_Lb                                      1.83us   547.10K
```

After:
```
============================================================================
[...]facebook/StringPrefixMapBenchmark.cpp     relative  time/iter   iters/s
============================================================================
PrefixMapInCache_Lb                                        92.27ns    10.84M
PrefixMapNotInCache_Lb                                    449.13ns     2.23M
```

- PrefixMapInCache_Lb: 197.37ns → 92.27ns (53% less time/iter)
- PrefixMapNotInCache_Lb: 1.83us → 449.13ns (75% less time/iter)

Reviewed By: DenisYaroshevskiy

Differential Revision: D97136185

fbshipit-source-id: 2c06d420378264b595a03efab56358857bc30fcc
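The summary treats the branchless search as a drop-in replacement for upper_bound. The sketch below is one way to check that on plain integers. The helper `agreesWithStdUpperBound` and its test data are hypothetical and not part of the commit; the search function has the same shape as the one in the diff. Whether the select actually lowers to CSEL depends on the compiler and flags, which this sketch does not verify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Same shape as the branchlessUpperBound added in this commit: each step adds
// either 0 or `half` to the iterator, which a compiler may lower to a
// conditional select instead of a branch.
template <typename I, typename T, typename Compare>
I branchlessUpperBound(I f, I l, const T& x, Compare comp) {
  auto len = static_cast<std::size_t>(l - f);
  if (len == 0) {
    return f;
  }
  while (len > 1) {
    auto half = len / 2;
    f += comp(x, f[half]) ? 0 : half;
    len -= half;
  }
  f += comp(x, *f) ? 0 : 1;
  return f;
}

// Hypothetical helper (not in the commit): compares against std::upper_bound
// for every probe value over sorted inputs of length 0..16 with duplicates.
bool agreesWithStdUpperBound() {
  for (int n = 0; n <= 16; ++n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
      v.push_back(2 * (i / 2)); // 0, 0, 2, 2, 4, 4, ...
    }
    for (int x = -1; x <= 2 * n + 1; ++x) {
      auto expected = std::upper_bound(v.begin(), v.end(), x);
      auto actual =
          branchlessUpperBound(v.begin(), v.end(), x, std::less<int>{});
      if (expected != actual) {
        return false;
      }
    }
  }
  return true;
}
```

Calling `agreesWithStdUpperBound()` from a test exercises empty ranges, single elements, duplicates, and probes below and above every stored value.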
1 parent 8ae299b commit 067ebcd

1 file changed

Lines changed: 34 additions & 1 deletion

mcrouter/lib/fbi/cpp/LowerBoundPrefixMap.cpp
@@ -13,12 +13,37 @@
 namespace facebook::memcache::detail {
 namespace {

-// .starts_with is not avaliable before C++20 and this needs to compile with an
+// .starts_with is not available before C++20 and this needs to compile with an
 // older standard
 bool std_string_view_starts_with(std::string_view what, std::string_view with) {
   return what.substr(0, with.size()) == with;
 }

+#ifdef __aarch64__
+// Branchless upper_bound on ARM. Conditional index updates compile to CSEL
+// instructions, avoiding branch mispredictions that hurt
+// sorted_vector_map::upper_bound() through iterator/comparator wrappers.
+// On x86, the hardware branch predictor handles upper_bound well so this
+// is not beneficial there.
+template <typename I, typename T, typename Compare>
+I branchlessUpperBound(I f, I l, const T& x, Compare comp) {
+  auto len = static_cast<std::size_t>(l - f);
+  if (len == 0) {
+    return f;
+  }
+
+  while (len > 1) {
+    auto half = len / 2;
+    f += comp(x, f[half]) ? 0 : half;
+    len -= half;
+  }
+  f += comp(x, *f) ? 0 : 1;
+
+  return f;
+}
+
+#endif
+
 } // namespace

 std::ostream& operator<<(std::ostream& os, const SmallPrefix& self) {
@@ -65,8 +90,16 @@ LowerBoundPrefixMapCommon::LowerBoundPrefixMapCommon(

 std::uint32_t LowerBoundPrefixMapCommon::findPrefix(
     std::string_view query) const noexcept {
+#ifdef __aarch64__
+  auto afterPrefix = branchlessUpperBound(
+      smallPrefixes_.begin(),
+      smallPrefixes_.end(),
+      SmallPrefix{query},
+      [](const SmallPrefix& v, const auto& elem) { return v < elem.first; });
+#else
   // Due to a sentinel - guaranteed to not be .begin()
   auto afterPrefix = smallPrefixes_.upper_bound(SmallPrefix{query});
+#endif
   auto [roughFrom, roughTo] = std::prev(afterPrefix)->second;

   // Binary search complete strings between rough boundaries.
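Both paths in `findPrefix` rely on `std::prev(afterPrefix)` being valid, which the existing comment attributes to a sentinel. Here is a minimal sketch of that pattern using only the standard library. The types `Key`, `Range`, `Entry` and the function `lookupRange` are hypothetical stand-ins, and the sketch assumes the sentinel is a first entry whose key is the smallest possible value; the commit does not show how the real sentinel is built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types: a key and a [from, to) range.
using Key = std::uint32_t;
using Range = std::pair<std::uint32_t, std::uint32_t>;
using Entry = std::pair<Key, Range>;

// Returns the range stored with the greatest key <= query.
// Assumption: `sorted` is ordered by key and starts with a sentinel entry
// whose key is the minimum possible value, so upper_bound can never return
// begin() and std::prev is safe.
Range lookupRange(const std::vector<Entry>& sorted, Key query) {
  auto after = std::upper_bound(
      sorted.begin(),
      sorted.end(),
      query,
      [](Key q, const Entry& elem) { return q < elem.first; });
  assert(after != sorted.begin());
  return std::prev(after)->second;
}
```

With entries keyed 0 (sentinel), 10, and 20, a query of 15 resolves to the range stored under 10, and a query of 5 falls back to the sentinel's range.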
