
commit-reach: add generalized find_reachable() #2142

Draft
spkrka wants to merge 3 commits into gitgitgadget:master from spkrka:krka/reachability-wins


spkrka commented Jun 8, 2026


In 2018, Stolee consolidated commit walks into commit-reach.c and
extracted can_all_from_reach_with_flag() from upload-pack's
ok_to_give_up(), observing that its commit-walking logic could be
reused by many other callers (ba3ca1e). In 4fbcca4 it was further
optimized with a memoized DFS, so that later from-commits benefit
from ancestry already explored for earlier ones (a very cool
optimization!).

This series continues that idea by generalizing the algorithm into
find_reachable() and rolling it out to the remaining callers. Most
conversions are plain code reuse with unchanged performance. The big
win is branch --contains in ref-filter, where batching N per-ref DFS
walks into a single call with shared RESULT memoization gives a 14.5x
speedup on gitgitgadget/git.

This makes can_all_from_reach(), contains_tag_algo() and its supporting
infrastructure redundant, so they are all deleted. The contains_cache
commit slab is replaced by temporary flag bits on commit->object.flags.

Patch breakdown:

  1. commit-reach: add find_reachable() and convert simple callers
    Add the new batch reachability primitive and convert
    repo_is_descendant_of, repo_in_merge_bases_many, and test-reach.

  2. commit-reach: convert can_all_from_reach_with_flag to find_reachable_core
    Replace the inline DFS in can_all_from_reach_with_flag() with a
    delegation to find_reachable_core(). Delete can_all_from_reach().

  3. ref-filter: batch --contains/--no-contains using find_reachable
    Replace per-ref commit_contains() with batched find_reachable_list().
    Delete contains_tag_algo and all supporting infrastructure.

Benchmarks on gitgitgadget/git (v2.48, ~85k commits, 370 branches,
730 tags), median of 5-10 sequential runs on a quiet machine:

  branch -r --contains v2.30.0:   13.49s -> 928ms  (14.5x faster)
  branch -r --contains v2.47.0:    6.19s -> 1.05s  ( 5.9x faster)
  tag --contains v2.30.0:          1.27s -> 1.32s  (neutral)
  tag --contains v2.47.0:          1.40s -> 1.41s  (neutral)
  merge-base --is-ancestor:        682ms -> 678ms  (neutral)

The branch --contains speedup comes from batching: N separate per-ref
walks of up to D commits each cost O(N*D), while a single memoized walk
computes each commit's result at most once, giving O(D+N). tag
--contains is neutral because the old contains_tag_algo already cached
per-commit results in a slab. merge-base --is-ancestor is neutral
because its bottleneck is commit-graph object loading, not the walk
pattern.

spkrka force-pushed the krka/reachability-wins branch 3 times, most recently from 41fe555 to c64ca4f June 8, 2026 18:41
spkrka changed the title commit-reach: add general graph traversal find_reachable() Jun 8, 2026
spkrka force-pushed the krka/reachability-wins branch from c64ca4f to 23ecfbd June 8, 2026 18:52
spkrka added 3 commits June 8, 2026 20:54
Add find_reachable(), a batch reachability primitive that checks
which commits from a 'from' set can reach any commit in a 'to' set.
It uses the same memoized DFS approach as can_all_from_reach_with_flag()
(introduced by Stolee in ba3ca1e, optimized in 4fbcca4).

Convert repo_is_descendant_of and repo_in_merge_bases_many to use
the new function when generation numbers are available, and update
test-reach to exercise the new code paths.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
…core

Replace the inline DFS loop in can_all_from_reach_with_flag() with a
call to find_reachable_core(), which implements the same memoized DFS
algorithm. Delete can_all_from_reach() which is no longer called.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Replace the per-ref commit_contains() calls with a single batched
find_reachable_list() call for --contains and --no-contains filtering.
This changes the time complexity from O(N * D) to O(D + N), where D is
the number of commits in the reachable graph that must be walked and N
is the number of refs.

Delete the now-unused contains_tag_algo(), commit_contains() and all
supporting infrastructure (contains_cache slab, contains_result enum,
contains_stack, with_commit_tag_algo flag).

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
spkrka force-pushed the krka/reachability-wins branch from 23ecfbd to 8358c22 June 8, 2026 18:54