Skip to content

commit-reach: terminate merge-base walk when one side is exhausted#2149

Open
spkrka wants to merge 8 commits into
gitgitgadget:masterfrom
spkrka:side-exhaust-pr
Open

commit-reach: terminate merge-base walk when one side is exhausted#2149
spkrka wants to merge 8 commits into
gitgitgadget:masterfrom
spkrka:side-exhaust-pr

Conversation

@spkrka

@spkrka spkrka commented Jun 13, 2026

Copy link
Copy Markdown

commit-reach: terminate merge-base walk when one paint side is exhausted

Optimize paint_down_to_common() for merge-base queries that hit
large one-sided histories.

When the walk from one side reaches a commit with a very low
generation number that the other side never paints, the walk is
forced to drain most of the graph. A common trigger is a
repository import that grafts a separate history with its own root,
but any merge that introduces a low-generation commit never painted
by the other side has the same effect.

A new merge-base candidate can only be discovered when exclusive
PARENT1 and PARENT2 paint meet. This series teaches
paint_down_to_common() to stop as soon as one side has no exclusive
commits left in the queue; once one side is exhausted, no further
candidates can appear.

  origin/HEAD  o   o  PR HEAD
               |   |
     (import)  o   :
              / \ /
             |   o  merge-base
             |   |
             :   :  (~2.5M commits)
             |   |
  import root   main root

In the RFC thread [1], Derrick Stolee provided a criss-cross
counterexample that sharpened the halt condition, and Elijah Newren
independently discovered the same optimization and shared an
implementation in PR #2150 [2]. Patches 2-3 incorporate test
cases from Elijah's branch.

This series implements the optimization only after the walk enters
the finite-generation region, where generation ordering guarantees
that paint on visited commits is final.

Patch layout:

1/8 Documentation/technical: add paint-down-to-common doc
2/8 t6600: add test cases for side-exhaustion edge cases
3/8 t6099, t6600: add side-exhaustion regression tests
4/8 commit-reach: add trace2 instrumentation to paint_down_to_common()
5/8 commit-reach: introduce struct paint_state with per-side counters
6/8 commit-reach: remove unused nonstale_queue dedup wrappers
7/8 commit-reach: terminate merge-base walk when one paint side is exhausted
8/8 commit-reach: move min_generation check into paint_queue_get()

Benchmarks

Step counts are deterministic (measured via trace2_data_intmax added
in patch 4). Wall-clock times are best-of-11 runs.

2.6M-commit monorepo with commit-graph:

                                        steps              wall-clock
  merge-base --all  (across import)  2143438 ->      3     3.67s ->    5ms
  merge-base --all  (1000 apart)     2692915 ->   1035     4.41s ->    7ms
  merge-base --all  (5000 apart)     2692915 ->   6401     4.45s ->   13ms
  merge-base --all  (HEAD vs import) 2698872 ->  45960     4.50s ->   79ms
  merge-tree        (across import)  2143438 ->      3     4.42s ->   11ms

git.git (88k commits, commit-graph):

                                        steps              wall-clock
  merge-base --all v2.0.0 v2.55.0-rc1 72264 ->  44589      110ms ->   68ms
  merge-base --all HEAD HEAD~1000      9891 ->   3828       18ms ->   10ms
  merge-base --all HEAD HEAD~10000    72303 ->  41487      101ms ->   50ms

Changes since v3:

  • Fixed BUG assertion that was accidentally made unconditional
    in v3: restored the min_generation guard so it only fires
    when generation-based ordering is active.

  • Moved generation cutoff and single-result termination
    conditions into the documentation in patch 1/8, since they
    describe existing behavior.

  • Renamed paint_state counter fields for clarity: p1_count ->
    parent1_count, p2_count -> parent2_count, pending_merge_bases
    -> mb_candidate_count. Changed counter types from int to
    size_t. (Suggested by Rene Scharfe.)

Changes since v2:

  • New patch 8/8: moved the min_generation termination check and
    the last_gen monotonicity assertion into paint_queue_get(),
    consolidating halt conditions. commit_graph_generation() is
    now called once per dequeued commit and shared across all
    checks.

  • Moved all halt conditions inside paint_queue_get() with the
    "pop first" form: pop, check, then
    decrement counters. This keeps the optimization commit's
    diff minimal (just inserting the new checks between pop and
    decrement).

  • Shortened the doc comment on paint_queue_get() to describe
    what it does rather than how. Inline comments on each
    return NULL explain the specific halt condition.

  • Replaced the manual commit-graph setup in the step-count test
    with run_all_modes, which now sets GIT_TRACE2_EVENT per mode
    and produces trace-mode-{none,full,half,no-gdat}.txt files.

  • Added a test_paint_down_steps helper for concise 4-mode step
    assertions with diagnostic output on mismatch (prints
    "expected X, got Y" instead of a silent grep failure).

  • Added step-count assertions to the single-walk edge-case
    tests: in_merge_bases_many:self, pending-stale,
    infinity-both-sides, mixed-finite-infinity.

  • Included step counts alongside wall-clock times in the
    benchmark tables.

Changes since v1:

  • Reordered patches: documentation first (describing the existing
    algorithm), tests before code changes, so they demonstrate
    passing with old logic first.

  • Dropped the ahead_behind decoupling patch. paint_state is now a
    NEW struct alongside nonstale_queue instead of replacing it.
    ahead_behind() is completely untouched.

  • Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup()
    (dead code after the conversion) in a separate commit.

  • Renamed: struct paint_queue -> paint_state, field pq -> queue,
    paint_count_add/remove -> paint_count_update (single function
    with signed delta parameter).

  • Split the old paint_count_transition (which handled both old
    and new flags in one call) into separate remove/add calls with
    a signed delta. This eliminates the need for the case 0
    handler (which tracked "not in the queue") and allows an
    exhaustive switch on (PARENT1 | PARENT2 | STALE) that
    documents all valid flag combinations, with BUG() in default.

  • Added trace2_data_intmax() instrumentation to report the number
    of commits visited per paint walk (separate commit), with
    deterministic step-count assertions in t6600.

  • Expanded switch statements to multi-line format per .clang-format.

  • Used !count style throughout instead of count == 0.

  • Updated technical documentation alongside code changes.

[1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] #2150

CC: Derrick Stolee stolee@gmail.com
CC: Elijah Newren newren@gmail.com

@spkrka spkrka force-pushed the side-exhaust-pr branch 10 times, most recently from 7d5b1bb to 3e1315e Compare June 20, 2026 08:55
@spkrka spkrka changed the title commit-reach: terminate merge-base walk when one paint side is exhausted Jun 20, 2026
@spkrka spkrka force-pushed the side-exhaust-pr branch from 3e1315e to 9cbfc67 Compare June 20, 2026 09:09
@spkrka

spkrka commented Jun 20, 2026

Copy link
Copy Markdown
Author

/preview

@gitgitgadget

gitgitgadget Bot commented Jun 20, 2026

Copy link
Copy Markdown
@spkrka

spkrka commented Jun 20, 2026

Copy link
Copy Markdown
Author

/submit

@gitgitgadget

gitgitgadget Bot commented Jun 20, 2026

Copy link
Copy Markdown

Submitted as pull.2149.git.1781951820.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2149/spkrka/side-exhaust-pr-v1

To fetch this version to local tag pr-2149/spkrka/side-exhaust-pr-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2149/spkrka/side-exhaust-pr-v1
@spkrka spkrka marked this pull request as ready for review June 22, 2026 11:32
@gitgitgadget

gitgitgadget Bot commented Jun 23, 2026

Copy link
Copy Markdown

This patch series was integrated into seen via git@418052d.

@gitgitgadget gitgitgadget Bot added the seen label Jun 23, 2026
@spkrka spkrka closed this Jun 24, 2026
@spkrka spkrka deleted the side-exhaust-pr branch June 24, 2026 09:20
@spkrka spkrka restored the side-exhaust-pr branch June 24, 2026 09:25
@spkrka spkrka reopened this Jun 24, 2026
@spkrka spkrka force-pushed the side-exhaust-pr branch from 9cbfc67 to d84b932 Compare June 24, 2026 09:26
@spkrka

spkrka commented Jun 24, 2026

Copy link
Copy Markdown
Author

/submit

@gitgitgadget

gitgitgadget Bot commented Jun 24, 2026

Copy link
Copy Markdown

Submitted as pull.2149.v2.git.1782303254.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2149/spkrka/side-exhaust-pr-v2

To fetch this version to local tag pr-2149/spkrka/side-exhaust-pr-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2149/spkrka/side-exhaust-pr-v2
@gitgitgadget

gitgitgadget Bot commented Jun 25, 2026

Copy link
Copy Markdown

This branch is now known as kk/merge-base-exhaustion.

@spkrka spkrka force-pushed the side-exhaust-pr branch 3 times, most recently from f574f35 to 4b9f192 Compare June 26, 2026 12:54
@spkrka

spkrka commented Jun 26, 2026

Copy link
Copy Markdown
Author

/submit

@gitgitgadget

gitgitgadget Bot commented Jun 26, 2026

Copy link
Copy Markdown

Submitted as pull.2149.v3.git.1782479286.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2149/spkrka/side-exhaust-pr-v3

To fetch this version to local tag pr-2149/spkrka/side-exhaust-pr-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2149/spkrka/side-exhaust-pr-v3
@spkrka spkrka force-pushed the side-exhaust-pr branch from 4b9f192 to 255cddd Compare June 26, 2026 14:42
spkrka and others added 4 commits June 26, 2026 19:16
Add a technical document describing the paint_down_to_common()
algorithm used for merge-base computation, covering the paint
walk, generation number regions, and termination conditions.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the
side-exhaustion optimization for paint_down_to_common():

 - in_merge_bases_many:self: commit is both A and one of the X inputs
 - get_merge_bases_many:duplicate-twos: duplicate entries in X list
 - get_merge_bases_many:pending-stale: STALE transition on an
   already-painted commit (ps-* diamond topology)
 - get_merge_bases_many:infinity-both-sides: both tips outside the
   commit-graph with non-monotonic dates (pi-* topology)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist
and one is an ancestor of another. This exercises the side-exhaustion
optimization in paint_down_to_common together with the
remove_redundant safety net in get_merge_bases_many_0.

Add a mixed finite/INFINITY test to t6600 where one tip is outside
the commit-graph (INFINITY generation) and the other is inside.
This exercises the region transition: the walk starts in the
INFINITY region where side-exhaustion is disabled, then crosses
into the finite region where it can fire.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add a step counter and trace2_data_intmax() call so that the number
of commits visited during the paint walk is observable via
GIT_TRACE2_EVENT. This provides a way to measure the impact of
future optimizations without relying on wall-clock benchmarks alone.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@spkrka spkrka force-pushed the side-exhaust-pr branch from 255cddd to d30b6ec Compare June 26, 2026 17:30
spkrka added 4 commits June 27, 2026 10:54
Add a paint_state struct for use by paint_down_to_common() that
wraps a prio_queue with per-side commit counters. Each non-stale
queued commit occupies exactly one counter bucket based on its
paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).

The counters are maintained by paint_count_update() which adjusts
the appropriate bucket by a signed delta. An exhaustive switch on
the paint+stale bits documents all valid flag combinations in one
place.

Convert paint_down_to_common() to use paint_state. The loop now
drains the queue via paint_queue_get() which returns NULL when all
counters reach zero, replacing the old pointer-based termination
(max_nonstale). This is equivalent behavior -- both conditions
detect that no non-stale entries remain.

paint_queue_get() uses a "pop first" form: it dequeues a commit,
then checks the counters. This means the loop exits one iteration
earlier than the old code in some topologies (the popped stale
commit is never processed), so a few step counts drop by one.

The existing nonstale_queue is left in place for ahead_behind().

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
unused after the previous commit. The core nonstale_queue functions
remain in use by ahead_behind().

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier.  Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.

The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.

The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable.  The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.

Step counts measured with trace2 on git.git with commit-graph:

  merge-base --all v2.0.0 v2.55.0-rc1:
    before: 72264 steps    after: 44589 steps

  merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
    before:   110 steps    after:     7 steps

Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Consolidate the min_generation termination condition into
paint_queue_get(), alongside the existing stale-entry and
side-exhaustion checks.

Move last_gen into struct paint_state so that
commit_graph_generation() is called exactly once per dequeued commit
and the result is shared across all termination checks and the
monotonicity BUG assertion.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@spkrka spkrka force-pushed the side-exhaust-pr branch from d30b6ec to 8dd15d4 Compare June 27, 2026 09:18
@spkrka

spkrka commented Jun 28, 2026

Copy link
Copy Markdown
Author

/submit

@gitgitgadget

gitgitgadget Bot commented Jun 28, 2026

Copy link
Copy Markdown

Submitted as pull.2149.v4.git.1782649547.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2149/spkrka/side-exhaust-pr-v4

To fetch this version to local tag pr-2149/spkrka/side-exhaust-pr-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2149/spkrka/side-exhaust-pr-v4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 participants