commit-reach: add generalized find_reachable()#2142

Draft
spkrka wants to merge 3 commits into gitgitgadget:master from spkrka:krka/reachability-wins

@spkrka spkrka commented Jun 8, 2026

In 2018, Stolee consolidated commit walks into commit-reach.c and
extracted can_all_from_reach_with_flag() from upload-pack's
ok_to_give_up() with the observation that we can reuse its
commit walking logic for many other callers (ba3ca1e).
In 4fbcca4 it also got optimized with a memoized DFS so
subsequent from-commits benefit from shared ancestry
(very cool optimization!).

This series continues that idea by generalizing the algorithm into
find_reachable() and rolling it out to the remaining callers. Most
conversions are just code reuse with preserved performance. The big
win is ref-filter branch --contains, where batching the N per-ref DFS
walks into a single call with shared RESULT memoization gives a
14.5x speedup on gitgitgadget/git.
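
To illustrate the batching idea, here is a minimal sketch of a memoized
DFS over a toy commit graph. It is not the code from this series: the
`toy_*` names, the flag bits, and the six-commit graph are all invented
for illustration, and recursion stands in for the explicit stack a real
walk would use. The point is only that a result recorded while answering
one tip is reused by every later tip that shares history with it.

```c
#include <assert.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical flag bits for this sketch; not git's actual flag names. */
#define TOY_TARGET   (1u << 0)  /* member of the 'to' set */
#define TOY_REACHES  (1u << 1)  /* memo: some target is reachable from here */
#define TOY_DEAD_END (1u << 2)  /* memo: no target is reachable from here */

struct toy_commit {
	int parents[TOY_MAX_PARENTS];  /* indices into the graph, -1 if absent */
	unsigned flags;
};

/* Recursive for brevity; a production walk would use an explicit stack. */
static int toy_reaches(struct toy_commit *g, int c)
{
	if (g[c].flags & (TOY_TARGET | TOY_REACHES))
		return 1;
	if (g[c].flags & TOY_DEAD_END)
		return 0;
	for (int i = 0; i < TOY_MAX_PARENTS; i++) {
		int p = g[c].parents[i];
		if (p >= 0 && toy_reaches(g, p)) {
			g[c].flags |= TOY_REACHES;  /* remember the positive result */
			return 1;
		}
	}
	g[c].flags |= TOY_DEAD_END;  /* remember the negative result too */
	return 0;
}

/* Set out[i] to 1 if from[i] reaches any target; return how many do. */
static int toy_find_reachable(struct toy_commit *g, const int *from,
			      int nfrom, int *out)
{
	int hits = 0;
	for (int i = 0; i < nfrom; i++) {
		out[i] = toy_reaches(g, from[i]);
		hits += out[i];
	}
	return hits;
}

/* Three branch tips checked against one target in a single batch. */
static int toy_demo(int out[3])
{
	struct toy_commit g[6] = {
		{ { -1, -1 }, 0 },           /* 0: root */
		{ {  0, -1 }, TOY_TARGET },  /* 1: the commit given to --contains */
		{ {  1, -1 }, 0 },           /* 2 */
		{ {  2, -1 }, 0 },           /* 3: branch tip */
		{ {  2, -1 }, 0 },           /* 4: branch tip sharing history with 3 */
		{ {  0, -1 }, 0 },           /* 5: branch tip that forked before 1 */
	};
	const int from[3] = { 3, 4, 5 };
	return toy_find_reachable(g, from, 3, out);
}
```

In this toy graph, the walk from tip 3 marks commit 2 as reaching the
target, so the walk from tip 4 stops after looking at a single parent.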

This makes can_all_from_reach(), contains_tag_algo, and its supporting
infrastructure redundant, so all of it is deleted. The contains_cache
commit slab is replaced by temporary flag bits on commit->object.flags.
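
As a rough picture of what "temporary flag bits" involves, the sketch
below sets scratch bits on a per-object flags word, remembers which
objects were touched, and clears only those bits afterwards so that
long-lived flags survive. The struct, the bit positions, and the
`toy_*` helpers are assumptions made for this example; the description
above does not say how the series itself tracks or clears its bits.

```c
#include <assert.h>
#include <stddef.h>

/* All names and bit positions here are hypothetical. */
#define TOY_SEEN        (1u << 0)   /* stand-in for a long-lived flag */
#define TOY_TMP_RESULT  (1u << 10)  /* scratch bit: memoized answer */
#define TOY_TMP_VISITED (1u << 11)  /* scratch bit: already walked */
#define TOY_TMP_MASK    (TOY_TMP_RESULT | TOY_TMP_VISITED)

struct toy_object {
	unsigned flags;
};

/* Set a scratch bit and record the object the first time it is touched. */
static void toy_mark(struct toy_object *o, unsigned bit,
		     struct toy_object **touched, size_t *n)
{
	if (!(o->flags & TOY_TMP_MASK))
		touched[(*n)++] = o;
	o->flags |= bit;
}

/* Drop only the scratch bits, leaving every other flag untouched. */
static void toy_clear(struct toy_object **touched, size_t n)
{
	for (size_t i = 0; i < n; i++)
		touched[i]->flags &= ~TOY_TMP_MASK;
}

/* Returns 1 if the scratch bits are gone and the long-lived flags remain. */
static int toy_flags_demo(void)
{
	struct toy_object objs[3] = { { TOY_SEEN }, { 0 }, { TOY_SEEN } };
	struct toy_object *touched[3];
	size_t n = 0;

	toy_mark(&objs[0], TOY_TMP_VISITED, touched, &n);
	toy_mark(&objs[0], TOY_TMP_RESULT, touched, &n);  /* recorded once */
	toy_mark(&objs[1], TOY_TMP_VISITED, touched, &n);
	toy_clear(touched, n);

	return n == 2 &&
	       objs[0].flags == TOY_SEEN &&
	       objs[1].flags == 0 &&
	       objs[2].flags == TOY_SEEN;
}
```

Compared with a side table keyed by commit, bits on the object need no
separate allocation, at the cost of having to be cleared before another
walk reuses them.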

Patch breakdown:

  1. commit-reach: add find_reachable() and convert simple callers
    Add the new batch reachability primitive and convert
    repo_is_descendant_of, repo_in_merge_bases_many, and test-reach.

  2. commit-reach: convert can_all_from_reach_with_flag to find_reachable_core
    Replace the inline DFS in can_all_from_reach_with_flag() with a
    delegation to find_reachable_core(). Delete can_all_from_reach().

  3. ref-filter: batch --contains/--no-contains using find_reachable
    Replace per-ref commit_contains() with batched find_reachable_list().
    Delete contains_tag_algo and all supporting infrastructure.

Benchmarks on gitgitgadget/git (v2.48, ~85k commits, 370 branches,
730 tags), median of 5-10 sequential runs on a quiet machine:

  branch -r --contains v2.30.0:   13.49s -> 928ms  (14.5x faster)
  branch -r --contains v2.47.0:    6.19s -> 1.05s  ( 5.9x faster)
  tag --contains v2.30.0:          1.27s -> 1.32s  (neutral)
  tag --contains v2.47.0:          1.40s -> 1.41s  (neutral)
  merge-base --is-ancestor:        682ms -> 678ms  (neutral)

The branch --contains speedup comes from the batch change from O(N*D)
to O(D+N), where N is the number of refs and D is the reachable graph
depth. tag --contains is neutral because the old contains_tag_algo
already had per-commit slab caching. merge-base --is-ancestor is
neutral since the bottleneck is commit-graph object loading, not
the walk pattern.
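
The difference between the two complexity classes can be seen by
counting examined commits on a synthetic history. The sketch below is
an assumption-laden toy, not a model of the benchmark: it builds a
single-parent line of `depth` commits with `nrefs` tips on top, targets
the root, and memoizes only positive results. The function names are
invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Walk a single-parent history from `tip` toward `target`.
 * `memo` may be NULL (no sharing between walks) or an array in which
 * memo[c] != 0 means "c is already known to reach target".
 * Returns the number of commits examined before the answer was known.
 */
static long walk_count(const int *parent, int tip, int target,
		       unsigned char *memo)
{
	long examined = 0;
	int c = tip;

	while (c >= 0 && c != target && !(memo && memo[c])) {
		examined++;
		c = parent[c];
	}
	if (memo && c >= 0) {
		/* Record the positive result for every commit on the path. */
		for (int m = tip; m != c; m = parent[m])
			memo[m] = 1;
	}
	return examined;
}

/* Sum the work of checking every tip; returns -1 on allocation failure. */
static long total_examined(int depth, int nrefs, int share_memo)
{
	int total = depth + nrefs;
	int *parent = malloc((size_t)total * sizeof(*parent));
	unsigned char *memo = share_memo ? calloc((size_t)total, 1) : NULL;
	long sum = 0;

	if (!parent || (share_memo && !memo)) {
		free(parent);
		free(memo);
		return -1;
	}
	for (int i = 0; i < depth; i++)
		parent[i] = i - 1;      /* commit 0 is the root */
	for (int i = depth; i < total; i++)
		parent[i] = depth - 1;  /* every tip sits on the same base */

	for (int i = depth; i < total; i++)
		sum += walk_count(parent, i, 0, memo);  /* target: the root */

	free(parent);
	free(memo);
	return sum;
}
```

With `depth = 1000` and `nrefs = 100`, the unshared walks examine
100 * 1000 = 100000 commits in total, while the shared-memo walks
examine 1000 + 99 = 1099: the first tip pays for the full depth and
each later tip stops at its first already-answered parent.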

@spkrka spkrka force-pushed the krka/reachability-wins branch 3 times, most recently from 41fe555 to c64ca4f Compare June 8, 2026 18:41
@spkrka spkrka changed the title commit-reach: add general graph traversal find_reachable() commit-reach: add generalized find_reachable() Jun 8, 2026
@spkrka spkrka force-pushed the krka/reachability-wins branch from c64ca4f to 23ecfbd Compare June 8, 2026 18:52
spkrka added 3 commits June 8, 2026 20:54
Add find_reachable(), a batch reachability primitive that checks
which commits from a 'from' set can reach any commit in a 'to' set.
It uses the same memoized DFS approach as can_all_from_reach_with_flag()
(introduced by Stolee in ba3ca1e, optimized in 4fbcca4).

Convert repo_is_descendant_of and repo_in_merge_bases_many to use
the new function when generation numbers are available, and update
test-reach to exercise the new code paths.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
…core

Replace the inline DFS loop in can_all_from_reach_with_flag() with a
call to find_reachable_core(), which implements the same memoized DFS
algorithm. Delete can_all_from_reach() which is no longer called.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Replace the per-ref commit_contains() calls with a single batched
find_reachable_list() call for --contains and --no-contains filtering.
This changes the time complexity from O(N * D) to O(D + N) where D is
the reachable graph depth and N is the number of refs.

Delete the now-unused contains_tag_algo(), commit_contains() and all
supporting infrastructure (contains_cache slab, contains_result enum,
contains_stack, with_commit_tag_algo flag).

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@spkrka spkrka force-pushed the krka/reachability-wins branch from 23ecfbd to 8358c22 Compare June 8, 2026 18:54