commit-reach: add generalized find_reachable() #2142
Draft
spkrka wants to merge 3 commits into
Add find_reachable(), a batch reachability primitive that checks which commits from a 'from' set can reach any commit in a 'to' set. It uses the same memoized DFS approach as can_all_from_reach_with_flag() (introduced by Stolee in ba3ca1e, optimized in 4fbcca4).

Convert repo_is_descendant_of and repo_in_merge_bases_many to use the new function when generation numbers are available, and update test-reach to exercise the new code paths.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
…core

Replace the inline DFS loop in can_all_from_reach_with_flag() with a call to find_reachable_core(), which implements the same memoized DFS algorithm. Delete can_all_from_reach() which is no longer called.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Replace the per-ref commit_contains() calls with a single batched find_reachable_list() call for --contains and --no-contains filtering. This changes the time complexity from O(N * D) to O(D + N) where D is the reachable graph depth and N is the number of refs.

Delete the now-unused contains_tag_algo(), commit_contains() and all supporting infrastructure (contains_cache slab, contains_result enum, contains_stack, with_commit_tag_algo flag).

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
In 2018, Stolee consolidated commit walks into commit-reach.c and
extracted can_all_from_reach_with_flag() from upload-pack's
ok_to_give_up() with the observation that we can reuse its
commit walking logic for many other callers (ba3ca1e).
In 4fbcca4 it was also optimized with a memoized DFS, so that
subsequent from-commits reuse the work already done on shared
ancestry (very cool optimization!).
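
To make the shared-ancestry idea concrete, here is a minimal sketch of a memoized DFS over a toy commit graph. It is not the code from this series: `toy_commit`, `toy_reaches()`, `toy_find_reachable()` and the `TOY_*` bits are hypothetical names, the walk is recursive rather than stack-based, and generation-number cutoffs and flag cleanup are left out.

```c
#include <assert.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical state bits, standing in for temporary per-commit flag bits. */
enum {
	TOY_TARGET  = 1 << 0, /* member of the 'to' set */
	TOY_VISITED = 1 << 1, /* some walk already entered this commit */
	TOY_RESULT  = 1 << 2  /* known to reach at least one 'to' commit */
};

struct toy_commit {
	int parents[TOY_MAX_PARENTS];
	int nr_parents;
	unsigned flags;
};

/* Number of commits expanded; only here to make the sharing visible. */
static int toy_expanded;

/* Return 1 if commit c can reach a TOY_TARGET commit via parent links. */
static int toy_reaches(struct toy_commit *g, int c)
{
	int i;

	if (g[c].flags & TOY_VISITED)
		return !!(g[c].flags & TOY_RESULT); /* memoized answer */
	g[c].flags |= TOY_VISITED;
	toy_expanded++;

	if (g[c].flags & TOY_TARGET) {
		g[c].flags |= TOY_RESULT;
		return 1;
	}
	for (i = 0; i < g[c].nr_parents; i++) {
		if (toy_reaches(g, g[c].parents[i])) {
			g[c].flags |= TOY_RESULT;
			return 1;
		}
	}
	return 0; /* VISITED without RESULT: no target is reachable */
}

/* Batch query: out[i] is 1 if from[i] reaches any commit in 'to'. */
static void toy_find_reachable(struct toy_commit *g,
			       const int *from, int nr_from,
			       const int *to, int nr_to, int *out)
{
	int i;

	for (i = 0; i < nr_to; i++)
		g[to[i]].flags |= TOY_TARGET;
	for (i = 0; i < nr_from; i++)
		out[i] = toy_reaches(g, from[i]);
}

/* History 0 <- 1 <- 2 <- 3, branch 4 on top of 1, unrelated 5 <- 6. */
static int toy_demo(void)
{
	struct toy_commit g[7] = {
		{ {0}, 0, 0 }, /* 0: root */
		{ {0}, 1, 0 }, /* 1 */
		{ {1}, 1, 0 }, /* 2 */
		{ {2}, 1, 0 }, /* 3 */
		{ {1}, 1, 0 }, /* 4 */
		{ {0}, 0, 0 }, /* 5: unrelated root */
		{ {5}, 1, 0 }, /* 6 */
	};
	const int from[] = { 3, 4, 6 };
	const int to[] = { 1 };
	int out[3], i, mask = 0;

	toy_expanded = 0;
	toy_find_reachable(g, from, 3, to, 1, out);
	for (i = 0; i < 3; i++)
		if (out[i])
			mask |= 1 << i;
	/* 3, 2, 1 for the first walk, then only 4, then 6 and 5. */
	assert(toy_expanded == 6);
	return mask; /* bit i set when from[i] reaches a 'to' commit */
}
```

In `toy_demo()`, the walk from commit 4 stops at commit 1 because the earlier walk from commit 3 already recorded the answer there. That reuse is what later from-commits gain from shared ancestry.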
This series continues that idea by generalizing the algorithm into
find_reachable() and rolling it out to the remaining callers. Most
conversions are just code reuse with preserved performance. The big
win is ref-filter branch --contains, where batching N per-ref DFS
walks into a single call with shared RESULT memoization gives a
14.5x speedup on gitgitgadget/git.
This makes can_all_from_reach(), contains_tag_algo and its
infrastructure redundant, so all of it is deleted. The contains_cache
commit slab is replaced by temporary flag bits on
commit->object.flags.
Patch breakdown:

1. commit-reach: add find_reachable() and convert simple callers
   Add the new batch reachability primitive and convert
   repo_is_descendant_of, repo_in_merge_bases_many, and test-reach.

2. commit-reach: convert can_all_from_reach_with_flag to find_reachable_core
   Replace the inline DFS in can_all_from_reach_with_flag() with a
   delegation to find_reachable_core(). Delete can_all_from_reach().

3. ref-filter: batch --contains/--no-contains using find_reachable
   Replace per-ref commit_contains() with batched find_reachable_list().
   Delete contains_tag_algo and all supporting infrastructure.

Benchmarks on gitgitgadget/git (v2.48, ~85k commits, 370 branches,
730 tags), median of 5-10 sequential runs on a quiet machine:
The branch --contains speedup comes from the O(N*D)->O(D+N) batch
change. tag --contains is neutral because the old contains_tag_algo
already had per-commit slab caching. merge-base --is-ancestor is
neutral since the bottleneck is commit-graph object loading, not
the walk pattern.
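
The O(N*D) versus O(D+N) difference can be illustrated with a small step counter. This is only a sketch under simplifying assumptions: history is a single first-parent chain of `depth` commits, every ref is its own commit sitting directly on the chain tip, and the commit being searched for is the root. `walk_one()` and `count_steps()` are hypothetical helpers, not functions from this series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { UNKNOWN = 0, REACHES = 1, NO_REACH = 2 };

/*
 * Follow parent links from `start` until the target, a memoized commit,
 * or the end of history is hit, then record the answer on the path
 * walked. Every commit inspected counts as one step.
 */
static int walk_one(const int *parent, unsigned char *state,
		    int start, int target, long *steps)
{
	int c, p, answer = NO_REACH;

	for (c = start; c >= 0; c = parent[c]) {
		(*steps)++;
		if (state[c] != UNKNOWN) {
			answer = state[c]; /* reuse an earlier walk */
			break;
		}
		if (c == target) {
			answer = REACHES;
			break;
		}
	}
	for (p = start; p != c; p = parent[p])
		state[p] = (unsigned char)answer;
	if (c >= 0)
		state[c] = (unsigned char)answer;
	return answer == REACHES;
}

/*
 * Commits 0..depth-1 form a chain; commits depth..depth+nr_refs-1 are
 * ref tips whose parent is the chain tip. Returns the total number of
 * steps needed to check every ref against target commit 0.
 */
static long count_steps(int depth, int nr_refs, int share_memo)
{
	int total = depth + nr_refs, i;
	int *parent = malloc(sizeof(*parent) * total);
	unsigned char *state = calloc(total, 1);
	long steps = 0;

	assert(parent && state);
	for (i = 0; i < depth; i++)
		parent[i] = i - 1; /* the root gets parent -1 */
	for (i = depth; i < total; i++)
		parent[i] = depth - 1;

	for (i = depth; i < total; i++) {
		if (!share_memo)
			memset(state, 0, total); /* each ref starts cold */
		assert(walk_one(parent, state, i, 0, &steps));
	}
	free(parent);
	free(state);
	return steps;
}
```

With `depth = 1000` and `nr_refs = 50`, the cold per-ref variant takes 50 * 1001 = 50050 steps, while the shared variant takes 1001 steps for the first ref and 2 for each later one, 1099 in total. Real histories have merges and generation-number cutoffs, so actual ratios will differ from this toy case.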