fix: serve remapped row addresses from vector indexes during the FRI window#7261
Draft
xuanyu-z wants to merge 1 commit into
Draft
fix: serve remapped row addresses from vector indexes during the FRI window#7261xuanyu-z wants to merge 1 commit into
xuanyu-z wants to merge 1 commit into
Conversation
…window
After a deferred-remap compaction (defer_index_remap=true), vector
searches could return stale pre-compaction row addresses, failing the
subsequent take with "fragment id N does not exist" (or silently reading
wrong rows). Two distinct bugs:
1. IvfIndexState::reconstruct dropped the frag reuse index. The index
state cache key is FRI-aware, but the cached state cannot re-open the
FRI at reconstruction time (no dataset access), and reconstruct_typed
hardcoded None into IvfQuantizationStorage::from_cached. Every index
opened through the state-cache hit path loaded partitions without
auto-remap. Fixed by threading Option<Arc<FragReuseIndex>> through
reconstruct -> reconstruct_typed -> from_cached; the call site
re-opens the FRI, which is cheap because it is cached under
FragReuseIndexKey.
2. ProductQuantizationStorage::new kept the pre-remap row_ids binding.
The remap branch rebuilds the batch and refreshes pq_code from it,
but Self { row_ids } stored the array extracted before the remap, so
storage.row_ids() served stale addresses paired with remapped codes —
even on a cold open. Fixed by refreshing row_ids from the remapped
batch, like pq_code.
Adds test_vector_search_during_fri_window covering IVF_FLAT and IVF_PQ
through both the cold-open and state-cache reconstruct paths, asserting
search results equal the FRI-translated pre-compaction results.
Note: HNSW auto-remap at load remains a known gap, tracked in lance-format#3993.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
After a deferred-remap compaction (
defer_index_remap=true), vector searches can return stale pre-compaction row addresses, failing the subsequent take withfragment id N does not exist in the dataset(or, worse, silently reading wrong rows if fragment ids get reused). Scalar/FTS indexes auto-remap correctly through the FRI at load (#3971); vector indexes did not, due to two distinct bugs:Bug 1 —
IvfIndexState::reconstructdrops the frag reuse indexThe index state cache key is FRI-aware (
IvfIndexStateCacheKey::new(uuid, frag_reuse_uuid)), but the cached value cannot re-open the FRI at reconstruction time (no dataset access), andreconstruct_typedhardcodedNoneintoIvfQuantizationStorage::from_cached. Every index opened through the state-cache hit path loaded partitions without auto-remap.Fix: thread
Option<Arc<FragReuseIndex>>throughreconstruct→reconstruct_typed→from_cached; the call site re-opens the FRI before reconstructing (cheap — the FRI itself is cached underFragReuseIndexKey).Bug 2 —
ProductQuantizationStorage::newkeeps the pre-remaprow_idsThe remap branch rebuilds the batch and refreshes
pq_codefrom it, butSelf { row_ids }stored the binding extracted before the remap — sostorage.row_ids()served stale addresses paired with remapped codes, even on a cold open.Fix: refresh
row_idsfrom the remapped batch, exactly likepq_code.Test
Adds
test_vector_search_during_fri_window: 10-fragment dataset, IVF_FLAT and IVF_PQ, KNN before → deferred compaction → KNN twice on the same handle (first search = cold open, exercising Bug 2; second = state-cache reconstruct, exercising Bug 1), asserting results equal the FRI-translated pre-compaction results. Both cases fail without the fixes and pass with them. Existing FRI/straddle tests unaffected.Notes