fix: serve remapped row addresses from vector indexes during the FRI window #7261

Draft
xuanyu-z wants to merge 1 commit into lance-format:main from xuanyu-z:xuanyuzhan/fix-vector-fri-auto-remap

@xuanyu-z

Contributor

Problem

After a deferred-remap compaction (defer_index_remap=true), vector searches can return stale pre-compaction row addresses, failing the subsequent take with "fragment id N does not exist in the dataset" (or, worse, silently reading wrong rows if fragment ids get reused). Scalar/FTS indexes auto-remap correctly through the frag reuse index (FRI) at load (#3971); vector indexes did not, due to two distinct bugs:
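To illustrate the failure mode, here is a minimal sketch of why a stale address breaks the take. It assumes a row address packs the fragment id into the upper 32 bits and the row offset into the lower 32 bits. The helpers `row_addr`, `fragment_of`, and `check_take` are hypothetical names for this sketch, not functions from the Lance codebase.

```rust
use std::collections::BTreeSet;

/// Pack a fragment id and a row offset into one u64 address.
/// Assumption: upper 32 bits = fragment id, lower 32 bits = row offset.
fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Recover the fragment id from a packed address.
fn fragment_of(addr: u64) -> u32 {
    (addr >> 32) as u32
}

/// Hypothetical take-side check: an address is only usable if its
/// fragment is still part of the dataset.
fn check_take(addr: u64, live_fragments: &BTreeSet<u32>) -> Result<(), String> {
    let frag = fragment_of(addr);
    if live_fragments.contains(&frag) {
        Ok(())
    } else {
        Err(format!("fragment id {} does not exist in the dataset", frag))
    }
}
```

If compaction rewrote fragments 0..9 into a new fragment 10, an index that still returns `row_addr(3, 5)` fails this check, while a remapped address in fragment 10 passes.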

Bug 1 — IvfIndexState::reconstruct drops the frag reuse index

The index state cache key is FRI-aware (IvfIndexStateCacheKey::new(uuid, frag_reuse_uuid)), but the cached value cannot re-open the FRI at reconstruction time (no dataset access), and reconstruct_typed hardcoded None into IvfQuantizationStorage::from_cached. Every index opened through the state-cache hit path loaded partitions without auto-remap.

Fix: thread Option<Arc<FragReuseIndex>> through reconstruct → reconstruct_typed → from_cached; the call site re-opens the FRI before reconstructing, which is cheap because the FRI itself is cached under FragReuseIndexKey.
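The shape of this fix can be sketched as follows. This is a simplified illustration, not the actual Lance code: `FragReuseIndex`, `CachedState`, and `Storage` here are hypothetical stand-ins, and the remap is modeled as a plain old-to-new address map.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the frag reuse index: old address -> new address.
#[derive(Debug, Default)]
struct FragReuseIndex {
    old_to_new: HashMap<u64, u64>,
}

impl FragReuseIndex {
    /// Addresses that were not moved by compaction are returned unchanged.
    fn remap(&self, addr: u64) -> u64 {
        *self.old_to_new.get(&addr).unwrap_or(&addr)
    }
}

/// Hypothetical cached state. It has no dataset handle, so it cannot
/// open the FRI on its own; the caller must pass it in.
struct CachedState {
    row_addrs: Vec<u64>,
}

struct Storage {
    row_addrs: Vec<u64>,
}

impl Storage {
    fn from_cached(state: &CachedState, fri: Option<Arc<FragReuseIndex>>) -> Self {
        let row_addrs = match fri {
            Some(fri) => state.row_addrs.iter().map(|&a| fri.remap(a)).collect(),
            None => state.row_addrs.clone(),
        };
        Self { row_addrs }
    }
}

impl CachedState {
    fn reconstruct(&self, fri: Option<Arc<FragReuseIndex>>) -> Storage {
        self.reconstruct_typed(fri)
    }

    fn reconstruct_typed(&self, fri: Option<Arc<FragReuseIndex>>) -> Storage {
        // The buggy shape was equivalent to `Storage::from_cached(self, None)`.
        Storage::from_cached(self, fri)
    }
}
```

Passing the FRI as an argument keeps the cached state free of any dataset reference, which is the constraint that caused the hardcoded `None` in the first place.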

Bug 2 — ProductQuantizationStorage::new keeps the pre-remap row_ids

The remap branch rebuilds the batch and refreshes pq_code from it, but Self { row_ids } stored the binding extracted before the remap — so storage.row_ids() served stale addresses paired with remapped codes, even on a cold open.

Fix: refresh row_ids from the remapped batch, exactly like pq_code.
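The stale-binding pattern can be reproduced in a few lines. The sketch below uses hypothetical `Batch` and `PqStorage` types with plain vectors instead of Arrow arrays, and models the remap as an old-to-new address map. `new_stale` mirrors the buggy shape described above and `new_fixed` the corrected one.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the record batch: one PQ code byte per row.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    row_ids: Vec<u64>,
    pq_code: Vec<u8>,
}

/// Rewrite every row id found in the map; leave the others untouched.
fn remap_batch(batch: &Batch, old_to_new: &HashMap<u64, u64>) -> Batch {
    Batch {
        row_ids: batch
            .row_ids
            .iter()
            .map(|id| *old_to_new.get(id).unwrap_or(id))
            .collect(),
        pq_code: batch.pq_code.clone(),
    }
}

#[derive(Debug)]
struct PqStorage {
    row_ids: Vec<u64>,
    pq_code: Vec<u8>,
}

impl PqStorage {
    /// Buggy shape: `row_ids` is bound before the remap and never refreshed.
    fn new_stale(batch: Batch, old_to_new: Option<&HashMap<u64, u64>>) -> Self {
        let row_ids = batch.row_ids.clone();
        let mut pq_code = batch.pq_code.clone();
        if let Some(map) = old_to_new {
            let remapped = remap_batch(&batch, map);
            pq_code = remapped.pq_code; // refreshed from the remapped batch
            // row_ids still holds the pre-remap addresses
        }
        Self { row_ids, pq_code }
    }

    /// Fixed shape: both columns are taken from the batch after the remap.
    fn new_fixed(batch: Batch, old_to_new: Option<&HashMap<u64, u64>>) -> Self {
        let batch = match old_to_new {
            Some(map) => remap_batch(&batch, map),
            None => batch,
        };
        Self {
            row_ids: batch.row_ids,
            pq_code: batch.pq_code,
        }
    }
}
```

Extracting every column only after the remap branch removes the possibility of one binding lagging behind the others.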

Test

Adds test_vector_search_during_fri_window: 10-fragment dataset, IVF_FLAT and IVF_PQ, KNN before → deferred compaction → KNN twice on the same handle (first search = cold open, exercising Bug 2; second = state-cache reconstruct, exercising Bug 1), asserting results equal the FRI-translated pre-compaction results. Both cases fail without the fixes and pass with them. Existing FRI/straddle tests unaffected.
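The core assertion of such a test can be sketched like this. It is an illustration of the result-equivalence check only, not the test added in this PR: `translate` and `check_equivalent` are hypothetical helpers, and the FRI is again modeled as an old-to-new address map.

```rust
use std::collections::HashMap;

/// Translate pre-compaction result addresses through the old -> new map.
/// Addresses the compaction did not move are kept as they are.
fn translate(before: &[u64], old_to_new: &HashMap<u64, u64>) -> Vec<u64> {
    before
        .iter()
        .map(|addr| *old_to_new.get(addr).unwrap_or(addr))
        .collect()
}

/// Compare results observed during the FRI window against the translated
/// pre-compaction results, in ranking order.
fn check_equivalent(
    before: &[u64],
    during: &[u64],
    old_to_new: &HashMap<u64, u64>,
) -> Result<(), String> {
    let expected = translate(before, old_to_new);
    if expected == during {
        Ok(())
    } else {
        Err(format!("expected {:?}, got {:?}", expected, during))
    }
}
```

Running the check after each of the two searches on the same handle covers both the cold-open path and the state-cache reconstruct path.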

Notes

  • Found while building deferred-remap validation for LanceDB enterprise: per-index-type result-equivalence tests (results before compaction == during FRI window == after physical remap) caught IVF_FLAT/IVF_PQ/IVF_HNSW_SQ all serving stale addresses; bisecting through the partition caches and the sequential prepared search path isolated the two causes above.
  • HNSW auto-remap at load remains a known separate gap, tracked in "[Remap Seaparation] Auto-remap for HNSW" (#3993); physical remap rebuilds the graph per "fix: rebuild HNSW graph while remapping it" (#4941).

…window

After a deferred-remap compaction (defer_index_remap=true), vector
searches could return stale pre-compaction row addresses, failing the
subsequent take with "fragment id N does not exist" (or silently reading
wrong rows). Two distinct bugs:

1. IvfIndexState::reconstruct dropped the frag reuse index. The index
   state cache key is FRI-aware, but the cached state cannot re-open the
   FRI at reconstruction time (no dataset access), and reconstruct_typed
   hardcoded None into IvfQuantizationStorage::from_cached. Every index
   opened through the state-cache hit path loaded partitions without
   auto-remap. Fixed by threading Option<Arc<FragReuseIndex>> through
   reconstruct -> reconstruct_typed -> from_cached; the call site
   re-opens the FRI, which is cheap because it is cached under
   FragReuseIndexKey.

2. ProductQuantizationStorage::new kept the pre-remap row_ids binding.
   The remap branch rebuilds the batch and refreshes pq_code from it,
   but Self { row_ids } stored the array extracted before the remap, so
   storage.row_ids() served stale addresses paired with remapped codes —
   even on a cold open. Fixed by refreshing row_ids from the remapped
   batch, like pq_code.

Adds test_vector_search_during_fri_window covering IVF_FLAT and IVF_PQ
through both the cold-open and state-cache reconstruct paths, asserting
search results equal the FRI-translated pre-compaction results.

Note: HNSW auto-remap at load remains a known gap, tracked in lance-format#3993.
github-actions bot added the A-index (Vector index, linalg, tokenizer) and bug (Something isn't working) labels on Jun 12, 2026
@codecov

codecov Bot commented Jun 13, 2026

Codecov Report

❌ Patch coverage is 98.34711% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index.rs | 99.12% | 0 Missing and 1 partial ⚠️ |
| rust/lance/src/index/vector/ivf/v2.rs | 80.00% | 1 Missing ⚠️ |

@xuanyu-z xuanyu-z marked this pull request as draft June 13, 2026 00:42