[FEA] Roaring-bitmap prefilters for brute-force search (shared and per-query) #2240
Draft
maxwbuckley wants to merge 1 commit into
Adds cuvs::neighbors::filtering::roaring_filter (counterpart of
bitset_filter) and roaring_matrix_filter (counterpart of bitmap_filter)
over a cuvs::core::gpu_roaring compressed filter type, with a
selectivity-driven three-regime brute-force dispatch:
- very sparse: container ids -> CSR (indptr free from host-side
cardinalities, one emission kernel, no count syncs) -> sddmm ->
sparse select_k
- mid (shared filter): gather selected rows -> dense GEMM -> select_k
-> id remap (computes |filter| columns instead of n_rows)
- dense (s >= 0.45): decompress and delegate to the existing
bitset/bitmap pipeline
Sparse/mid threshold is dimension-dependent (~3% at d=128, ~0.1% at
d=512, measured on RTX 5090). Measured vs the stock paths: 11-19x for
the shared filter at 0.1-10% selectivity (10Mx512d, Q=64), 2.8-6.6x for
per-query filters at s<=1% (10Mx128d, Q=256), parity at >=50% via
delegation; recall 1.0 everywhere. Filter memory scales with filter
cardinality/structure (3KB-64KB typical at 10M rows vs a fixed 1.25MB
bitset; the dense [n_queries, n_rows] bit matrix is never materialized
below the dense regime).
Tests: 20 parameterized gtest cases cross-validating both filters
against bitset_filter/bitmap_filter across IP/L2/Cosine, d in {64,512},
and all three dispatch regimes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
[FEA] Roaring-bitmap prefilters for brute-force search (shared and per-query)
Closes #1972.
Problem
cuVS encodes search prefilters as flat bit arrays: bitset_filter (n_rows bits shared by all queries) and bitmap_filter (n_queries x n_rows bits, one row per query). Below ~40-50% selectivity, the dominant production regime (tag predicates, ACLs, recency cutoffs), this leaves performance and memory on the table:
- n_rows/8 bytes regardless of how few bits are set (bitmap_filter: times n_queries; 125 GB at n_queries=1000, n_rows=1e9).
- count() reduction + host sync per search to pick a branch, then converts bitmap→CSR twice (once in cuVS, once again inside raft::sparse::linalg::masked_matmul).
- mid-selectivity regime (1-30%) runs either a full-dataset GEMM + mask or a structure-blind SDDMM, both far from optimal.
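The memory figures for the flat encodings follow from simple arithmetic. A minimal sketch, using hypothetical helper names that are not part of cuVS and counting whole bytes only (a real implementation may round up to a wider word size):

```cpp
#include <cassert>
#include <cstdint>

// Bytes for a flat bitset shared by all queries: one bit per row,
// rounded up to whole bytes. Hypothetical helper, not a cuVS function.
inline std::uint64_t bitset_bytes(std::uint64_t n_rows)
{
  return (n_rows + 7) / 8;
}

// Bytes for a flat bitmap holding one row of n_rows bits per query.
inline std::uint64_t bitmap_bytes(std::uint64_t n_queries, std::uint64_t n_rows)
{
  // Guard against 64-bit overflow in the bit count.
  assert(n_rows == 0 || n_queries <= (UINT64_MAX - 7) / n_rows);
  return (n_queries * n_rows + 7) / 8;
}
```

With these helpers, 10M rows give 1,250,000 bytes for the shared bitset, and 1000 queries over 1e9 rows give 125,000,000,000 bytes for the per-query bitmap, matching the numbers quoted in this PR.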
What this PR adds
A compressed Roaring-bitmap filter representation plus a
selectivity-driven three-regime dispatch for brute-force search.
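As a rough illustration of the regime selection for the shared filter, the thresholds quoted in this PR (dense at s >= 0.45; sparse/mid crossover encoded as dim >= 256 ? 0.001 : 0.03) can be sketched on the host as below. The names Regime, sparse_threshold and select_regime are hypothetical, and the exact comparison operators at the boundaries are an assumption of this sketch, not taken from the implementation:

```cpp
#include <cassert>

enum class Regime { Sparse, Mid, Dense };

// Dense threshold quoted in this PR.
constexpr double kDenseThreshold = 0.45;

// Sparse/mid crossover as encoded in this PR: lower for high dimensions.
constexpr double sparse_threshold(int dim) { return dim >= 256 ? 0.001 : 0.03; }

// selectivity = |filter| / n_rows, available from host-side cardinalities,
// so no device-side count or host sync is needed to pick a branch.
inline Regime select_regime(double selectivity, int dim)
{
  assert(selectivity >= 0.0 && selectivity <= 1.0);
  if (selectivity >= kDenseThreshold) { return Regime::Dense; }
  if (selectivity < sparse_threshold(dim)) { return Regime::Sparse; }
  return Regime::Mid;
}
```

Under these assumptions a 1% filter takes the sparse path at d=128 but the mid (gather + dense GEMM) path at d=512.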
API (mirrors the existing filter API)
cuvs::core::gpu_roaring is RAII (rmm-backed), movable, and also provides set_and / set_or / multi_and / multi_or for composing predicate bitmaps on-GPU and to_bitset() for interop with the existing filters.
Dispatch (no count kernels; cardinalities are known on the host)
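- very sparse (below t_sparse): container ids → CSR → sddmm → sparse select_k
- mid (shared filter): gather selected rows → dense GEMM → select_k → id remap
- dense (s >= 0.45): decompress and delegate to the existing bitset/bitmap pipeline
t_sparse is dimension-dependent because cuSPARSE SDDMM degrades with dim while dense GEMM does not: measured crossovers are ~3% at d=128 and ~0.1% at d=512 (RTX 5090); encoded as dim >= 256 ? 0.001 : 0.03.
Measured (RTX 5090, fp32 IP, k=10, recall 1.0 in every cell)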
Shared filter, 10M x 512d, 64 queries, vs bitset_filter today: 11-19x at 0.1-10% selectivity.
Per-query filters, 10M x 128d, 256 queries, vs bitmap_filter today: 2.8-6.6x at s ≤ 1% (and 1.4-3.1x across 1-6.25%), with the dense [n_queries, n_rows] bitmap never materialized below the dense regime.
Filter memory at 10M rows: bitset fixed 1.25 MB; roaring 3 KB at 0.01% uniform, 2-64 KB clustered (the common production shape), parity only when genuinely dense.
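To see why the compressed size tracks cardinality rather than n_rows, here is a small host-side estimate. It assumes the classic Roaring layout (ids grouped by their high 16 bits; array containers of 16-bit values up to 4096 entries, 8 KB bitmap containers above that) and ignores run containers and per-container metadata. Whether gpu_roaring uses exactly these rules is an assumption; RoaringEstimate and estimate_roaring are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RoaringEstimate {
  std::size_t containers    = 0;  // number of non-empty 2^16-id chunks
  std::size_t payload_bytes = 0;  // container payload only, no headers
};

// Walk strictly increasing ids, group them by their high 16 bits, and
// charge each chunk as an array (2 bytes per id) or a bitmap (8192 bytes).
inline RoaringEstimate estimate_roaring(const std::vector<std::uint32_t>& sorted_ids)
{
  RoaringEstimate est;
  std::size_t i = 0;
  while (i < sorted_ids.size()) {
    const std::uint32_t key = sorted_ids[i] >> 16;
    std::size_t card = 0;
    while (i < sorted_ids.size() && (sorted_ids[i] >> 16) == key) {
      assert(i == 0 || sorted_ids[i - 1] < sorted_ids[i]);  // input must be sorted, unique
      ++card;
      ++i;
    }
    est.containers += 1;
    est.payload_bytes += (card <= 4096) ? card * 2 : 8192;
  }
  return est;
}
```

For 1000 ids spread evenly over 10M rows (0.01% selectivity), this gives 153 containers and a 2000-byte payload before any metadata, compared with the fixed 1.25 MB bitset.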
Tests
cpp/tests/neighbors/brute_force_roaring.cu (added to NEIGHBORS_TEST): 20 parameterized cases cross-validating roaring_filter against bitset_filter and roaring_matrix_filter against bitmap_filter (tie-tolerant distance comparison + filter-membership assertions) across InnerProduct / L2Expanded / CosineExpanded, d ∈ {64, 512}, and selectivities exercising all three dispatch regimes. All pass.
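The two kinds of checks named above can be pictured with a small host-side sketch. This is not the test code from this PR; distances_match and all_in_filter are hypothetical helpers, and comparing distances position by position within a tolerance is one plausible reading of "tie-tolerant":

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if both result lists hold the same distances up to `tol`, position by
// position. Ids are deliberately not compared: equidistant neighbors may be
// returned in a different order by different pipelines.
inline bool distances_match(const std::vector<float>& expected,
                            const std::vector<float>& actual,
                            float tol)
{
  assert(tol >= 0.0f);
  if (expected.size() != actual.size()) { return false; }
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (std::fabs(expected[i] - actual[i]) > tol) { return false; }
  }
  return true;
}

// True if every returned id is present in the sorted list of allowed row ids.
inline bool all_in_filter(const std::vector<std::int64_t>& ids,
                          const std::vector<std::int64_t>& sorted_allowed)
{
  return std::all_of(ids.begin(), ids.end(), [&](std::int64_t id) {
    return std::binary_search(sorted_allowed.begin(), sorted_allowed.end(), id);
  });
}
```

Comparing distances rather than ids keeps the check stable when two rows are equally far from a query, while the membership check still catches any result that leaked past the filter.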
Limitations / follow-ups
limitation); half/int8 follow-up.
from_sorted_ids builds containers host-side (construction is one-time per filter; a device-side builder is a follow-up).
natural follow-up — the format is container-compatible.
by construction.
selectivity) is a separate follow-up PR.
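Because containers are built host-side, each filter's cardinality is known without a device-side count, which is what lets the sparse path obtain the CSR row pointers without count kernels or syncs. A minimal sketch of that step, assuming one cardinality per query row; indptr_from_cardinalities and indptr_shared are hypothetical names, not functions from this PR:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Build the CSR row-pointer array for a [n_queries x n_rows] sparse mask from
// per-query filter cardinalities that are already known on the host.
inline std::vector<std::int64_t> indptr_from_cardinalities(
  const std::vector<std::int64_t>& cardinalities)
{
  std::vector<std::int64_t> indptr(cardinalities.size() + 1, 0);
  // indptr[q + 1] = indptr[q] + cardinalities[q]
  std::partial_sum(cardinalities.begin(), cardinalities.end(), indptr.begin() + 1);
  return indptr;
}

// Shared-filter case: every query row has the same number of non-zeros.
inline std::vector<std::int64_t> indptr_shared(std::size_t n_queries, std::int64_t cardinality)
{
  assert(cardinality >= 0);
  return indptr_from_cardinalities(std::vector<std::int64_t>(n_queries, cardinality));
}
```

Only the column indices then need to be written on the device, which is consistent with the single emission kernel described in the commit message.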
Files
New:
- cpp/include/cuvs/core/roaring.hpp: gpu_roaring, roaring_view (device test()), construction, set ops, decompression, CSR emission
- cpp/src/core/roaring/roaring.cu: implementation
- cpp/include/cuvs/neighbors/roaring_filter.hpp: the two filters
- cpp/src/neighbors/detail/knn_brute_force_roaring.cuh: dispatch + the three pipelines
- cpp/tests/neighbors/brute_force_roaring.cu
Modified:
- cpp/include/cuvs/neighbors/common.hpp: FilterType::{Roaring, RoaringMatrix}
- cpp/src/neighbors/detail/knn_brute_force.cuh: include + two dynamic_cast dispatch cases in detail::search; default argument of brute_force_search_filtered moved to its first declaration
- cpp/CMakeLists.txt, cpp/tests/CMakeLists.txt: source/test registration
🤖 Generated with Claude Code