CAGRA: fix concurrent initialization and usage of dataset descriptor #2237
achirkin wants to merge 3 commits into
When a CAGRA index runs searches in multiple independent streams using the same `raft::resources` handle, the dataset descriptor kernel in one stream can finish after its result has already been used by a CAGRA search in another stream.

Currently, we protect against concurrent initialization on the host only. This PR adds stream ordering so that the search kernel waits for the initialization on the device side.
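
To illustrate the kind of device-side ordering described above, here is a minimal sketch that uses a CUDA event to make one stream wait for work enqueued in another. It is not taken from this PR: the `descriptor` struct, the two kernels, and `run_cross_stream_demo` are hypothetical stand-ins, and the actual change may express the ordering differently.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a dataset descriptor living in device memory.
struct descriptor {
  int dim;
};

// Stand-in for the kernel that initializes the descriptor.
__global__ void init_descriptor(descriptor* desc, int dim) { desc->dim = dim; }

// Stand-in for a search kernel that reads the descriptor.
__global__ void search(const descriptor* desc, int* out) { *out = desc->dim; }

// Initializes the descriptor in one stream and reads it in another.
// Returns the value seen by the "search" kernel, or -1 on any CUDA error.
int run_cross_stream_demo(int dim)
{
  descriptor* desc           = nullptr;
  int* out                   = nullptr;
  cudaStream_t init_stream   = nullptr;
  cudaStream_t search_stream = nullptr;
  cudaEvent_t ready          = nullptr;
  int result                 = -1;

  // Non-blocking streams do not synchronize implicitly with the default stream,
  // so the two streams below are truly independent of each other.
  bool ok = cudaMalloc(&desc, sizeof(descriptor)) == cudaSuccess &&
            cudaMalloc(&out, sizeof(int)) == cudaSuccess &&
            cudaStreamCreateWithFlags(&init_stream, cudaStreamNonBlocking) == cudaSuccess &&
            cudaStreamCreateWithFlags(&search_stream, cudaStreamNonBlocking) == cudaSuccess &&
            cudaEventCreateWithFlags(&ready, cudaEventDisableTiming) == cudaSuccess;

  if (ok) {
    init_descriptor<<<1, 1, 0, init_stream>>>(desc, dim);
    // Record the event after the init kernel, then make the search stream
    // wait for it. The wait happens on the device; the host is not blocked.
    ok = cudaEventRecord(ready, init_stream) == cudaSuccess &&
         cudaStreamWaitEvent(search_stream, ready, 0) == cudaSuccess;
  }
  if (ok) {
    search<<<1, 1, 0, search_stream>>>(desc, out);
    ok = cudaMemcpyAsync(&result, out, sizeof(int), cudaMemcpyDeviceToHost, search_stream) ==
           cudaSuccess &&
         cudaStreamSynchronize(search_stream) == cudaSuccess;
  }

  // Release whatever was successfully created.
  if (ready) cudaEventDestroy(ready);
  if (search_stream) cudaStreamDestroy(search_stream);
  if (init_stream) cudaStreamDestroy(init_stream);
  cudaFree(out);
  cudaFree(desc);
  return ok ? result : -1;
}
```

Without the `cudaEventRecord` / `cudaStreamWaitEvent` pair, nothing orders the two kernels, so `search` could read the descriptor before `init_descriptor` has written it. A host-side lock alone does not close that gap, because it only serializes the kernel launches, not their execution on the device.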
Note that this is all single-device concurrency: the dataset descriptors are not shared between GPUs, because they are cached in a `raft::resources` custom resource, and we enforce one resources handle per device.

Possibly related bugs: #1720, https://github.com/rapidsai/dlfw/issues/286