Propagate CUDA device IDs in Python DLPack tensors by fallintoplace · Pull Request #2239 · rapidsai/cuvs

fallintoplace · 2026-06-11T20:50:04Z

Summary

- Set the device ID of Python-created CUDA DLPack tensors to the actual allocation device instead of always using device 0.
- Use `cudaPointerGetAttributes()` to determine the device for CUDA array-interface inputs; host arrays keep device ID 0.
- Add regression coverage for host arrays and for CuPy arrays allocated on a nonzero device (multi-GPU runners only).

Why

The CUDA array interface does not carry a device ordinal, and the RAFT wrapper used here only preserves the interface dictionary. Hard-coding DLDevice.device_id = 0 mislabels arrays allocated on GPU 1 or higher, which can send downstream C API work through the wrong device context.
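
To illustrate the gap, the sketch below builds a dictionary shaped like `__cuda_array_interface__` (version 3) and shows that the owning device has to be recovered from the data pointer. `resolve_device_id` and the `fake_allocations` lookup are hypothetical stand-ins for a `cudaPointerGetAttributes()` query so that the example runs without a GPU; they are not the helpers added in this PR.

```python
# Fields defined by version 3 of the CUDA Array Interface; none names a device.
CAI_V3_KEYS = {"shape", "typestr", "data", "version",
               "strides", "descr", "mask", "stream"}


def resolve_device_id(interface, from_cai, pointer_to_device):
    """Return the DLPack device_id for an array described by `interface`.

    `pointer_to_device` is a hypothetical callable standing in for a
    cudaPointerGetAttributes()-style lookup (pointer -> device ordinal).
    """
    if not from_cai:
        # Host arrays keep device ID 0, as described in the summary.
        return 0
    ptr, _read_only = interface["data"]
    return pointer_to_device(ptr)


# A fake allocation table: pretend this pointer lives on GPU 1.
fake_allocations = {0x7F0000000000: 1}

cai = {
    "shape": (4, 8),
    "typestr": "<f4",
    "data": (0x7F0000000000, False),
    "version": 3,
}

assert "device" not in CAI_V3_KEYS  # nothing to read the ordinal from
device_id = resolve_device_id(cai, True, fake_allocations.__getitem__)
print(device_id)  # 1, where a hard-coded value would have reported 0
```

The lookup is passed in as a parameter only to keep the sketch testable on CPU-only machines; a real binding would call the CUDA runtime directly.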

Validation

pre-commit run --files python/cuvs/cuvs/common/cydlpack.pyx python/cuvs/cuvs/tests/test_device_tensor_view.py
git diff --check
python3 -m compileall python/cuvs/cuvs/tests/test_device_tensor_view.py

I could not run the targeted pytest tests locally because this base Python environment does not have pytest installed. The new CuPy regression test is skipped automatically on single-GPU machines.
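
A multi-GPU gate of this kind might look like the following sketch. It assumes CuPy's `cupy.cuda.runtime.getDeviceCount()` and treats a missing CuPy install or an unusable CUDA runtime as "not multi-GPU"; the actual `has_multiple_gpus()` in the test file may be written differently.

```python
def has_multiple_gpus():
    """Best-effort check for more than one visible CUDA device."""
    try:
        import cupy as cp  # optional dependency; absent on CPU-only machines
        return cp.cuda.runtime.getDeviceCount() > 1
    except Exception:
        # No CuPy, no driver, or no devices: treat as single-GPU.
        return False


if __name__ == "__main__":
    print("multi-GPU tests enabled:", has_multiple_gpus())
```

In a test module, the result would typically feed `pytest.mark.skipif(not has_multiple_gpus(), reason="requires more than one GPU")`, which is one way to build a marker such as `requires_multiple_gpus`.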

copy-pr-bot · 2026-06-11T20:50:20Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-11T21:42:59Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Fixed DLPack tensor device identification to report the CUDA device a tensor actually originates from, instead of hard-coding device 0.

Tests
- Added multi-GPU DLPack device identification tests, including verification for host arrays and CUDA device arrays.

Walkthrough

This PR updates the DLPack export layer to detect and report the CUDA device ID associated with each tensor pointer, replacing the hard-coded zero with a runtime device query. The implementation extends the CUDA runtime imports and adds device-querying helpers; the tests validate detection for host and GPU allocations, with the GPU case gated on multi-GPU availability.

Changes

DLPack CUDA Device-ID Detection

| Layer / File(s) | Summary |
|---|---|
| CUDA device-id querying in DLPack export: `python/cuvs/cuvs/common/cydlpack.pyx` | Extends CUDA runtime imports to enable pointer attribute querying, introduces helper functions (`_dlpack_device_id_c`, `_dlpack_device_id`) to map pointers to CUDA devices with error handling, integrates computed device ID into `dlpack_c`, and removes redundant pointer declaration. |
| DLPack device-id test coverage: `python/cuvs/cuvs/tests/test_device_tensor_view.py` | Adds `has_multiple_gpus()` utility and `requires_multiple_gpus` pytest marker for GPU-availability gating; introduces tests for device-id detection on host arrays (expected 0) and GPU arrays (matches active CUDA device). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: propagating actual CUDA device IDs in DLPack tensors instead of hardcoding device 0. |
| Description check | ✅ Passed | The description is well-detailed and directly related to the changeset, explaining the motivation, implementation approach, and validation steps performed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/cuvs/cuvs/common/cydlpack.pyx (1)

95-105: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CRITICAL: avoid allocating DLManagedTensor before the fallible device query.

Line 97 allocates dlm, and Line 105 can now raise via _dlpack_device_id_c(ary). That exception path leaks the raw DLManagedTensor allocation because there is no cleanup before unwinding.

Suggested fix

 cdef DLManagedTensor* dlpack_c(ary):
     # todo(dgd): add checking options/parameters
     cdef DLDeviceType dev_type
     cdef DLDevice dev
     cdef DLDataType dtype
     cdef DLTensor tensor
     cdef uintptr_t tensor_ptr = <uintptr_t>ary.ai_["data"][0]
+    cdef int device_id = _dlpack_device_id_c(ary)
     cdef DLManagedTensor* dlm = \
         <DLManagedTensor*>stdlib.malloc(sizeof(DLManagedTensor))
 
     if ary.from_cai:
         dev_type = DLDeviceType.kDLCUDA
@@
     dev.device_type = dev_type
-    dev.device_id = _dlpack_device_id_c(ary)
+    dev.device_id = device_id

As per coding guidelines, "Catch memory leaks from improper resource management" and "Ensure GIL handling is correct for CUDA operations and exceptions are handled correctly across Python/C++ boundary in Cython bindings."
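
The ordering problem can be reproduced without Cython or CUDA. The sketch below is a toy model only: `TrackingHeap` is a hypothetical stand-in for `stdlib.malloc`/`stdlib.free`, and `failing_query` stands in for a device query that raises. It compares allocating before the fallible call with allocating after it.

```python
class TrackingHeap:
    """Toy stand-in for malloc/free that records live allocations."""

    def __init__(self):
        self.live = set()
        self._next = 1

    def malloc(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


def failing_query():
    """Hypothetical device query that fails, e.g. on a CUDA runtime error."""
    raise RuntimeError("device query failed")


def build_alloc_first(heap, query):
    handle = heap.malloc()  # raw allocation, nothing owns it yet
    device_id = query()     # may raise, in which case handle is never freed
    return handle, device_id


def build_query_first(heap, query):
    device_id = query()     # fallible step runs before any allocation
    handle = heap.malloc()
    return handle, device_id


def live_after_failure(builder):
    """Run `builder` with a failing query and count leaked allocations."""
    heap = TrackingHeap()
    try:
        builder(heap, failing_query)
    except RuntimeError:
        pass
    return len(heap.live)


print(live_after_failure(build_alloc_first))  # 1 allocation leaked
print(live_after_failure(build_query_first))  # 0 allocations leaked
```

Reordering keeps the error path free of cleanup code, which is why it is usually simpler than wrapping the query in a try/except that frees the buffer.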

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/common/cydlpack.pyx` around lines 95 - 105, The
DLManagedTensor allocation (dlm via stdlib.malloc) is done before the fallible
device query (_dlpack_device_id_c(ary)), which can raise and leak dlm; move the
allocation of DLManagedTensor (dlm) until after you determine dev_type and
successfully call _dlpack_device_id_c(ary), or wrap the device-id call in a
try/except and free the malloc'ed dlm on error; specifically update the logic
around ary.from_cai, dev.device_type/dev.device_id and the DLManagedTensor
allocation/initialization so that dlm is only malloc'ed after
_dlpack_device_id_c(ary) returns successfully (or ensure you call
stdlib.free(dlm) before propagating any exception).

Source: Coding guidelines


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 356ff87a-af66-432c-a5bf-9316c0ad6c95

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and 94bb439.

📒 Files selected for processing (2)

python/cuvs/cuvs/common/cydlpack.pyx
python/cuvs/cuvs/tests/test_device_tensor_view.py

Propagate CUDA device in Python DLPack tensors

94bb439

fallintoplace requested a review from a team as a code owner June 11, 2026 20:50

github-project-automation Bot added this to Unstructured Data Processing Jun 11, 2026

coderabbitai Bot reviewed Jun 11, 2026

fallintoplace wants to merge 1 commit into rapidsai:main from fallintoplace:fix-python-dlpack-device-id
