Propagate CUDA device IDs in Python DLPack tensors #2239

Open
fallintoplace wants to merge 1 commit into rapidsai:main from fallintoplace:fix-python-dlpack-device-id

Conversation

@fallintoplace

Summary

  • Set Python-created CUDA DLPack tensors to the actual allocation device instead of always using device 0.
  • Use cudaPointerGetAttributes() for CUDA array-interface inputs and keep host arrays on device ID 0.
  • Add regression coverage for host arrays and nonzero-device CuPy arrays on multi-GPU runners.

Why

The CUDA array interface does not carry a device ordinal, and the RAFT wrapper used here only preserves the interface dictionary. Hard-coding DLDevice.device_id = 0 mislabels arrays allocated on GPU 1 or higher, which can send downstream C API work through the wrong device context.
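
The decision described above can be sketched in plain Python. This is a minimal illustration, not the code in cydlpack.pyx: the function name `dlpack_device_id`, its parameters, and the `fake_allocations` table are hypothetical, and the injected `query_device` callable stands in for a lookup built on cudaPointerGetAttributes() so that the sketch runs without a GPU.

```python
from typing import Callable


def dlpack_device_id(
    data_ptr: int,
    from_cai: bool,
    query_device: Callable[[int], int],
) -> int:
    """Pick the value to store in DLDevice.device_id (illustrative only)."""
    if not from_cai:
        # Host arrays keep device ID 0, as stated in the summary.
        return 0
    # CUDA array interface input: the dictionary carries no device ordinal,
    # so the owning device has to be looked up from the pointer itself.
    return query_device(data_ptr)


# Hypothetical stand-in for a cudaPointerGetAttributes()-based lookup:
# a table mapping fake pointers to the device ordinal that owns them.
fake_allocations = {0x7000_0000: 0, 0x7100_0000: 1}

host_id = dlpack_device_id(
    0x1234, from_cai=False, query_device=fake_allocations.__getitem__
)
gpu1_id = dlpack_device_id(
    0x7100_0000, from_cai=True, query_device=fake_allocations.__getitem__
)
```

With the previous hard-coded behavior, the second call would have reported device 0 even though the pointer belongs to device 1.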

Validation

  • pre-commit run --files python/cuvs/cuvs/common/cydlpack.pyx python/cuvs/cuvs/tests/test_device_tensor_view.py
  • git diff --check
  • python3 -m compileall python/cuvs/cuvs/tests/test_device_tensor_view.py

I could not run the targeted pytest locally because this base Python environment does not have pytest installed; the new CuPy regression is skipped automatically on single-GPU machines.
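
A rough sketch of how such a multi-GPU gate can work is shown below. It is not the test code from this PR: it uses the standard library's `unittest.skipUnless` in place of the pytest marker so the snippet has no third-party requirement (with pytest, the equivalent would be `pytest.mark.skipif(not has_multiple_gpus(), reason=...)`), and the test body only checks CuPy's own device placement rather than the cuVS DLPack path.

```python
import unittest


def has_multiple_gpus() -> bool:
    """Return True only if CuPy imports and reports more than one CUDA device."""
    try:
        import cupy as cp  # optional dependency, imported lazily

        return cp.cuda.runtime.getDeviceCount() > 1
    except Exception:
        # Covers a missing CuPy install as well as CUDA runtime errors
        # on machines without a driver or a GPU.
        return False


# Decorator that skips a test unless at least two GPUs are visible.
requires_multiple_gpus = unittest.skipUnless(
    has_multiple_gpus(), "requires at least two CUDA devices"
)


class NonzeroDeviceTest(unittest.TestCase):
    @requires_multiple_gpus
    def test_array_on_device_one(self):
        import cupy as cp

        # Allocate on GPU 1 and confirm the array reports that device.
        with cp.cuda.Device(1):
            arr = cp.zeros(4, dtype=cp.float32)
        self.assertEqual(arr.device.id, 1)
```

On a single-GPU or CPU-only machine the test is reported as skipped rather than failed.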

@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Fixed DLPack tensor device identification to correctly report which CUDA device a tensor originates from, instead of hardcoding to device 0.
  • Tests

    • Added multi-GPU DLPack device identification tests, including verification for host arrays and CUDA device arrays.

Walkthrough

This PR updates the DLPack export layer to detect and report the CUDA device ID associated with a tensor pointer, replacing the hardcoded zero with a device ID queried from the CUDA runtime. The implementation extends the CUDA runtime imports and adds device-querying helpers; the tests cover host and GPU allocations, with the GPU case gated on multi-GPU availability.

Changes

DLPack CUDA Device-ID Detection

| Layer / File(s) | Summary |
| --- | --- |
| CUDA device-id querying in DLPack export (python/cuvs/cuvs/common/cydlpack.pyx) | Extends CUDA runtime imports to enable pointer attribute querying, introduces helper functions (_dlpack_device_id_c, _dlpack_device_id) to map pointers to CUDA devices with error handling, integrates computed device ID into dlpack_c, and removes redundant pointer declaration. |
| DLPack device-id test coverage (python/cuvs/cuvs/tests/test_device_tensor_view.py) | Adds has_multiple_gpus() utility and requires_multiple_gpus pytest marker for GPU-availability gating; introduces tests for device-id detection on host arrays (expected 0) and GPU arrays (matches active CUDA device). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: propagating actual CUDA device IDs in DLPack tensors instead of hardcoding device 0. |
| Description check | ✅ Passed | The description is well-detailed and directly related to the changeset, explaining the motivation, implementation approach, and validation steps performed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuvs/cuvs/common/cydlpack.pyx (1)

95-105: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CRITICAL: avoid allocating DLManagedTensor before the fallible device query.

Line 97 allocates dlm, and Line 105 can now raise via _dlpack_device_id_c(ary). That exception path leaks the raw DLManagedTensor allocation because there is no cleanup before unwinding.

Suggested fix
 cdef DLManagedTensor* dlpack_c(ary):
     # todo(dgd): add checking options/parameters
     cdef DLDeviceType dev_type
     cdef DLDevice dev
     cdef DLDataType dtype
     cdef DLTensor tensor
     cdef uintptr_t tensor_ptr = <uintptr_t>ary.ai_["data"][0]
+    cdef int device_id = _dlpack_device_id_c(ary)
     cdef DLManagedTensor* dlm = \
         <DLManagedTensor*>stdlib.malloc(sizeof(DLManagedTensor))
 
     if ary.from_cai:
         dev_type = DLDeviceType.kDLCUDA
@@
     dev.device_type = dev_type
-    dev.device_id = _dlpack_device_id_c(ary)
+    dev.device_id = device_id

As per coding guidelines, "Catch memory leaks from improper resource management" and "Ensure GIL handling is correct for CUDA operations and exceptions are handled correctly across Python/C++ boundary in Cython bindings."
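
The ordering problem can be reproduced in a small Python model. This is only a sketch of the pattern, not the Cython code under review: `TrackingAllocator`, `failing_device_query`, and the two builder functions are hypothetical names, and the allocator merely counts live handles instead of calling stdlib.malloc.

```python
class TrackingAllocator:
    """Toy allocator that records which handles are still live."""

    def __init__(self):
        self.live = set()
        self._next_handle = 1

    def malloc(self):
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


def failing_device_query():
    # Stands in for a device lookup that raises on a CUDA error.
    raise RuntimeError("device query failed")


def allocate_then_query(alloc, query):
    # Ordering flagged in the review: allocate first, then run the fallible step.
    handle = alloc.malloc()
    device_id = query()  # if this raises, nothing frees `handle`
    return handle, device_id


def query_then_allocate(alloc, query):
    # Suggested ordering: finish the fallible step before allocating.
    device_id = query()
    handle = alloc.malloc()
    return handle, device_id


def leaked_after_failure(build):
    """Run `build` with a failing query and count allocations left behind."""
    alloc = TrackingAllocator()
    try:
        build(alloc, failing_device_query)
    except RuntimeError:
        pass
    return len(alloc.live)
```

In this model `leaked_after_failure(allocate_then_query)` leaves one live allocation behind, while `leaked_after_failure(query_then_allocate)` leaves none, which is the effect the suggested reordering aims for.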

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/common/cydlpack.pyx` around lines 95 - 105, The
DLManagedTensor allocation (dlm via stdlib.malloc) is done before the fallible
device query (_dlpack_device_id_c(ary)), which can raise and leak dlm; move the
allocation of DLManagedTensor (dlm) until after you determine dev_type and
successfully call _dlpack_device_id_c(ary), or wrap the device-id call in a
try/except and free the malloc'ed dlm on error; specifically update the logic
around ary.from_cai, dev.device_type/dev.device_id and the DLManagedTensor
allocation/initialization so that dlm is only malloc'ed after
_dlpack_device_id_c(ary) returns successfully (or ensure you call
stdlib.free(dlm) before propagating any exception).

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 356ff87a-af66-432c-a5bf-9316c0ad6c95

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and 94bb439.

📒 Files selected for processing (2)
  • python/cuvs/cuvs/common/cydlpack.pyx
  • python/cuvs/cuvs/tests/test_device_tensor_view.py
