
[core][gpu-objects] Support intra-process communication#53798

Merged
edoakes merged 6 commits into ray-project:master from kevin85421:20250613-devbox-2-tmux10-ray
Jun 16, 2025

Conversation

@kevin85421
Member

@kevin85421 kevin85421 commented Jun 13, 2025

Why are these changes needed?

If we pass GPU object refs within the same actor, NCCL send/recv will block indefinitely and the transfer is also unnecessary. This PR allows intra-process communication to retrieve tensors directly from the in-process actor store.

Example:

```
small_tensor = torch.randn((1,))

# Intra-actor communication for pure GPU tensors
ref = actor.echo.remote(small_tensor)
result = actor.double.remote(ref)
assert ray.get(result) == pytest.approx(small_tensor * 2)
```

Related issue number

Closes #51685

Checks

  • I've signed off every commit (using the `-s` flag, i.e., `git commit -s`) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 added the `go` (add ONLY when ready to merge, run all tests) label Jun 13, 2025
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 marked this pull request as ready for review June 13, 2025 21:23
Copilot AI review requested due to automatic review settings June 13, 2025 21:23
Contributor

Copilot AI left a comment


Pull Request Overview

This PR enables intra-process GPU tensor communication by bypassing unnecessary out-of-band transfers and adds tests to validate that behavior.

  • Skip NCCL-style transfers when source and destination ranks match, allowing direct in-process tensor passing.
  • Introduce test_intra_gpu_tensor_transfer to cover pure GPU, mixed CPU/GPU, and large-tensor intra-process transfers.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `python/ray/_private/gpu_object_manager.py` | Replace exception on same-rank transfers with a no-op `continue` |
| `python/ray/tests/test_gpu_objects.py` | Add `test_intra_gpu_tensor_transfer` for various intra-process scenarios |
Comments suppressed due to low confidence (2)

python/ray/tests/test_gpu_objects.py:70

  • [nitpick] Consider adding a test scenario with multiple actors in the same process group to ensure the skip logic works correctly when more than one actor shares the same rank.
def test_intra_gpu_tensor_transfer(ray_start_regular):

python/ray/tests/test_gpu_objects.py:82

  • The `random` module is used here but not imported; add `import random` at the top of the file to avoid a `NameError`.
cpu_data = random.randint(0, 100)

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Contributor

@stephanie-wang stephanie-wang left a comment


Could you add a description to the PR?

@kevin85421
Member Author

> Could you add a description to the PR?

Done

@kevin85421
Member Author

test_runtime_env_container is not relevant to this PR, and it has been consistently failing recently.

(screenshot: failing test_runtime_env_container CI check)

@kevin85421
Member Author

@stephanie-wang CI passes!

@kevin85421
Member Author

cc @edoakes would you mind merging this PR? Thanks!

@edoakes edoakes merged commit 8c10308 into ray-project:master Jun 16, 2025
5 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
minerharry pushed a commit to minerharry/ray that referenced this pull request Jun 27, 2025
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
