
[core][gpu-objects] Support intra-process communication#53798

Merged
edoakes merged 6 commits into ray-project:master from kevin85421:20250613-devbox-2-tmux10-ray
Jun 16, 2025

Conversation

@kevin85421
Member

@kevin85421 kevin85421 commented Jun 13, 2025

Why are these changes needed?

If we pass GPU object refs within the same actor, NCCL send/recv will block indefinitely and the transfer is also unnecessary. This PR allows intra-process communication to retrieve tensors directly from the in-process actor store.

Example:

```
small_tensor = torch.randn((1,))

# Intra-actor communication for pure GPU tensors
ref = actor.echo.remote(small_tensor)
result = actor.double.remote(ref)
assert ray.get(result) == pytest.approx(small_tensor * 2)
```

Related issue number

Closes #51685

Checks

  • I've signed off every commit (using the `-s` flag, i.e., `git commit -s`) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 added the `go` (add ONLY when ready to merge, run all tests) label Jun 13, 2025
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 marked this pull request as ready for review June 13, 2025 21:23
Copilot AI review requested due to automatic review settings June 13, 2025 21:23
Contributor

Copilot AI left a comment


Pull Request Overview

This PR enables intra-process GPU tensor communication by bypassing unnecessary out-of-band transfers and adds tests to validate that behavior.

  • Skip NCCL-style transfers when source and destination ranks match, allowing direct in-process tensor passing.
  • Introduce test_intra_gpu_tensor_transfer to cover pure GPU, mixed CPU/GPU, and large-tensor intra-process transfers.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `python/ray/_private/gpu_object_manager.py` | Replace exception on same-rank transfers with a no-op `continue` |
| `python/ray/tests/test_gpu_objects.py` | Add `test_intra_gpu_tensor_transfer` for various intra-process scenarios |
Comments suppressed due to low confidence (2)

python/ray/tests/test_gpu_objects.py:70

  • [nitpick] Consider adding a test scenario with multiple actors in the same process group to ensure the skip logic works correctly when more than one actor shares the same rank.
def test_intra_gpu_tensor_transfer(ray_start_regular):

python/ray/tests/test_gpu_objects.py:82

  • The `random` module is used here but not imported; add `import random` at the top of the file to avoid a `NameError`.
cpu_data = random.randint(0, 100)

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Contributor

@stephanie-wang stephanie-wang left a comment


Could you add a description to the PR?

@kevin85421
Member Author

> Could you add a description to the PR?

Done

@kevin85421
Member Author

test_runtime_env_container is not relevant to this PR, and it has been consistently failing recently.

(screenshot: failing test_runtime_env_container CI check)

@kevin85421
Member Author

@stephanie-wang CI passes!

@kevin85421
Member Author

cc @edoakes would you mind merging this PR? Thanks!

@edoakes edoakes merged commit 8c10308 into ray-project:master Jun 16, 2025
5 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
minerharry pushed a commit to minerharry/ray that referenced this pull request Jun 27, 2025
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
