
Revert "[Compiled Graph] Enhance Compile Graph with Multi-Device Support (#51032)"#53263

Merged
stephanie-wang merged 1 commit into ray-project:master from edoakes:eoakes/revert-cgraph
May 23, 2025

Conversation

@edoakes (Collaborator) commented May 23, 2025

This reverts commit 2c7f6d4.

test_torch_tensor_transport_gpu is failing on postmerge. It appears this test does not run on premerge.

Revert "[Compiled Graph] Enhance Compile Graph with Multi-Device Support (ray-project#51032)"

This reverts commit 2c7f6d4.

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes force-pushed the eoakes/revert-cgraph branch from 9a7e03b to 5362052 Compare May 23, 2025 13:29
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label May 23, 2025
@edoakes edoakes requested review from dayshah, ruisearch42 and stephanie-wang and removed request for ruisearch42 and stephanie-wang May 23, 2025 13:32
@ruisearch42 (Contributor) left a comment


I triggered GPU tests to confirm this is the issue.

@stephanie-wang (Contributor) commented:

> I triggered GPU tests to confirm this is the issue.

Ah, thanks for catching.

@stephanie-wang stephanie-wang enabled auto-merge (squash) May 23, 2025 15:16
@stephanie-wang (Contributor) commented:

FYI, @hipudding

@stephanie-wang stephanie-wang merged commit 21275c4 into ray-project:master May 23, 2025
6 of 7 checks passed
@hipudding (Contributor) commented May 26, 2025

I’m very sorry for the trouble caused by my previous submission. I originally thought that passing CI would ensure all necessary test cases were executed. In future development, I will make sure to run all relevant test cases locally beforehand to prevent similar issues from being merged upstream.

The issue occurred in the deserialize_from_numpy_or_scalar function. Even when with_tensor_transport explicitly specifies CUDA, the function still automatically selects the currently available accelerator. In this particular scenario, since no GPU was available, it fell back to using a CPU tensor. I have fixed this issue locally and confirmed that the test_torch_tensor_transport_gpu test case now passes.
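The device-fallback bug described above can be illustrated with a minimal sketch in plain Python (no Ray or torch dependency). The function and parameter names here are hypothetical and do not reflect the actual `deserialize_from_numpy_or_scalar` signature; the sketch only shows the shape of the fix: honor an explicitly requested device instead of always auto-detecting.

```python
def pick_device(requested_device=None, accelerator_available=False):
    """Choose the device for a deserialized tensor (hypothetical helper).

    Sketch of the reported bug: the buggy code ignored the explicitly
    requested device and always auto-selected the currently available
    accelerator, silently yielding a CPU tensor when no GPU was present.
    """
    if requested_device is not None:
        # Fixed behavior: honor an explicit request such as "cuda",
        # even if no accelerator is detected at deserialization time.
        return requested_device
    # Fallback: auto-detect, which is what the original code did
    # unconditionally.
    return "cuda" if accelerator_available else "cpu"


# With the fix, an explicit "cuda" request survives even when no GPU is
# detected; the buggy version would have returned "cpu" here.
print(pick_device(requested_device="cuda", accelerator_available=False))
```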

I don’t have the necessary permissions to view the postmerge results. May I ask if there are any other test cases that have failed?

What should I do next? Should I create a new PR after fixing this issue? @stephanie-wang


@ruisearch42 (Contributor) commented:

Hi @hipudding , no worries, thanks for following up on this!

You can create a new PR which contains your original changes and the new fix. The GPU tests need to be triggered manually in CI, and we can help you with that. But you can also run them locally, e.g., test_torch_tensor_dag.py and test_torch_tensor_transport. Set RAY_PYTEST_USE_GPU to true when you test.
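A local run following the advice above might look like the shell sketch below. The test-file paths are assumptions about the Ray repo layout, so the pytest invocations are left commented out; only the environment setup is executed.

```shell
# Enable GPU-dependent tests, as suggested above.
export RAY_PYTEST_USE_GPU=1
echo "RAY_PYTEST_USE_GPU=$RAY_PYTEST_USE_GPU"

# Hypothetical invocations -- the paths below are assumptions about the
# Ray repo layout and may differ:
# python -m pytest python/ray/dag/tests/experimental/test_torch_tensor_dag.py -v
# python -m pytest python/ray/dag/tests/experimental/test_torch_tensor_transport.py -v
```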


Labels: go (add ONLY when ready to merge, run all tests)

4 participants