[core] Fix test_torch_tensor_transport expecting CUDA_VISIBLE_DEVICES scrubbing on num_gpus=0 actors #62653
elliot-barn wants to merge 1 commit into master
Conversation
#62492 flipped the default of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO from True to False, so Ray no longer overrides CUDA_VISIBLE_DEVICES for actors with num_gpus=0. 11 test cases in test_torch_tensor_transport.py relied on the old behavior, where bare Actor.remote() workers had CUDA_VISIBLE_DEVICES="" set, causing torch to raise "No CUDA GPUs are available" on .to("cuda").

This PR adds a per-test fixture that sets RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=1 via monkeypatch before ray_start_regular boots Ray, restoring the old behavior for just the affected tests. No production code is changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
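For reference, a minimal sketch of what such a fixture looks like (the fixture name override_accelerator_env_on_zero comes from the review below; the test shown is hypothetical and the actual fixture body in the PR may differ):

```python
import pytest

@pytest.fixture
def override_accelerator_env_on_zero(monkeypatch):
    # Restore the pre-#62492 behavior: Ray scrubs CUDA_VISIBLE_DEVICES
    # for actors with num_gpus=0. monkeypatch undoes this after each test.
    monkeypatch.setenv("RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO", "1")

# List the override fixture before ray_start_regular so the env var is
# already set when Ray boots (with no dependency between them, pytest
# instantiates these fixtures in argument order here).
def test_cpu_actor_raises_on_cuda(override_accelerator_env_on_zero, ray_start_regular):
    ...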
bcbcd3e to f1f2c05
Code Review
This pull request introduces a new pytest fixture, override_accelerator_env_on_zero, to the test_torch_tensor_transport.py file. This fixture sets the RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO environment variable to restore legacy behavior where Ray clears CUDA_VISIBLE_DEVICES for actors without assigned GPUs, ensuring that tests expecting CUDA unavailability errors on CPU-only actors function correctly. Multiple test cases have been updated to include this fixture. I have no feedback to provide.
Sparks0219 left a comment
TBH I think it would be better to test the default RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO since that's what users would actually use; however, this is also a cgraph test, so I don't have a strong opinion
yeah let's just do this
After #62492 we no longer set CUDA_VISIBLE_DEVICES="" when num_gpus=0 or unset. If torch detects CUDA_VISIBLE_DEVICES="", it throws a runtime error; now that CUDA_VISIBLE_DEVICES is not set at all, torch falls back to the NVIDIA driver to get the device IDs. Following up on #62653 and instead checking for the default cuda:0 GPU ID in these tests.

---------

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
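A sketch of the revised assertion, assuming the test host has at least one GPU (the helper name here is hypothetical):

```python
import torch

def check_default_cuda_device():
    # With CUDA_VISIBLE_DEVICES left unset, torch falls back to the
    # NVIDIA driver to enumerate GPUs, so .to("cuda") succeeds and the
    # tensor lands on the default device, cuda:0, instead of raising
    # "No CUDA GPUs are available".
    t = torch.ones(4).to("cuda")
    assert t.device == torch.device("cuda", 0)
```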
failing postmerge tests: https://buildkite.com/ray-project/postmerge/builds/17053
successful postmerge run: https://buildkite.com/ray-project/postmerge/builds/17060