[core] Remove override accelerator warning and change default behavior#62492
Conversation
Code Review
This pull request modifies Ray's behavior to prevent the overriding of accelerator environment variables, such as CUDA_VISIBLE_DEVICES, when zero accelerators are allocated. Key changes include setting the default value of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO to False, removing the corresponding FutureWarning, and updating test cases to reflect this new default behavior. A review comment suggests improving the robustness of the tests by explicitly setting and asserting the preservation of environment variables to ensure they are not being cleared or modified during initialization.
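To illustrate the new default (not part of the diff; a minimal sketch assuming a fresh single-node ray.init() where workers inherit the driver's environment):

import os
import ray

# Value set by the user or cluster launcher before Ray starts.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

ray.init()

@ray.remote(num_gpus=0)
def visible_devices():
    import os
    # With RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO now defaulting to False,
    # Ray no longer scrubs the variable for zero-GPU workers, so the
    # inherited value is preserved.
    return os.environ.get("CUDA_VISIBLE_DEVICES")

assert ray.get(visible_devices.remote()) == "0,1"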
| **{"RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO": "0"}, | ||
| ), | ||
| ) | ||
| run_string_as_driver(not_override_check_script) |
The test for the new default behavior in not_override_check_script could be more robust. It currently asserts that CUDA_VISIBLE_DEVICES is not set, which relies on the assumption that it's not set in the test execution environment.
A stronger test would be to explicitly set CUDA_VISIBLE_DEVICES to a specific value before ray.init() and then assert that this value is preserved within the remote task/actor. This would more accurately verify that the environment variable is not being overridden when num_gpus=0.
Here's a suggested improvement for not_override_check_script:
not_override_check_script = """
import os
import ray

# Set a specific value to check for preservation.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"

ray.init()

@ray.remote(num_gpus=0)
def check():
    import os
    assert os.environ.get("CUDA_VISIBLE_DEVICES") == "0,1,2"

@ray.remote(num_gpus=0)
class Actor:
    def check(self):
        import os
        assert os.environ.get("CUDA_VISIBLE_DEVICES") == "0,1,2"

print("task check", ray.get(check.remote()))
print("actor check", ray.get(Actor.options(num_gpus=0).remote().check.remote()))
"""

This change would make the test more explicit and less dependent on the environment configuration.
@Sparks0219 some relevant test failures
many failing tests

the remaining ones are due to some java_plugin thing and not related, I think premerge is broken right now 😪
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4b7a979.
ray-project#62492) Following up on ray-project#54928, where we originally introduced a feature flag to give users the option to not set CUDA_VISIBLE_DEVICES when num_gpus=0 or None. We also output a warning informing users that the default behavior would change in a future Ray version. Since it's been around 8 months since we introduced this feature flag and the warning is a bit distracting, we're now making this the default behavior, meaning we will no longer override CUDA_VISIBLE_DEVICES when num_gpus=0 or None. --------- Signed-off-by: Joshua Lee <joshlee@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: doanxem99 <nguyendinhphuongnam99@gmail.com>
… scrubbing on num_gpus=0 actors #62492 flipped the default of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO from True to False, so Ray no longer overrides CUDA_VISIBLE_DEVICES for actors with num_gpus=0. 11 test cases in test_torch_tensor_transport.py relied on the old behavior, where bare Actor.remote() workers would have CUDA_VISIBLE_DEVICES="" set, causing torch to raise "No CUDA GPUs are available" on .to("cuda"). Adds a per-test fixture that sets RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=1 via monkeypatch before ray_start_regular boots Ray, restoring the old behavior for just the affected tests. No production code is changed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
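A minimal sketch of such a fixture (the fixture name is hypothetical; monkeypatch is pytest's built-in fixture):

import pytest

@pytest.fixture
def enable_accel_env_override(monkeypatch):
    # Restore the pre-#62492 scrubbing behavior for tests that rely on
    # CUDA_VISIBLE_DEVICES="" in zero-GPU workers. Must take effect
    # before ray_start_regular boots Ray so workers inherit the flag.
    monkeypatch.setenv("RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO", "1")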
After #62492 we no longer set CUDA_VISIBLE_DEVICES="" when num_gpus=0 or not set. If torch detects CUDA_VISIBLE_DEVICES="", it throws a runtime error; now that CUDA_VISIBLE_DEVICES is not set at all, torch falls back to the NVIDIA driver to get the device IDs. Following up on #62653 and instead checking for the default cuda:0 GPU ID in these tests. --------- Signed-off-by: Joshua Lee <joshlee@anyscale.com>
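To illustrate the distinction the commit relies on, a sketch (assumes torch is installed and the machine has one GPU):

import os
import subprocess
import sys

probe = "import torch; print(torch.cuda.device_count())"

# CUDA_VISIBLE_DEVICES="" hides every device: device_count() is 0 and
# .to("cuda") raises RuntimeError("No CUDA GPUs are available").
env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
subprocess.run([sys.executable, "-c", probe], env=env, check=True)

# With the variable unset entirely, torch queries the NVIDIA driver, so
# on a single-GPU machine the default device shows up as cuda:0.
env = {k: v for k, v in os.environ.items() if k != "CUDA_VISIBLE_DEVICES"}
subprocess.run([sys.executable, "-c", probe], env=env, check=True)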

Following up on #54928, where we originally introduced a feature flag to give users the option to not set CUDA_VISIBLE_DEVICES when num_gpus=0 or None. We also output a warning informing users that the default behavior would change in a future Ray version. Since it's been around 8 months since we introduced this feature flag and the warning is a bit distracting, we're now making this the default behavior, meaning we will no longer override CUDA_VISIBLE_DEVICES when num_gpus=0 or None.
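For users who depend on the old scrubbing, a sketch of opting back in; this assumes the flag is set in the driver's environment before Ray starts so that workers inherit it:

# Equivalent to launching with:
#   RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=1 python my_driver.py
import os
os.environ.setdefault("RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO", "1")

import ray
ray.init()  # zero-GPU workers again see CUDA_VISIBLE_DEVICES=""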