[core] Remove override accelerator warning and change default behavior#62492

Merged
edoakes merged 12 commits into ray-project:master from Sparks0219:joshlee/remove-accelerator-override-warning-and-switch-default-behavior
Apr 13, 2026

Conversation

@Sparks0219
Contributor

Following up on #54928, where we originally introduced a feature flag giving users the option to not set CUDA_VISIBLE_DEVICES when num_gpus=0 or None. At the time we also emitted a warning informing users that the default behavior would change in a future Ray version. Since the flag has been around for about 8 months and the warning is distracting, we're now making this the default behavior: Ray will no longer override CUDA_VISIBLE_DEVICES when num_gpus=0 or None.
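For context, the decision this PR flips can be sketched as follows. The flag name RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO is real; the helper function and its placement are hypothetical and only illustrate the described behavior, not Ray's actual implementation:

```python
import os

def should_override_accel_env_var(num_gpus) -> bool:
    """Hypothetical sketch of the decision this PR changes.

    Old default: scrub CUDA_VISIBLE_DEVICES even for tasks/actors
    allocated zero GPUs. New default: leave it alone unless the user
    opts back in via the flag.
    """
    if num_gpus:  # a positive GPU allocation always pins its devices
        return True
    # num_gpus == 0 or None: only override if the user opts back in;
    # after this PR the flag defaults to "0" (disabled)
    flag = os.environ.get("RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO", "0")
    return flag == "1"
```

Setting the flag to "1" restores the pre-PR scrubbing behavior for users who depended on it.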

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
@Sparks0219 Sparks0219 added the "go" label (add ONLY when ready to merge, run all tests) Apr 9, 2026
@Sparks0219 Sparks0219 requested a review from a team as a code owner April 9, 2026 23:49
@Sparks0219 Sparks0219 changed the title [core] Remove override accelerator warning [core] Remove override accelerator warning and change default behavior Apr 9, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies Ray's behavior to prevent the overriding of accelerator environment variables, such as CUDA_VISIBLE_DEVICES, when zero accelerators are allocated. Key changes include setting the default value of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO to False, removing the corresponding FutureWarning, and updating test cases to reflect this new default behavior. A review comment suggests improving the robustness of the tests by explicitly setting and asserting the preservation of environment variables to ensure they are not being cleared or modified during initialization.

**{"RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO": "0"},
),
)
run_string_as_driver(not_override_check_script)
Contributor


medium

The test for the new default behavior in not_override_check_script could be more robust. It currently asserts that CUDA_VISIBLE_DEVICES is not set, which relies on the assumption that it's not set in the test execution environment.

A stronger test would be to explicitly set CUDA_VISIBLE_DEVICES to a specific value before ray.init() and then assert that this value is preserved within the remote task/actor. This would more accurately verify that the environment variable is not being overridden when num_gpus=0.

Here's a suggested improvement for not_override_check_script:

not_override_check_script = """
import os
import ray

# Set a specific value to check for preservation
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
ray.init()

@ray.remote(num_gpus=0)
def check():
    import os
    assert os.environ.get("CUDA_VISIBLE_DEVICES") == "0,1,2"

@ray.remote(num_gpus=0)
class Actor:
    def check(self):
        import os
        assert os.environ.get("CUDA_VISIBLE_DEVICES") == "0,1,2"

print("task check", ray.get(check.remote()))
print("actor check", ray.get(Actor.options(num_gpus=0).remote().check.remote()))
"""

This change would make the test more explicit and less dependent on the environment configuration.

@ray-gardener ray-gardener Bot added the "core" label (Issues that should be addressed in Ray Core) Apr 10, 2026
@edoakes
Collaborator

edoakes commented Apr 10, 2026

@Sparks0219 some relevant test failures

…celerator-override-warning-and-switch-default-behavior
…celerator-override-warning-and-switch-default-behavior
Signed-off-by: Joshua Lee <joshlee@anyscale.com>
Signed-off-by: Joshua Lee <joshlee@anyscale.com>
@Sparks0219 Sparks0219 requested a review from a team as a code owner April 11, 2026 22:39
…celerator-override-warning-and-switch-default-behavior
@edoakes
Collaborator

edoakes commented Apr 12, 2026

many failing tests

@Sparks0219
Contributor Author

many failing tests

the remaining ones are due to some java_plugin thing and are unrelated; I think premerge is broken right now 😪

edoakes and others added 5 commits April 12, 2026 16:17
…celerator-override-warning-and-switch-default-behavior
…celerator-override-warning-and-switch-default-behavior
Signed-off-by: Joshua Lee <joshlee@anyscale.com>
Signed-off-by: Joshua Lee <joshlee@anyscale.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4b7a979.

Comment thread python/ray/train/v2/torch/train_loop_utils.py
Signed-off-by: Joshua Lee <joshlee@anyscale.com>
@edoakes edoakes merged commit 31c8ae1 into ray-project:master Apr 13, 2026
6 checks passed
doanxem99 pushed a commit to doanxem99/ray that referenced this pull request Apr 15, 2026
[core] Remove override accelerator warning and change default behavior (ray-project#62492)

Following up on ray-project#54928, where we originally introduced a feature flag
giving users the option to not set CUDA_VISIBLE_DEVICES when num_gpus=0 or
None. At the time we also emitted a warning informing users that the default
behavior would change in a future Ray version. Since the flag has been around
for about 8 months and the warning is distracting, we're now making this the
default behavior: Ray will no longer override CUDA_VISIBLE_DEVICES when
num_gpus=0 or None.

---------

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: doanxem99 <nguyendinhphuongnam99@gmail.com>
elliot-barn added a commit that referenced this pull request Apr 16, 2026
… scrubbing on num_gpus=0 actors

#62492 flipped the default of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO from True to
False, so Ray no longer overrides CUDA_VISIBLE_DEVICES for actors with
num_gpus=0. 11 test cases in test_torch_tensor_transport.py relied on the old
behavior where bare Actor.remote() workers would have CUDA_VISIBLE_DEVICES=""
set, causing torch to raise "No CUDA GPUs are available" on .to("cuda").

Adds a per-test fixture that sets RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=1 via
monkeypatch before ray_start_regular boots Ray, restoring the old behavior for
just the affected tests. No production code is changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
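The fixture approach above can be sketched with a small standalone helper. This is a hedged sketch: `ray_start_regular` and the monkeypatch fixture belong to the test suite, while the context manager below is hypothetical and only shows setting and restoring the env var around the code that boots Ray:

```python
import os
from contextlib import contextmanager

FLAG = "RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO"

@contextmanager
def legacy_accel_scrubbing():
    """Hypothetical helper: re-enable the pre-#62492 scrubbing behavior.

    Must wrap the code that starts Ray, since the flag is read at startup.
    """
    old = os.environ.get(FLAG)
    os.environ[FLAG] = "1"  # "1" restores the old override-on-zero default
    try:
        yield
    finally:
        # Restore the prior value so other tests see the new default.
        if old is None:
            os.environ.pop(FLAG, None)
        else:
            os.environ[FLAG] = old
```

Pytest's monkeypatch fixture achieves the same set-then-restore semantics automatically at test teardown, which is why the commit uses it rather than a hand-rolled helper.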
elliot-barn added a commit that referenced this pull request Apr 16, 2026
… scrubbing on num_gpus=0 actors

#62492 flipped the default of RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO from True to
False, so Ray no longer overrides CUDA_VISIBLE_DEVICES for actors with
num_gpus=0. 11 test cases in test_torch_tensor_transport.py relied on the old
behavior where bare Actor.remote() workers would have CUDA_VISIBLE_DEVICES=""
set, causing torch to raise "No CUDA GPUs are available" on .to("cuda").

Adds a per-test fixture that sets RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=1 via
monkeypatch before ray_start_regular boots Ray, restoring the old behavior for
just the affected tests. No production code is changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
richardliaw pushed a commit that referenced this pull request Apr 18, 2026
After #62492 we no longer set CUDA_VISIBLE_DEVICES="" when num_gpus=0 or
not set. If torch detects CUDA_VISIBLE_DEVICES="", it throws a runtime
error; however, now that CUDA_VISIBLE_DEVICES is not set at all, torch
falls back to the NVIDIA driver to get the device IDs. Following up on
#62653, these tests now instead check for the default cuda:0 GPU ID.

---------

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
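The distinction the commit above relies on (unset vs. empty CUDA_VISIBLE_DEVICES) can be illustrated with a short sketch. The helper below is hypothetical and mirrors the behavior described in the commit message, not torch's actual code:

```python
def cuda_device_source(env: dict) -> str:
    """Hypothetical sketch of how a CUDA_VISIBLE_DEVICES value is interpreted.

    - unset -> torch falls back to the NVIDIA driver for device IDs
    - ""    -> torch sees zero visible GPUs and raises at .to("cuda")
    - "0,1" -> only the listed device IDs are visible
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "driver"          # new Ray default with num_gpus=0: leave unset
    if value == "":
        return "no-gpus-error"   # old Ray default: scrubbed to empty string
    return "explicit"            # user- or Ray-assigned device list
```

This is why tests that previously expected a "No CUDA GPUs are available" error now see the driver's default cuda:0 device instead.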
HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026
[core] Remove override accelerator warning and change default behavior (ray-project#62492)

Following up on ray-project#54928, where we originally introduced a feature flag
giving users the option to not set CUDA_VISIBLE_DEVICES when num_gpus=0 or
None. At the time we also emitted a warning informing users that the default
behavior would change in a future Ray version. Since the flag has been around
for about 8 months and the warning is distracting, we're now making this the
default behavior: Ray will no longer override CUDA_VISIBLE_DEVICES when
num_gpus=0 or None.

---------

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026
After ray-project#62492 we no longer set CUDA_VISIBLE_DEVICES="" when num_gpus=0 or
not set. If torch detects CUDA_VISIBLE_DEVICES="", it throws a runtime
error; however, now that CUDA_VISIBLE_DEVICES is not set at all, torch
falls back to the NVIDIA driver to get the device IDs. Following up on
ray-project#62653, these tests now instead check for the default cuda:0 GPU ID.

---------

Signed-off-by: Joshua Lee <joshlee@anyscale.com>

Labels

core: Issues that should be addressed in Ray Core
go: add ONLY when ready to merge, run all tests
