[Data][Cherry-pick] Fix bug where AutoscalingCoordinator crashes if you request 0 GPUs on CPU-only cluster #59516
Conversation
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Code Review
This pull request fixes a `KeyError` in `AutoscalingCoordinator` that occurred when requesting a resource with a value of 0 on a cluster where that resource type is not present (e.g., requesting 0 GPUs on a CPU-only cluster). The fix in `_maybe_subtract_resources` correctly prevents this crash by checking for the key's existence before attempting subtraction. A regression test has been added to cover this scenario. My review includes a suggestion to make the new test more robust by explicitly mocking the cluster state, so it runs deterministically regardless of the test environment's hardware.
```python
def test_coordinator_accepts_zero_resource_for_missing_resource_type(
    teardown_autoscaling_coordinator,
):
    # This is a regression test for a bug where the coordinator crashes when you request
    # a resource type (e.g., GPU: 0) that doesn't exist on the cluster.
    coordinator = DefaultAutoscalingCoordinator()

    coordinator.request_resources(
        requester_id="spam", resources=[{"CPU": 1, "GPU": 0}], expire_after_s=1
    )

    assert coordinator.get_allocated_resources("spam") == [{"CPU": 1, "GPU": 0}]
```
This new regression test is a great addition. To make it more robust and independent of the hardware of the execution environment, I suggest mocking `ray.nodes()` to simulate a cluster without GPUs. This ensures the test deterministically covers the intended scenario, where a requested resource type is absent from the cluster.
Suggested change:

```python
def test_coordinator_accepts_zero_resource_for_missing_resource_type(
    teardown_autoscaling_coordinator,
):
    # This is a regression test for a bug where the coordinator crashes when you request
    # a resource type (e.g., GPU: 0) that doesn't exist on the cluster.
    cluster_with_no_gpus = [
        {
            "Resources": {"CPU": 2},
            "Alive": True,
        }
    ]
    with patch("ray.nodes", return_value=cluster_with_no_gpus):
        coordinator = DefaultAutoscalingCoordinator()
        coordinator.request_resources(
            requester_id="spam", resources=[{"CPU": 1, "GPU": 0}], expire_after_s=1
        )
        assert coordinator.get_allocated_resources("spam") == [{"CPU": 1, "GPU": 0}]
```
… you request 0 GPUs on CPU-only cluster (ray-project#59516) Cherry-pick of ray-project#59514 Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>