
[2/3] queue-based autoscaling - add default queue-based autoscaling policy#59548

Merged
abrarsheikh merged 15 commits into master from queue-based-autoscaling-part-2
Feb 5, 2026

Conversation

@harshit-anyscale
Contributor

@harshit-anyscale harshit-anyscale commented Dec 18, 2025

Summary

This PR adds queue-based autoscaling support for async inference workloads in Ray Serve. It enables deployments to scale based on combined workload from both the message broker queue and HTTP requests.

Related PRs:

  • PR 1 (Prerequisite): #59430 - Broker and QueueMonitor foundation
  • PR 3 (Follow-up): Integration with TaskConsumer

Changes

New Autoscaling Policy

| Component | Description |
|-----------|-------------|
| `async_inference_autoscaling_policy()` | Scales replicas based on combined workload: `queue_length + total_num_requests` |
| `default_async_inference_autoscaling_policy` | Export alias for the new policy |

QueueMonitor Enhancements

The QueueMonitorActor now pushes queue metrics to the controller for autoscaling:

  • Accepts deployment_id and controller_handle parameters
  • Uses MetricsPusher to periodically push queue length to the controller
  • start_metrics_pusher() - deferred initialization (event loop not available in __init__)
  • Lazy initialization in get_queue_length() handles actor restarts
  • Synchronous __ray_shutdown__ (Ray calls it without awaiting)
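The deferred-initialization and lazy-restart pattern described in the bullets above can be sketched roughly as follows. This is an illustrative sketch, not the actual Ray Serve internals: only `start_metrics_pusher()` and `get_queue_length()` are named in this PR; the constructor signature details, `_push_loop`, and the controller `record` call are assumptions.

```python
import asyncio


class QueueMonitorActor:
    """Illustrative sketch of the deferred metrics-pusher pattern."""

    def __init__(self, deployment_id, controller_handle, push_interval_s=2.0):
        # No event loop is guaranteed to exist in __init__, so the
        # metrics pusher is NOT started here -- startup is deferred.
        self.deployment_id = deployment_id
        self.controller_handle = controller_handle
        self.push_interval_s = push_interval_s
        self._pusher_task = None

    def start_metrics_pusher(self):
        # Deferred startup: must be called from within the actor's event
        # loop. Idempotent, so it is safe to call lazily on every read.
        if self._pusher_task is None:
            self._pusher_task = asyncio.get_running_loop().create_task(
                self._push_loop()
            )

    async def _push_loop(self):
        while True:
            length = await self.get_queue_length()
            # In the real actor this is a remote call to the controller.
            self.controller_handle.record(self.deployment_id, length)
            await asyncio.sleep(self.push_interval_s)

    async def get_queue_length(self):
        # Lazy re-initialization: if the actor was restarted and the
        # pusher task was lost, this restarts it transparently.
        self.start_metrics_pusher()
        return 0  # the real implementation queries the message broker
```

The idempotent `start_metrics_pusher()` is what makes the lazy call in `get_queue_length()` safe: after an actor restart, the first metrics read re-creates the push loop without double-starting it on subsequent reads.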

Controller Integration

  • New record_autoscaling_metrics_from_async_inference_task_queue() method
  • New gauge: serve_autoscaling_async_inference_task_queue_metrics_delay_ms

New Types

  • AsyncInferenceTaskQueueMetricReport - dataclass for queue metrics from QueueMonitor to controller
  • AutoscalingContext.async_inference_task_queue_length - new property for queue length
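As a rough sketch, the report type above might look like the following. Only the class name and the fact that it carries a queue length come from this PR; the exact field names and types are assumptions (a send timestamp is included because the controller exposes a metrics-delay gauge).

```python
import time
from dataclasses import dataclass, field


# Illustrative shape only: the class name is from the PR, but the exact
# fields here are assumptions for the sketch.
@dataclass
class AsyncInferenceTaskQueueMetricReport:
    deployment_id: str  # which deployment the queue feeds
    queue_length: int   # pending tasks in the broker queue
    send_timestamp_s: float = field(default_factory=time.time)  # for the delay gauge
```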

Scaling Formula

```python
total_workload = queue_length + total_num_requests
desired_replicas = total_workload / target_ongoing_requests
```

Example:

  • Queue: 100 pending tasks
  • HTTP: 50 ongoing requests
  • target_ongoing_requests: 10
  • Desired replicas = (100 + 50) / 10 = 15
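The formula and worked example can be sketched as a plain function. The ceiling rounding and min/max clamping are assumptions, typical of Serve's default autoscaling policy but not spelled out in this description:

```python
import math


def desired_replicas(queue_length, total_num_requests, target_ongoing_requests,
                     min_replicas=0, max_replicas=100):
    # Combined workload from the broker queue and in-flight HTTP requests.
    total_workload = queue_length + total_num_requests
    # Round up so partial leftover workload still gets a replica,
    # then clamp to the configured replica bounds.
    desired = math.ceil(total_workload / target_ongoing_requests)
    return max(min_replicas, min(desired, max_replicas))
```

With the example numbers above, `desired_replicas(100, 50, 10)` gives 15.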

🤖 Generated with Claude Code

@harshit-anyscale harshit-anyscale self-assigned this Dec 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new queue-based autoscaling policy, which is a great addition for TaskConsumer deployments. The implementation is well-structured, with a dedicated QueueMonitor actor and comprehensive unit tests. I've identified a critical bug in the Redis connection handling and a high-severity logic issue in the scaling-to-zero implementation. Addressing these will ensure the new feature is robust and behaves as expected.

@harshit-anyscale harshit-anyscale added the go add ONLY when ready to merge, run all tests label Dec 19, 2025
@github-actions

github-actions bot commented Jan 2, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 2, 2026
Signed-off-by: harshit <harshit@anyscale.com>
@harshit-anyscale harshit-anyscale force-pushed the queue-based-autoscaling-part-2 branch from 86223e0 to 26d4bd7 on January 9, 2026 07:05
Signed-off-by: harshit <harshit@anyscale.com>
@harshit-anyscale harshit-anyscale removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 9, 2026
Signed-off-by: harshit <harshit@anyscale.com>
@harshit-anyscale harshit-anyscale force-pushed the queue-based-autoscaling-part-2 branch from 29f8b0b to 0c7cf30 on January 12, 2026 18:08
@harshit-anyscale harshit-anyscale marked this pull request as ready for review January 12, 2026 19:07
@harshit-anyscale harshit-anyscale requested a review from a team as a code owner January 12, 2026 19:07
@ray-gardener ray-gardener bot added the serve Ray Serve Related Issue label Jan 13, 2026
Contributor

@abrarsheikh abrarsheikh left a comment


i would base this PR on top of #58857

@harshit-anyscale
Contributor Author

i would base this PR on top of #58857

i am not sure of the timeline we are targeting for #58857, but since we want to get the queue-based autoscaling feature out asap, i thought of merging this PR as it is.

and then once #58857 is merged, i will create a new PR that uses the changes from #58857 to refactor the queue-aware autoscaling policy.

@abrarsheikh lmk your thoughts on it.


Signed-off-by: harshit <harshit@anyscale.com>

Signed-off-by: harshit <harshit@anyscale.com>
Signed-off-by: harshit <harshit@anyscale.com>
Signed-off-by: harshit <harshit@anyscale.com>
Contributor

@abrarsheikh abrarsheikh left a comment


can you add end to end tests for autoscaling, or is that not possible in this PR?

@harshit-anyscale
Contributor Author

harshit-anyscale commented Feb 3, 2026

can you add end to end tests for autoscaling, or is that not possible in this PR?

that won't be possible in this PR, since the integration of this queue-based autoscaling policy with Serve deployments is still pending. i'll add end-to-end tests in the follow-up PR.

Signed-off-by: harshit <harshit@anyscale.com>
Signed-off-by: harshit <harshit@anyscale.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: harshit <harshit@anyscale.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: harshit <harshit@anyscale.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

@abrarsheikh abrarsheikh merged commit 6a5e3de into master Feb 5, 2026
6 checks passed
@abrarsheikh abrarsheikh deleted the queue-based-autoscaling-part-2 branch February 5, 2026 21:28
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
MuhammadSaif700 pushed a commit to MuhammadSaif700/ray that referenced this pull request Feb 17, 2026
Kunchd pushed a commit to Kunchd/ray that referenced this pull request Feb 17, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026

Labels

go (add ONLY when ready to merge, run all tests), serve (Ray Serve Related Issue)

2 participants