[1/3] queue-based autoscaling - add queue monitor (#59430)
Conversation
Code Review
This pull request introduces a QueueMonitor actor for monitoring queue lengths in Redis and RabbitMQ, which is a valuable addition for asynchronous task processing. The implementation is generally well-structured and includes unit tests. However, I've identified a significant performance concern with the RabbitMQ connection handling that should be addressed. Additionally, there's a minor inconsistency in broker type detection and an opportunity to improve test coverage for the new actor helper functions. My detailed feedback is in the comments below.
aslonnie
left a comment
hmm.. does not need my review any more?
seems that import pika is still in there though?
nope, `import pika` is now resolved as well.
### Summary
This PR is part 1 of 3 for adding queue-based autoscaling support for
Ray Serve TaskConsumer deployments.
### Background
TaskConsumers are workloads that consume tasks from message queues
(Redis, RabbitMQ), and their scaling needs are fundamentally different
from HTTP-based deployments. Instead of scaling based on HTTP request
load, TaskConsumers should scale based on the number of pending tasks in
the message queue.
### Overall Architecture (Full Feature)
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Message Queue │◄─────│ QueueMonitor │ │ ServeController │
│ (Redis/RMQ) │ │ Actor │◄─────│ Autoscaler │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
│ get_queue_length() │
└─────────────────────────┘
│
▼
┌───────────────────────────┐
│ queue_based_autoscaling │
│ _policy() │
│ desired = ceil(len/target)│
└───────────────────────────┘
```
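The `queue_based_autoscaling_policy()` step in the diagram can be sketched as a small pure function. This is a hypothetical illustration of the `desired = ceil(len/target)` formula (the real policy lands in PR 2; the signature and clamping are assumptions):

```python
import math


def queue_based_autoscaling_policy(
    queue_length: int,
    target_ongoing_requests: int,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Hypothetical sketch: scale to ceil(queue_length / target), clamped
    to the deployment's replica bounds."""
    desired = math.ceil(queue_length / target_ongoing_requests)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 100 pending tasks with a target of 10 ongoing requests per replica yields 10 desired replicas; 101 pending tasks yields 11.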
The full implementation consists of three PRs:
| PR | Description | Status |
|----------------|-----------------------------------------------------|------------|
| PR 1 (This PR) | QueueMonitor actor for querying broker queue length | 🔄 Current |
| PR 2 | Introduce default Queue-based autoscaling policy | Upcoming |
| PR 3 | Integration with TaskConsumer deployments | Upcoming |
### This PR: QueueMonitor Actor
This PR introduces the QueueMonitor Ray actor that queries message
brokers to get queue length for autoscaling decisions.
### Key Features
- Multi-broker support: Redis and RabbitMQ
- Lightweight Ray actor: Runs with `num_cpus=0`, with `pika` and `redis` provided via the runtime env
- Fault tolerance: Caches last known queue length on query failures
- Named actor pattern: QUEUE_MONITOR::<deployment_name> for easy lookup
### Queue Length Calculation
For accurate autoscaling, QueueMonitor returns total workload (pending
tasks):
| Broker | Pending Tasks |
|----------|----------------|
| Redis | LLEN <queue> |
| RabbitMQ | messages_ready |
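The two broker queries in the table map to standard client calls: redis-py's `llen` and pika's passive `queue_declare`, whose Declare-Ok frame carries `message_count` (the messages ready for delivery). A minimal sketch, with function names that are illustrative rather than the PR's actual internals:

```python
def redis_pending_tasks(redis_client, queue_name: str) -> int:
    # Redis task queues are typically lists, so LLEN is the pending-task count.
    return redis_client.llen(queue_name)


def rabbitmq_pending_tasks(channel, queue_name: str) -> int:
    # A passive queue_declare inspects the queue without creating or modifying
    # it; the returned Declare-Ok frame exposes the ready-message count.
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count
```

Both calls are cheap point reads, which is what makes polling them from a monitor actor viable.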
### Components
1. QueueMonitorConfig - Configuration dataclass with broker URL and
queue name
2. QueueMonitor - Core class that initializes broker connections and
queries queue length
3. QueueMonitorActor - Ray actor wrapper for remote access
4. Helper functions:
- create_queue_monitor_actor() - Create named actor
- get_queue_monitor_actor() - Lookup existing actor
- delete_queue_monitor_actor() - Cleanup on deployment deletion
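The helper trio builds on Ray's standard named-actor registry (`.options(name=...)`, `ray.get_actor`, `ray.kill`). A sketch of the lookup and cleanup side, assuming the `QUEUE_MONITOR::<deployment_name>` convention from the PR (the helper bodies here are illustrative, not the merged implementation):

```python
QUEUE_MONITOR_NAME_PREFIX = "QUEUE_MONITOR::"


def queue_monitor_actor_name(deployment_name: str) -> str:
    # Named actor pattern from the PR: QUEUE_MONITOR::<deployment_name>.
    return f"{QUEUE_MONITOR_NAME_PREFIX}{deployment_name}"


def get_queue_monitor_actor(deployment_name: str):
    # Look up the actor by name; ray.get_actor raises ValueError if absent.
    import ray  # deferred so the naming helper works without Ray installed

    try:
        return ray.get_actor(queue_monitor_actor_name(deployment_name))
    except ValueError:
        return None


def delete_queue_monitor_actor(deployment_name: str) -> None:
    # Cleanup on deployment deletion.
    import ray

    actor = get_queue_monitor_actor(deployment_name)
    if actor is not None:
        ray.kill(actor)
```

The name-based lookup is what lets the controller find an existing monitor after a restart instead of holding a handle across processes.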
### Test Plan
- Unit tests for QueueMonitorConfig (7 tests)
  - Broker type detection (Redis, RabbitMQ, SQS, unknown)
  - Config value storage
- Unit tests for QueueMonitor (4 tests)
  - Redis queue length retrieval (pending)
  - RabbitMQ queue length retrieval
  - Error handling with cached value fallback
---------
Signed-off-by: harshit <harshit@anyscale.com>
…olicy (#59548)

## Summary
This PR adds queue-based autoscaling support for async inference workloads in Ray Serve. It enables deployments to scale based on combined workload from both the message broker queue and HTTP requests.

**Related PRs:**
- PR 1 (Prerequisite): [#59430](#59430) - Broker and QueueMonitor foundation
- PR 3 (Follow-up): Integration with TaskConsumer

## Changes

### New Autoscaling Policy
| Component | Description |
|-----------|-------------|
| `async_inference_autoscaling_policy()` | Scales replicas based on combined workload: `queue_length + total_num_requests` |
| `default_async_inference_autoscaling_policy` | Export alias for the new policy |

### QueueMonitor Enhancements
The `QueueMonitorActor` now pushes queue metrics to the controller for autoscaling:
- Accepts `deployment_id` and `controller_handle` parameters
- Uses `MetricsPusher` to periodically push queue length to the controller
- `start_metrics_pusher()` - deferred initialization (event loop not available in `__init__`)
- Lazy initialization in `get_queue_length()` handles actor restarts
- Synchronous `__ray_shutdown__` (Ray calls it without awaiting)

### Controller Integration
- New `record_autoscaling_metrics_from_async_inference_task_queue()` method
- New gauge: `serve_autoscaling_async_inference_task_queue_metrics_delay_ms`

### New Types
- `AsyncInferenceTaskQueueMetricReport` - dataclass for queue metrics from QueueMonitor to controller
- `AutoscalingContext.async_inference_task_queue_length` - new property for queue length

## Scaling Formula
```python
total_workload = queue_length + total_num_requests
desired_replicas = total_workload / target_ongoing_requests
```

Example:
- Queue: 100 pending tasks
- HTTP: 50 ongoing requests
- `target_ongoing_requests`: 10
- Desired replicas = (100 + 50) / 10 = 15

---
🤖 Generated with [Claude Code](https://claude.ai/code)

---------
Signed-off-by: harshit <harshit@anyscale.com>
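The combined-workload formula from this follow-up can be sketched as a pure function (the name `async_inference_desired_replicas` is hypothetical; the actual policy is `async_inference_autoscaling_policy()`, and this sketch omits its bounds handling):

```python
import math


def async_inference_desired_replicas(
    queue_length: int,
    total_num_requests: float,
    target_ongoing_requests: float,
) -> int:
    # Combined workload: pending broker tasks plus in-flight HTTP requests.
    total_workload = queue_length + total_num_requests
    return math.ceil(total_workload / target_ongoing_requests)
```

With the example above: a queue of 100, 50 ongoing HTTP requests, and a target of 10 per replica gives (100 + 50) / 10 = 15 desired replicas.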
### Summary
This PR is part 1 of 3 for adding queue-based autoscaling support for
Ray Serve TaskConsumer deployments.
### Background
TaskConsumers are workloads that consume tasks from message queues
(Redis, RabbitMQ), and their scaling needs are fundamentally different
from HTTP-based deployments. Instead of scaling based on HTTP request
load, TaskConsumers should scale based on the number of pending tasks in
the message queue.
### Overall Architecture (Full Feature)
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Message Queue │◄─────│ QueueMonitor │ │ ServeController │
│ (Redis/RMQ) │ │ Actor │◄─────│ Autoscaler │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
│ get_queue_length() │
└─────────────────────────┘
│
▼
┌───────────────────────────┐
│ queue_based_autoscaling │
│ _policy() │
│ desired = ceil(len/target)│
└───────────────────────────┘
```
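The policy box in the diagram reduces to a one-line formula. A minimal sketch of that calculation (the function name, its parameters, and the clamping to replica bounds are illustrative here, not the exact signature PR 2 introduces):

```python
import math

def desired_replicas(queue_length: int, target_ongoing_requests: int,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Hypothetical sketch: size the deployment to drain the pending queue."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    desired = math.ceil(queue_length / target_ongoing_requests)
    # Clamp to the deployment's configured replica bounds.
    return max(min_replicas, min(desired, max_replicas))

print(desired_replicas(100, 10))  # 10 replicas for 100 pending tasks
print(desired_replicas(0, 10))    # 1 (never scales below min_replicas)
```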
The full implementation consists of three PRs:
| PR | Description | Status |
|----------------|-----------------------------------------------------|------------|
| PR 1 (This PR) | QueueMonitor actor for querying broker queue length | 🔄 Current |
| PR 2 | Introduce default queue-based autoscaling policy | Upcoming |
| PR 3 | Integration with TaskConsumer deployments | Upcoming |
### This PR: QueueMonitor Actor
This PR introduces the QueueMonitor Ray actor that queries message
brokers to get queue length for autoscaling decisions.
### Key Features
- Multi-broker support: Redis and RabbitMQ
- Lightweight Ray actor: runs with `num_cpus=0`, with `pika` and `redis` provided via the runtime env
- Fault tolerance: caches the last known queue length and falls back to it on query failures
- Named actor pattern: `QUEUE_MONITOR::<deployment_name>` for easy lookup
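The fault-tolerance behavior above can be sketched as follows. This is a simplified stand-in for the real `QueueMonitor`, using an injected client object instead of a live broker connection; the class and stub names are hypothetical:

```python
class QueueLengthCache:
    """Sketch of cached-fallback behavior: if a broker query fails,
    return the last successfully observed queue length."""

    def __init__(self, client, queue_name: str):
        self._client = client          # e.g. a redis-py client in the real actor
        self._queue_name = queue_name
        self._last_known_length = 0    # fallback before the first success

    def get_queue_length(self) -> int:
        try:
            length = self._client.llen(self._queue_name)
            self._last_known_length = length
            return length
        except Exception:
            # Broker unreachable: serve the cached value so the
            # autoscaler keeps receiving a (stale but sane) signal.
            return self._last_known_length


class FlakyClient:
    """Test stub: succeeds once, then raises on every later call."""
    def __init__(self):
        self.calls = 0

    def llen(self, name):
        self.calls += 1
        if self.calls > 1:
            raise ConnectionError("broker down")
        return 42


monitor = QueueLengthCache(FlakyClient(), "tasks")
print(monitor.get_queue_length())  # 42 (live query)
print(monitor.get_queue_length())  # 42 (cached fallback after failure)
```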
### Queue Length Calculation
For accurate autoscaling, QueueMonitor returns total workload (pending
tasks):
| Broker | Pending Tasks |
|----------|----------------|
| Redis | LLEN <queue> |
| RabbitMQ | messages_ready |
### Components
1. `QueueMonitorConfig` - configuration dataclass with broker URL and queue name
2. `QueueMonitor` - core class that initializes broker connections and queries queue length
3. `QueueMonitorActor` - Ray actor wrapper for remote access
4. Helper functions:
   - `create_queue_monitor_actor()` - create the named actor
   - `get_queue_monitor_actor()` - look up an existing actor
   - `delete_queue_monitor_actor()` - clean up on deployment deletion
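The helper functions rely on the named-actor pattern described above. A sketch of the name construction (the helper name is hypothetical; the Ray calls are shown only as comments, and the exact options the real helpers pass are assumptions):

```python
QUEUE_MONITOR_PREFIX = "QUEUE_MONITOR::"

def queue_monitor_actor_name(deployment_name: str) -> str:
    """Build the well-known actor name for a deployment's monitor."""
    return f"{QUEUE_MONITOR_PREFIX}{deployment_name}"

# With Ray available, lookup and cleanup would follow the usual
# named-actor pattern, roughly:
#
#   import ray
#   actor = ray.get_actor(queue_monitor_actor_name("my_consumer"))
#   ray.kill(actor)  # what delete_queue_monitor_actor() would do
#
print(queue_monitor_actor_name("my_consumer"))  # QUEUE_MONITOR::my_consumer
```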
### Test Plan
- Unit tests for `QueueMonitorConfig` (7 tests)
  - Broker type detection (Redis, RabbitMQ, SQS, unknown)
  - Config value storage
- Unit tests for `QueueMonitor` (4 tests)
  - Redis queue length retrieval (pending)
  - RabbitMQ queue length retrieval
  - Error handling with cached value fallback
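The broker type detection covered by those tests can be sketched as a mapping from the broker URL's scheme. The scheme table and enum below are assumptions about the implementation, not the exact logic in `QueueMonitorConfig`:

```python
from enum import Enum
from urllib.parse import urlparse

class BrokerType(Enum):
    REDIS = "redis"
    RABBITMQ = "rabbitmq"
    SQS = "sqs"
    UNKNOWN = "unknown"

# Assumed scheme-to-broker mapping (illustrative only).
_SCHEME_TO_BROKER = {
    "redis": BrokerType.REDIS,
    "rediss": BrokerType.REDIS,
    "amqp": BrokerType.RABBITMQ,
    "amqps": BrokerType.RABBITMQ,
    "sqs": BrokerType.SQS,
}

def detect_broker_type(broker_url: str) -> BrokerType:
    """Classify a broker URL by its scheme; unrecognized -> UNKNOWN."""
    scheme = urlparse(broker_url).scheme.lower()
    return _SCHEME_TO_BROKER.get(scheme, BrokerType.UNKNOWN)

print(detect_broker_type("redis://localhost:6379/0"))  # BrokerType.REDIS
print(detect_broker_type("amqp://guest@localhost/"))   # BrokerType.RABBITMQ
print(detect_broker_type("kafka://broker:9092"))       # BrokerType.UNKNOWN
```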
---------
Signed-off-by: harshit <harshit@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>