[Serve] Custom autoscaling policies don't benefit from standard autoscaling config parameters #58622

@abrarsheikh

Problem Description

Ray Serve's default autoscaling policy (replica_queue_length_autoscaling_policy) has built-in support for many AutoscalingConfig parameters. However, when users implement custom autoscaling policies (as documented in the Advanced Autoscaling guide), they don't automatically benefit from these configuration parameters and must reimplement the logic themselves.

Config parameters embedded in the default policy

The following AutoscalingConfig parameters are tightly coupled to the default policy implementation:

  1. Scaling factors (upscaling_factor, downscaling_factor):

    • Multiply the computed change in replica count, damping (factor < 1) or amplifying (factor > 1) upscale and downscale decisions
    • Implemented in _calculate_desired_num_replicas() in autoscaling_policy.py
  2. Replica bounds (min_replicas, max_replicas):

    • Clamps the desired replica count to the range [min_replicas, max_replicas]
    • Also implemented in _calculate_desired_num_replicas()
  3. Upscale delay (upscale_delay_s):

    • Prevents rapid upscaling by acting only after the upscale decision has persisted for upscale_delay_s seconds
  4. Downscale delays (downscale_delay_s, downscale_to_zero_delay_s):

    • Prevents rapid downscaling in the same way, with a separate delay (downscale_to_zero_delay_s) for scaling from 1 replica to 0
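The factor and bounds steps listed above can be sketched as one small function. This is a simplified illustration of the idea, not Ray's actual _calculate_desired_num_replicas (which derives its raw target from request metrics); the name smooth_and_clamp and its default values are hypothetical.

```python
import math


def smooth_and_clamp(
    raw_desired: int,
    current: int,
    upscaling_factor: float = 1.0,
    downscaling_factor: float = 1.0,
    min_replicas: int = 0,
    max_replicas: int = 100,
) -> int:
    """Hypothetical sketch: damp a raw replica target, then clamp it."""
    factor = upscaling_factor if raw_desired >= current else downscaling_factor
    # Equivalent to current * (1 + (raw_desired / current - 1) * factor),
    # i.e. shrinking the error ratio's distance from 1 by `factor`, but
    # written without division so it also works when current == 0.
    smoothed = current + (raw_desired - current) * factor
    desired = math.ceil(smoothed)
    # Enforce min_replicas / max_replicas.
    return max(min_replicas, min(max_replicas, desired))


print(smooth_and_clamp(10, 1, upscaling_factor=0.5))  # 6
```

With upscaling_factor=0.5, a jump from 1 replica toward a raw target of 10 becomes 6 replicas. That is the kind of damping the user in the time-based example below expects, but does not get, from a custom policy.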

Impact on custom policies

When users write custom autoscaling policies, they face three problems:

  1. Missing functionality: Users lose these features unless they reimplement them
  2. Code duplication: Users must copy complex logic (delay handling, decision counters, scale-to-zero special cases) from the default policy
  3. Configuration confusion: Users may set these config parameters expecting them to work, but they're silently ignored by custom policies
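The delay handling, decision counters, and scale-to-zero special cases that custom policies would have to reimplement amount to a small state machine. The sketch below assumes the policy runs once per fixed control-loop interval (CONTROL_LOOP_INTERVAL_S is a hypothetical value) and keeps its counter in a plain dict of policy state under a hypothetical decision_counter key. It is modeled loosely on the default policy's behavior rather than copied from it, so details such as how the step from 1 replica to 0 is sequenced may differ.

```python
import math
from typing import Any, Dict, Tuple

# Hypothetical control-loop period: the policy runs once per interval.
CONTROL_LOOP_INTERVAL_S = 1.0


def apply_delays(
    desired: int,
    current: int,
    state: Dict[str, Any],
    upscale_delay_s: float,
    downscale_delay_s: float,
    downscale_to_zero_delay_s: float,
) -> Tuple[int, Dict[str, Any]]:
    """Hypothetical sketch: act only once a decision has persisted long enough."""
    # >0 counts consecutive upscale votes, <0 consecutive downscale votes.
    counter = state.get("decision_counter", 0)
    if desired > current:
        counter = max(counter, 0) + 1  # A change of direction restarts at 1.
        delay_s = upscale_delay_s
    elif desired < current:
        counter = min(counter, 0) - 1
        # Scaling to zero gets its own delay.
        delay_s = downscale_to_zero_delay_s if desired == 0 else downscale_delay_s
    else:
        return current, {**state, "decision_counter": 0}

    if abs(counter) >= math.ceil(delay_s / CONTROL_LOOP_INTERVAL_S):
        # The decision has held for the whole delay: apply it and reset.
        return desired, {**state, "decision_counter": 0}
    # Not enough consecutive evidence yet: keep the current replica count.
    return current, {**state, "decision_counter": counter}


state: Dict[str, Any] = {}
for _ in range(3):
    decision, state = apply_delays(
        5, 1, state, upscale_delay_s=3.0,
        downscale_delay_s=0.0, downscale_to_zero_delay_s=0.0,
    )
    print(decision)  # Prints 1, then 1, then 5.
```

The state dict must be threaded from one call to the next, which is exactly the bookkeeping every custom policy author would otherwise have to get right on their own.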

Example: User confusion

A user might write a custom policy for time-based autoscaling:

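```python
from typing import Any, Dict, Tuple

from ray.serve.config import AutoscalingContext


def scheduled_batch_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Scale up during business hours (is_business_hours() is a user-defined helper)
    if is_business_hours():
        return 10, {}
    return 1, {}
```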

And configure it with:

```yaml
autoscaling_config:
  upscale_delay_s: 30
  downscale_delay_s: 600
  upscaling_factor: 0.5
```

The user might expect these delays and factors to apply automatically, but they don't. The custom policy bypasses all this logic.

Proposed Solution

Refactor the autoscaling framework to separate policy logic (deciding what replica count to target) from policy enforcement (applying delays, bounds, and factors). This would enable code reuse and provide a better framework for custom policies.

Option 1: Policy decorator/wrapper (Recommended)

Create a decorator that applies standard config logic to any policy function:

```python
import functools
from typing import Any, Callable, Dict, Tuple

from ray.serve.config import AutoscalingContext


def apply_autoscaling_config(
    policy_func: Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]
) -> Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]:
    """
    Wraps a policy function to automatically apply:
    - upscaling_factor / downscaling_factor
    - min_replicas / max_replicas bounds
    - upscale_delay_s / downscale_delay_s / downscale_to_zero_delay_s

    _apply_scaling_factors, _apply_delay_logic, and _apply_bounds are
    hypothetical helpers that would be factored out of the default policy.
    """
    @functools.wraps(policy_func)
    def wrapped_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
        # Get raw desired replicas from user policy
        desired_num_replicas, policy_state = policy_func(ctx)

        # Apply scaling factors (if configured)
        if ctx.config.upscaling_factor is not None or ctx.config.downscaling_factor is not None:
            desired_num_replicas = _apply_scaling_factors(
                desired_num_replicas, ctx.target_num_replicas, ctx.config
            )

        # Apply delay logic
        decision_num_replicas, updated_state = _apply_delay_logic(
            desired_num_replicas, ctx, policy_state
        )

        # Apply bounds
        final_num_replicas = _apply_bounds(
            decision_num_replicas, ctx.config,
            ctx.capacity_adjusted_min_replicas,
            ctx.capacity_adjusted_max_replicas
        )

        return final_num_replicas, updated_state

    return wrapped_policy
```
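One possible shape for the three proposed helpers is sketched below, together with a copy of the wrapper so the block runs on its own without Ray. Everything here is an assumption for illustration: FakeConfig and FakeContext are hypothetical stand-ins that carry only the attributes the wrapper reads (plus a policy_state field holding the previous call's state), the helper bodies are illustrative rather than Ray code, and the delay counter lives under a hypothetical reserved key in the returned policy state.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

CONTROL_LOOP_INTERVAL_S = 1.0  # Hypothetical control-loop period.
COUNTER_KEY = "__decision_counter"  # Hypothetical reserved policy_state key.


@dataclass
class FakeConfig:  # Hypothetical stand-in for AutoscalingConfig.
    upscaling_factor: Optional[float] = None
    downscaling_factor: Optional[float] = None
    upscale_delay_s: float = 0.0
    downscale_delay_s: float = 0.0
    downscale_to_zero_delay_s: float = 0.0


@dataclass
class FakeContext:  # Hypothetical stand-in for AutoscalingContext.
    config: FakeConfig
    target_num_replicas: int
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)


def _apply_scaling_factors(desired: int, current: int, config: FakeConfig) -> int:
    factor = (config.upscaling_factor if desired >= current
              else config.downscaling_factor)
    if factor is None:  # Only the other direction's factor is configured.
        return desired
    # Shrink (factor < 1) or stretch (factor > 1) the requested change.
    return math.ceil(current + (desired - current) * factor)


def _apply_delay_logic(
    desired: int, ctx: FakeContext, state: Dict[str, Any]
) -> Tuple[int, Dict[str, Any]]:
    current = ctx.target_num_replicas
    # Counter from the previous call: >0 upscale votes, <0 downscale votes.
    counter = ctx.policy_state.get(COUNTER_KEY, 0)
    if desired > current:
        counter = max(counter, 0) + 1
        delay_s = ctx.config.upscale_delay_s
    elif desired < current:
        counter = min(counter, 0) - 1
        delay_s = (ctx.config.downscale_to_zero_delay_s if desired == 0
                   else ctx.config.downscale_delay_s)
    else:
        return current, {**state, COUNTER_KEY: 0}
    if abs(counter) >= math.ceil(delay_s / CONTROL_LOOP_INTERVAL_S):
        return desired, {**state, COUNTER_KEY: 0}  # Delay satisfied: act now.
    return current, {**state, COUNTER_KEY: counter}  # Keep waiting.


def _apply_bounds(desired: int, config: FakeConfig, lo: int, hi: int) -> int:
    # config is unused; capacity-adjusted bounds are assumed to reflect it.
    return max(lo, min(hi, desired))


def apply_autoscaling_config(policy_func: Callable) -> Callable:
    def wrapped_policy(ctx: FakeContext) -> Tuple[int, Dict[str, Any]]:
        desired, policy_state = policy_func(ctx)
        cfg = ctx.config
        if cfg.upscaling_factor is not None or cfg.downscaling_factor is not None:
            desired = _apply_scaling_factors(desired, ctx.target_num_replicas, cfg)
        decision, updated_state = _apply_delay_logic(desired, ctx, policy_state)
        final = _apply_bounds(decision, cfg, ctx.capacity_adjusted_min_replicas,
                              ctx.capacity_adjusted_max_replicas)
        return final, updated_state

    return wrapped_policy


@apply_autoscaling_config
def always_ten(ctx: FakeContext) -> Tuple[int, Dict[str, Any]]:
    return 10, {}


ctx = FakeContext(FakeConfig(upscaling_factor=0.5), target_num_replicas=1,
                  capacity_adjusted_min_replicas=1, capacity_adjusted_max_replicas=10)
print(always_ten(ctx))  # (6, {'__decision_counter': 0})
```

Keeping the counter in the returned policy state keeps the wrapper itself stateless, but it means the wrapper and the user policy share one dictionary; a real implementation would need to reserve or namespace that key.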

Usage in custom policies:

```python
@apply_autoscaling_config
def my_custom_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Just decide the desired replica count;
    # delays, bounds, and factors are applied automatically
    desired_replicas = 10 if is_business_hours() else 1
    return desired_replicas, {}
```

Advantages:

  • Backward compatible (existing policies still work)
  • Opt-in (users can choose to use the decorator)
  • Clear separation of concerns
  • Minimal code changes

Related Files

  • python/ray/serve/autoscaling_policy.py - Contains default policy implementation
  • python/ray/serve/config.py - Defines AutoscalingConfig and AutoscalingContext
  • doc/source/serve/advanced-guides/advanced-autoscaling.md - Custom policy documentation

Labels: docs, serve, tech-debt, usability