Description
Problem Description
Ray Serve's default autoscaling policy (replica_queue_length_autoscaling_policy) has built-in support for many AutoscalingConfig parameters. However, when users implement custom autoscaling policies (as documented in the Advanced Autoscaling guide), they don't automatically benefit from these configuration parameters and must reimplement the logic themselves.
Config parameters embedded in the default policy
The following AutoscalingConfig parameters are tightly coupled to the default policy implementation:
- Scaling factors (`upscaling_factor`, `downscaling_factor`): in `autoscaling_policy.py`
  - Applied to amplify or moderate upscale/downscale decisions
  - Implemented in `_calculate_desired_num_replicas()`
- Replica bounds (`min_replicas`, `max_replicas`):
  - Enforce min/max limits on the desired replica count
  - Also implemented in `_calculate_desired_num_replicas()`
- Upscale delay (`upscale_delay_s`):
  - Prevents rapid upscaling by requiring consistent metrics over a period
- Downscale delays (`downscale_delay_s`, `downscale_to_zero_delay_s`):
  - Prevent rapid downscaling, with separate handling for scale-to-zero
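To make the first two bullets concrete, here is a minimal sketch of the factor and bounds steps as pure functions over replica counts. The names `apply_scaling_factor` and `apply_bounds` are hypothetical, not Ray APIs. The default policy appears to smooth a load ratio as `ceil(running * (1 + (ratio - 1) * factor))`; because `running * ratio` is the raw target, that rearranges to `ceil(current + (desired - current) * factor)`, which is the form a wrapper could apply to any policy's output.

```python
import math
from typing import Optional


def apply_scaling_factor(
    raw_desired: int,
    current: int,
    upscaling_factor: Optional[float] = None,
    downscaling_factor: Optional[float] = None,
) -> int:
    """Scale the step from `current` toward `raw_desired` by the matching factor.

    A factor below 1 moderates the step, above 1 amplifies it, and None
    leaves the raw decision unchanged. (Hypothetical helper.)
    """
    if current == 0:
        # Nothing to scale relative to; scale-from-zero keeps the raw decision.
        return raw_desired
    factor = upscaling_factor if raw_desired > current else downscaling_factor
    if factor is None:
        return raw_desired
    return math.ceil(current + (raw_desired - current) * factor)


def apply_bounds(num_replicas: int, min_replicas: int, max_replicas: int) -> int:
    """Clamp a replica count into [min_replicas, max_replicas]. (Hypothetical helper.)"""
    return max(min_replicas, min(max_replicas, num_replicas))


# upscaling_factor=0.5 on a jump from 1 to 10 replicas: 1 + 9 * 0.5 = 5.5, rounded up.
print(apply_scaling_factor(10, 1, upscaling_factor=0.5))
# An amplifying factor of 2.0 overshoots to 19, which the bounds cap at 8.
print(apply_bounds(apply_scaling_factor(10, 1, upscaling_factor=2.0), 1, 8))
```

Rounding up means a small `downscaling_factor` can leave a short downscale (for example 2 to 1 with a factor of 0.5) stuck at its current value, so a real helper would need an explicit rule for that case.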
Impact on custom policies
When users write custom autoscaling policies, they face three problems:
- Missing functionality: Users lose these features unless they reimplement them
- Code duplication: Users must copy complex logic (delay handling, decision counters, scale-to-zero special cases) from the default policy
- Configuration confusion: Users may set these config parameters expecting them to work, but they're silently ignored by custom policies
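The delay handling and decision counters mentioned above are the part users are least likely to reimplement correctly. A simplified sketch in the spirit of the default policy (not a copy of it) follows; `apply_delays`, the `decision_counter` state key, and the explicit `loop_interval_s` argument, standing in for the controller's control-loop period, are all assumptions.

```python
from typing import Any, Dict, Tuple


def apply_delays(
    desired: int,
    current: int,
    state: Dict[str, Any],
    upscale_delay_s: float,
    downscale_delay_s: float,
    downscale_to_zero_delay_s: float,
    loop_interval_s: float,
) -> Tuple[int, Dict[str, Any]]:
    """Return (decision, new_state), holding at `current` until a decision persists.

    state["decision_counter"] > 0 counts consecutive upscale votes and < 0 counts
    consecutive downscale votes; a change of direction resets it. (Hypothetical.)
    """
    counter = state.get("decision_counter", 0)
    decision = current  # default: keep the current target

    if desired > current:
        counter = max(counter, 0) + 1
        if counter * loop_interval_s >= upscale_delay_s:
            decision, counter = desired, 0
    elif desired < current:
        counter = min(counter, 0) - 1
        if current == 1 and desired == 0:
            # Scale-to-zero special case: removing the last replica has its own delay.
            delay_s = downscale_to_zero_delay_s
        else:
            # Other downscales stop at one replica, so reaching zero always
            # goes through the 1 -> 0 branch above.
            delay_s = downscale_delay_s
            desired = max(1, desired)
        if -counter * loop_interval_s >= delay_s:
            decision, counter = desired, 0
    else:
        counter = 0

    return decision, {**state, "decision_counter": counter}


# Upscale 2 -> 5 with a 3 s delay and a 1 s loop: held for two ticks, applied on the third.
state: Dict[str, Any] = {}
decisions = []
for _ in range(3):
    decision, state = apply_delays(5, 2, state, 3.0, 10.0, 60.0, 1.0)
    decisions.append(decision)
print(decisions)
```

Holding at `current` while the counter accumulates is what gives the delays their damping effect. Clamping other downscales at one replica is a choice made in this sketch so that `downscale_to_zero_delay_s` always governs the final step.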
Example: User confusion
A user might write a custom policy for time-based autoscaling:
```python
from typing import Any, Dict, Tuple

from ray.serve.config import AutoscalingContext


def scheduled_batch_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Scale up during business hours (is_business_hours() is the user's own helper)
    if is_business_hours():
        return 10, {}
    return 1, {}
```

And configure it with:

```yaml
autoscaling_config:
  upscale_delay_s: 30
  downscale_delay_s: 600
  upscaling_factor: 0.5
```

The user might expect these delays and factors to apply automatically, but they don't: the custom policy bypasses all of this logic.
Proposed Solution
Refactor the autoscaling framework to separate policy logic (deciding what replica count to target) from policy enforcement (applying delays, bounds, and factors). This would enable code reuse and provide a better framework for custom policies.
Option 1: Policy decorator/wrapper (Recommended)
Create a decorator that applies standard config logic to any policy function:
```python
import functools
from typing import Any, Callable, Dict, Tuple

from ray.serve.config import AutoscalingContext


def apply_autoscaling_config(
    policy_func: Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]
) -> Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]:
    """
    Wraps a policy function to automatically apply:
    - upscaling_factor / downscaling_factor
    - min_replicas / max_replicas bounds
    - upscale_delay_s / downscale_delay_s / downscale_to_zero_delay_s
    """

    # _apply_scaling_factors, _apply_delay_logic and _apply_bounds are proposed
    # helpers to be factored out of the default policy; they don't exist yet.
    @functools.wraps(policy_func)
    def wrapped_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
        # Get raw desired replicas from user policy
        desired_num_replicas, policy_state = policy_func(ctx)

        # Apply scaling factors (if configured)
        if ctx.config.upscaling_factor is not None or ctx.config.downscaling_factor is not None:
            desired_num_replicas = _apply_scaling_factors(
                desired_num_replicas, ctx.target_num_replicas, ctx.config
            )

        # Apply delay logic
        decision_num_replicas, updated_state = _apply_delay_logic(
            desired_num_replicas, ctx, policy_state
        )

        # Apply bounds
        final_num_replicas = _apply_bounds(
            decision_num_replicas,
            ctx.config,
            ctx.capacity_adjusted_min_replicas,
            ctx.capacity_adjusted_max_replicas,
        )
        return final_num_replicas, updated_state

    return wrapped_policy
```

Usage in custom policies:
```python
@apply_autoscaling_config
def my_custom_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Just decide the desired replica count;
    # delays, bounds, and factors are applied automatically.
    desired_replicas = compute_desired_replicas(ctx)  # hypothetical user logic
    return desired_replicas, {}
```

Advantages:
- Backward compatible (existing policies still work)
- Opt-in (users can choose to use the decorator)
- Clear separation of concerns
- Minimal code changes
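To check that the decorator composes the three steps as intended, here is a self-contained toy version. `FakeConfig` and `FakeContext` are hypothetical stand-ins for `AutoscalingConfig` and `AutoscalingContext` (not Ray classes), the proposed helpers are inlined in simplified form, the `loop_interval_s` control-loop period is an assumption, and `downscale_to_zero_delay_s` is omitted for brevity. Treat it as a sketch of the proposal's behavior, not an implementation.

```python
import functools
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

PolicyResult = Tuple[int, Dict[str, Any]]


@dataclass
class FakeConfig:
    """Hypothetical stand-in for the relevant AutoscalingConfig fields."""
    min_replicas: int = 1
    max_replicas: int = 10
    upscaling_factor: Optional[float] = None
    downscaling_factor: Optional[float] = None
    upscale_delay_s: float = 0.0
    downscale_delay_s: float = 0.0


@dataclass
class FakeContext:
    """Hypothetical stand-in for AutoscalingContext (not Ray's class)."""
    config: FakeConfig
    target_num_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)
    loop_interval_s: float = 1.0  # assumed control-loop period


def apply_autoscaling_config(
    policy_func: Callable[[FakeContext], PolicyResult]
) -> Callable[[FakeContext], PolicyResult]:
    @functools.wraps(policy_func)
    def wrapped_policy(ctx: FakeContext) -> PolicyResult:
        cfg, current = ctx.config, ctx.target_num_replicas
        desired, state = policy_func(ctx)

        # 1. Scaling factors: move only part of the way toward `desired`.
        factor = cfg.upscaling_factor if desired > current else cfg.downscaling_factor
        if factor is not None and current > 0:
            desired = math.ceil(current + (desired - current) * factor)

        # 2. Delays: act only once the same direction has persisted long enough.
        counter = ctx.policy_state.get("decision_counter", 0)
        decision = current
        if desired > current:
            counter = max(counter, 0) + 1
            if counter * ctx.loop_interval_s >= cfg.upscale_delay_s:
                decision, counter = desired, 0
        elif desired < current:
            counter = min(counter, 0) - 1
            if -counter * ctx.loop_interval_s >= cfg.downscale_delay_s:
                decision, counter = desired, 0
        else:
            counter = 0

        # 3. Bounds: clamp into [min_replicas, max_replicas].
        final = max(cfg.min_replicas, min(cfg.max_replicas, decision))
        return final, {**state, "decision_counter": counter}

    return wrapped_policy


@apply_autoscaling_config
def scheduled_batch_policy(ctx: FakeContext) -> PolicyResult:
    # Pretend it is always business hours, so the raw answer is 10.
    return 10, {}


ctx = FakeContext(
    config=FakeConfig(max_replicas=7, upscaling_factor=0.5, upscale_delay_s=2.0),
    target_num_replicas=1,
)
history = []
for _ in range(4):
    num_replicas, ctx.policy_state = scheduled_batch_policy(ctx)
    ctx.target_num_replicas = num_replicas
    history.append(num_replicas)
print(history)
```

Driven for four control-loop ticks with `upscaling_factor=0.5`, `upscale_delay_s=2.0` and `max_replicas=7`, the replica count goes 1, 6, 6, 7: the factor halves each jump, the delay holds every upscale for one extra tick, and the upper bound caps the last step. The user's policy body stays a one-liner, which is the separation of concerns the proposal is after.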
Related Files
- `python/ray/serve/autoscaling_policy.py`: contains the default policy implementation
- `python/ray/serve/config.py`: defines `AutoscalingConfig` and `AutoscalingContext`
- `doc/source/serve/advanced-guides/advanced-autoscaling.md`: custom policy documentation