[Serve] Custom autoscaling policies don't benefit from standard autoscaling config parameters


## Problem Description

Ray Serve's default autoscaling policy (`replica_queue_length_autoscaling_policy`) has built-in support for many `AutoscalingConfig` parameters. However, when users implement custom autoscaling policies (as documented in the [Advanced Autoscaling guide](https://docs.ray.io/en/latest/serve/advanced-guides/advanced-autoscaling.html#custom-autoscaling-policies)), they don't automatically benefit from these configuration parameters and must reimplement the logic themselves.

### Config parameters embedded in the default policy

The following `AutoscalingConfig` parameters are tightly coupled to the default policy implementation:

1. **Scaling factors** (`upscaling_factor`, `downscaling_factor`):
   - Moderate how aggressively the policy scales up or down
   - Implemented in `_calculate_desired_num_replicas()` in `autoscaling_policy.py`

2. **Replica bounds** (`min_replicas`, `max_replicas`): 
   - Enforces min/max limits on desired replica count
   - Also implemented in `_calculate_desired_num_replicas()`

3. **Upscale delay** (`upscale_delay_s`): 
   - Prevents rapid upscaling by requiring consistent metrics over a period

4. **Downscale delays** (`downscale_delay_s`, `downscale_to_zero_delay_s`): 
   - Prevents rapid downscaling with separate handling for scale-to-zero
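
For illustration, the effect of a scaling factor can be sketched on plain replica counts. The function name `apply_scaling_factor` and its signature are hypothetical, and this is not the code in `_calculate_desired_num_replicas()`, which may differ in detail. The sketch only assumes that a factor below 1 shrinks the step taken toward the target:

```python
import math
from typing import Optional


def apply_scaling_factor(
    desired: int,
    current: int,
    upscaling_factor: Optional[float] = None,
    downscaling_factor: Optional[float] = None,
) -> int:
    """Move only a fraction of the way from `current` toward `desired`.

    Hypothetical helper for illustration; an unset factor is treated as 1.0.
    """
    if desired > current:
        factor = 1.0 if upscaling_factor is None else upscaling_factor
        # Round up so a positive factor always adds at least one replica.
        return current + math.ceil((desired - current) * factor)
    if desired < current:
        factor = 1.0 if downscaling_factor is None else downscaling_factor
        # Round down so the damped step is conservative about removing replicas.
        return current - math.floor((current - desired) * factor)
    return current
```

With `upscaling_factor=0.5`, a request to go from 2 to 10 replicas would be moderated to 6 in this sketch.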

### Impact on custom policies

When users write custom autoscaling policies, they face three problems:

1. **Missing functionality**: Users lose these features unless they reimplement them
2. **Code duplication**: Users must copy complex logic (delay handling, decision counters, scale-to-zero special cases) from the default policy
3. **Configuration confusion**: Users may set these config parameters expecting them to work, but they're silently ignored by custom policies
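
To give a sense of what reimplementing delay handling involves, here is a simplified sketch. It is an assumption-laden stand-in, not the default policy's code: it tracks a timestamp rather than decision counters, it omits the scale-to-zero special case, and the names `apply_delay`, `now_s`, and the state keys are all hypothetical.

```python
from typing import Any, Dict, Tuple


def apply_delay(
    desired: int,
    current: int,
    now_s: float,
    state: Dict[str, Any],
    upscale_delay_s: float,
    downscale_delay_s: float,
) -> Tuple[int, Dict[str, Any]]:
    """Return `desired` only after the same scaling direction has persisted
    for the configured delay; otherwise hold at `current`."""
    direction = (desired > current) - (desired < current)  # 1, -1, or 0
    if direction == 0:
        # No change requested: forget any pending decision.
        return current, {}
    if state.get("direction") != direction:
        # First request, or the direction flipped: (re)start the timer.
        return current, {"direction": direction, "since_s": now_s}
    delay_s = upscale_delay_s if direction > 0 else downscale_delay_s
    if now_s - state["since_s"] >= delay_s:
        # The request has been consistent for long enough: act on it.
        return desired, {}
    return current, state
```

Even this reduced version needs per-deployment state carried between calls, which is the kind of bookkeeping every custom policy author currently has to write and test on their own.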

### Example: User confusion

A user might write a custom policy for time-based autoscaling:

```python
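from typing import Any, Dict, Tuple

from ray.serve.config import AutoscalingContext

def scheduled_batch_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Scale up during business hours; is_business_hours() is a user-defined helper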
    if is_business_hours():
        return 10, {}
    return 1, {}
```

And configure it with:
```yaml
autoscaling_config:
  upscale_delay_s: 30
  downscale_delay_s: 600
  upscaling_factor: 0.5
```

**The user might expect these delays and factors to apply automatically, but they don't.** The custom policy bypasses all this logic.

## Proposed Solution

Refactor the autoscaling framework to separate **policy logic** (deciding *what* replica count to target) from **policy enforcement** (applying delays, bounds, and factors). This would enable code reuse and provide a better framework for custom policies.

### Option 1: Policy decorator/wrapper (Recommended)

Create a decorator that applies standard config logic to any policy function:

```python
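from typing import Any, Callable, Dict, Tuple

from ray.serve.config import AutoscalingContext

# _apply_scaling_factors, _apply_delay_logic, and _apply_bounds are proposed
# helpers that would be factored out of the default policy; they do not exist yet.

def apply_autoscaling_config(
    policy_func: Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]
) -> Callable[[AutoscalingContext], Tuple[int, Dict[str, Any]]]: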
    """
    Wraps a policy function to automatically apply:
    - upscaling_factor / downscaling_factor
    - min_replicas / max_replicas bounds
    - upscale_delay_s / downscale_delay_s / downscale_to_zero_delay_s
    """
    def wrapped_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
        # Get raw desired replicas from user policy
        desired_num_replicas, policy_state = policy_func(ctx)
        
        # Apply scaling factors (if configured)
        if ctx.config.upscaling_factor is not None or ctx.config.downscaling_factor is not None:
            desired_num_replicas = _apply_scaling_factors(
                desired_num_replicas, ctx.target_num_replicas, ctx.config
            )
        
        # Apply delay logic
        decision_num_replicas, updated_state = _apply_delay_logic(
            desired_num_replicas, ctx, policy_state
        )
        
        # Apply bounds
        final_num_replicas = _apply_bounds(
            decision_num_replicas, ctx.config, 
            ctx.capacity_adjusted_min_replicas,
            ctx.capacity_adjusted_max_replicas
        )
        
        return final_num_replicas, updated_state
    
    return wrapped_policy
```

**Usage in custom policies:**
```python
@apply_autoscaling_config
def my_custom_policy(ctx: AutoscalingContext) -> Tuple[int, Dict[str, Any]]:
    # Just decide the desired replica count;
    # delays, bounds, and factors are applied automatically.
    desired_replicas = 5  # placeholder for the policy's own logic
    return desired_replicas, {}
```
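
The wrapping pattern itself can be tried without Ray. The sketch below is a reduced, runnable version that applies only the bounds step. `StubContext` is a hypothetical stand-in carrying just the two `capacity_adjusted_*` fields referenced above, and `apply_bounds` is an illustrative name, not an existing Ray Serve API.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class StubContext:
    """Hypothetical stand-in for AutoscalingContext (bounds fields only)."""
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int


Policy = Callable[[StubContext], Tuple[int, Dict[str, Any]]]


def apply_bounds(policy_func: Policy) -> Policy:
    """Clamp the wrapped policy's result to the context's min/max replicas."""

    @functools.wraps(policy_func)  # keep the policy's name and docstring
    def wrapped(ctx: StubContext) -> Tuple[int, Dict[str, Any]]:
        desired, state = policy_func(ctx)
        clamped = max(
            ctx.capacity_adjusted_min_replicas,
            min(ctx.capacity_adjusted_max_replicas, desired),
        )
        return clamped, state

    return wrapped


@apply_bounds
def always_ten(ctx: StubContext) -> Tuple[int, Dict[str, Any]]:
    # The policy itself knows nothing about bounds.
    return 10, {}


ctx = StubContext(capacity_adjusted_min_replicas=1, capacity_adjusted_max_replicas=4)
result = always_ten(ctx)  # clamped to the max of 4
```

Because the policy body is untouched, the same function can still be used undecorated, which is what keeps the approach opt-in.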

**Advantages:**
- Backward compatible (existing policies still work)
- Opt-in (users can choose to use the decorator)
- Clear separation of concerns
- Minimal code changes

## Related Files

- `python/ray/serve/autoscaling_policy.py` - Contains default policy implementation
- `python/ray/serve/config.py` - Defines `AutoscalingConfig` and `AutoscalingContext`
- `doc/source/serve/advanced-guides/advanced-autoscaling.md` - Custom policy documentation

