[serve] Added policy state persistence for application level autoscaling by vaishdho1 · Pull Request #59118 · ray-project/ray

vaishdho1 · 2025-12-02T21:20:14Z

Description

Application level autoscaling policies in Ray Serve have no mechanism to persist policy state between control-loop iterations, preventing stateful autoscaling behavior. Additionally, per-deployment internal state needed for applying standard autoscaling config parameters over custom policies cannot be maintained.(#58622)

This PR adds the following:

The user policy state should return a Dict[DeploymentID,Dict].
Each deployment's AutoscalingContext now receives the user state returned by the custom policy per deployment ID enabling policies to maintain their state across iterations. Each deployment gets its own state.

Related issues

Fixes: #59008
Related to: #58622 #58857

Additional information

The implementation modifies the ApplicationAutoscalingState.get_decision_num_replicas() method to implement:

User returned policy state validation
Returning the policy state back into each deployment

…utoscaling policy Signed-off-by: Vaishnavi Panchavati <vaishdho10@gmail.com>

gemini-code-assist

Code Review

This pull request introduces a valuable feature for stateful application-level autoscaling in Ray Serve. By adding a mechanism to persist policy state between control loop iterations, it enables more sophisticated, stateful autoscaling behaviors. The implementation thoughtfully separates Ray Serve's internal state from user-defined state using a private key, ensuring that internal logic doesn't interfere with user policies. The changes are well-documented and include a solid test case that verifies the state persistence. My review includes a few minor suggestions to improve comment clarity and simplify a piece of test code.

python/ray/serve/_private/autoscaling_state.py

python/ray/serve/tests/unit/test_application_state.py

abrarsheikh · 2025-12-08T19:08:39Z

python/ray/serve/_private/autoscaling_state.py

+            # Separate internal state from custom policy state(Internal state is used when default autoscaling is enabled over custom policies)
+            current_state = (self._policy_state or {}).copy()
+            internal_state = current_state.get(
+                SERVE_AUTOSCALING_DECISION_COUNTERS_KEY, {}


what is the purpose of SERVE_AUTOSCALING_DECISION_COUNTERS_KEY?

since _policy_state is with an AutoscalingContext. Should we enforce that a application level policy returns Dict[deploymentID, Dict] for policy state. Does that solve the problem?

That can solve this problem since we need to just store the state of each deployment back to the deployment context we can directly do this. But the custom policy should return state per deployment there cannot be any overall state persisted.
The SERVE_AUTOSCALING_DECISION_COUNTERS_KEY just an alias for decision counters.
SERVE_AUTOSCALING_DECISION_COUNTERS_KEY = "__decision_counters"
Since this will be a private key I created a name for this.

That can solve this problem since we need to just store the state of each deployment back to the deployment context we can directly do this. But the custom policy should return state per deployment there cannot be any overall state persisted.

Let's go with this, we should run a light validation on the policy state returned by the users app level policy to verify it contains valid keys.