[Serve][1/N] Add autoscaler observability core API schema #55919
zcin merged 18 commits into ray-project:master
Conversation
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request introduces a new schema for Serve Autoscaler Observability, which is a great addition for enhancing visibility into the autoscaling process. The Pydantic models are well-defined and cover the requirements outlined in the design specification. The code is clean and follows best practices. I have one suggestion to further improve the schema's robustness and clarity for application-level observability by defining a more specific model for deployment summaries.
zcin left a comment:
@nadongjun thanks for the contribution!
By the way, lint is failing because the new public APIs in `schema.py` need to be documented.
abrarsheikh left a comment:
I suggest keeping the application scaler and external scaler out for now, since those features are not yet introduced in the system. Let's start with what currently exists and then incrementally add things.
Yes, it’ll keep things less complex. I’ve updated to only keep the deployment-level autoscaler for now.
python/ray/serve/schema.py (outdated):

    @PublicAPI(stability="alpha")
    class DeploymentAutoscalerView(BaseModel):
DeploymentAutoscalerView -> DeploymentAutoscalingDetail
@zcin do you have better suggestion here?
Force-pushed 439251f to e73aefa
Can anyone take a look and help merge?
    metrics: Optional[Dict[str, Any]] = Field(
        None, description="Aggregated metrics for this deployment."
    )
    metrics_health: AutoscalingMetricsHealth = Field(
how will we decide metrics health?
For now, metrics are collected internally, so metrics_health just defaults to HEALTHY.
Once external scalers or custom metric sources are introduced, their status will be reflected in metrics_health.
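This default behavior can be sketched with a stdlib-only stand-in (the non-healthy enum values and the helper function are assumptions for illustration; the real field is `metrics_health: AutoscalingMetricsHealth` on the Pydantic model):

```python
from enum import Enum


class AutoscalingMetricsHealth(str, Enum):
    # HEALTHY is the confirmed current default; the other values are
    # assumed placeholders for future external/custom metric sources.
    HEALTHY = "HEALTHY"
    DELAYED = "DELAYED"
    UNHEALTHY = "UNHEALTHY"


def current_metrics_health() -> AutoscalingMetricsHealth:
    # Metrics are collected internally today, so health is always HEALTHY.
    return AutoscalingMetricsHealth.HEALTHY


assert current_metrics_health() is AutoscalingMetricsHealth.HEALTHY
```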
## Why are these changes needed?
This PR adds the Serve Autoscaler Observability schema.
The schema defines structured models in `schema.py`, allowing `serve
status -v` to return detailed observability data for both deployments
and applications.
With these models, the examples in the design spec can now be expressed
as structured Pydantic objects. This lays the groundwork for integrating
the schema into controller logic and CLI output in follow-up PRs.
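The model shapes described below can be mirrored in a stdlib-only sketch (the real models are Pydantic `BaseModel`s marked `@PublicAPI(stability="alpha")` in `schema.py`; the dataclass types, defaults, and the `"UPSCALING"` status string here are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScalingDecision:
    # Field names taken from the tables in this PR; exact types assumed.
    timestamp_s: float
    from_replicas: int
    to_replicas: int
    reason: str
    policy: Optional[str] = None


@dataclass
class DeploymentAutoscalerView:
    current_replicas: int
    target_replicas: int
    min_replicas: int
    max_replicas: int
    scaling_status: str  # the real schema uses a ScalingStatus enum
    decisions: List[ScalingDecision] = field(default_factory=list)
    metrics: Optional[Dict[str, Any]] = None
    errors: List[str] = field(default_factory=list)


# Mirrors the "deployment_default_policy" example snapshot below.
view = DeploymentAutoscalerView(
    current_replicas=3,
    target_replicas=5,
    min_replicas=1,
    max_replicas=10,
    scaling_status="UPSCALING",
    metrics={"queued_requests": 12},
)
assert view.target_replicas - view.current_replicas == 2
```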
### Example: Deployment snapshot
```sh
======== Serve Autoscaler status: 2025-08-19T15:05:30Z ========
Deployment status
---------------------------------------------------------------
deployment_default_policy:
  Current replicas: 3
  Target replicas: 5
  Replicas allowed: min=1, max=10
  Scaling status: scaling up
  Scaling decisions:
    2025-08-19T14:00:00Z - scaled down from 5 -> 3 (low traffic)
    2025-08-19T15:05:00Z - scaled up from 3 -> 5 (12 requests queued)
  Policy: Default (queue-length based)
  Metrics (look_back_period_s=30):
    queued_requests: 12
  Metric collection: delayed (last update 30s ago)
  Errors: (none)
```
| Deployment spec requirement | Schema / fields |
|---|---|
| Current / Target replicas / Replicas allowed | `DeploymentAutoscalerView.current_replicas`, `target_replicas`, `min_replicas`, `max_replicas` |
| Scaling status (up/down/stable) | `DeploymentAutoscalerView.scaling_status` (`ScalingStatus`) |
| Scaling decisions (timestamp, from→to, reason) | `ScalingDecision.timestamp_s`, `from_replicas`, `to_replicas`, `reason`, `source`, `policy`, `metrics` |
| Policy | `ScalingDecision.policy` |
| Metrics (lookback, queued_requests, etc.) | `DeploymentAutoscalerView.metrics`, `lookback_period_s` |
| Metric collection state | `DeploymentAutoscalerView.metrics_health` (`MetricsHealth`) |
| Errors | `DeploymentAutoscalerView.errors` |
| Webhook history | `ExternalScalerView.webhook_history[]` (`WebhookEvent`) |
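A `ScalingDecision` row maps naturally onto the one-line CLI format in the snapshot above; a sketch of a hypothetical formatter (the function itself is not part of this PR):

```python
from datetime import datetime, timezone


def format_decision(timestamp_s: float, from_replicas: int,
                    to_replicas: int, reason: str) -> str:
    # Hypothetical helper rendering one "Scaling decisions" line of
    # `serve status -v` from ScalingDecision-style fields.
    ts = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
    direction = "up" if to_replicas > from_replicas else "down"
    return (f"{ts.strftime('%Y-%m-%dT%H:%M:%SZ')} - scaled {direction} "
            f"from {from_replicas} -> {to_replicas} ({reason})")


line = format_decision(1755615900.0, 3, 5, "12 requests queued")
assert line.endswith("scaled up from 3 -> 5 (12 requests queued)")
```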
---
### Example: Application snapshot
```sh
======== Serve Autoscaler status: 2025-08-20T10:00:00Z ========
Application status
---------------------------------------------------------------
application_default_policy:
  Scaling status: scaling up
  Policy: Custom (example_application_policy)
  Scaling decisions:
    2025-08-20T09:55:00Z - scaled up frontend: 2 -> 4, backend: 4 -> 6 (total_requests=200)
  Metrics (look_back_period_s=45):
    total_requests: 200
  Errors: (none)
  Deployments:
    frontend:
      Current replicas: 4
      Target replicas: 4
      Replicas allowed: min=1, max=10
    backend:
      Current replicas: 6
      Target replicas: 6
      Replicas allowed: min=2, max=20
```
| Application spec requirement | Schema / fields |
|---|---|
| Application name | `ApplicationAutoscalerView.application` |
| Application scaling status | `ApplicationAutoscalerView.scaling_status` |
| Application policy | `ApplicationAutoscalerView.policy` |
| Scaling decisions | `ApplicationAutoscalerView.decisions[]` |
| Metrics / lookback | `ApplicationAutoscalerView.metrics`, `lookback_period_s` |
| Errors | `ApplicationAutoscalerView.errors` |
| Deployment summaries (current/target/min/max) | `ApplicationAutoscalerView.deployments` (`{dep: {current, target}}`) |
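The `{dep: {current, target}}` summary shape can be illustrated with a plain dict mirroring the application snapshot above (field values copied from the example; this is data shape only, not the Pydantic model):

```python
# Hypothetical application-level summary matching the table above.
app_view = {
    "application": "application_default_policy",
    "scaling_status": "scaling up",
    "policy": "example_application_policy",
    "metrics": {"total_requests": 200},
    "errors": [],
    "deployments": {
        "frontend": {"current": 4, "target": 4},
        "backend": {"current": 6, "target": 6},
    },
}
assert app_view["deployments"]["frontend"]["current"] == 4
```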
---
### Snapshot envelope
At the top level, every snapshot is wrapped in:
| Snapshot field | Schema / fields |
|---|---|
| Timestamp | `ServeAutoscalerObservability.timestamp_s` |
| Version | `ServeAutoscalerObservability.version` |
| Deployment list | `ServeAutoscalerObservability.deployments[]` |
| Application list | `ServeAutoscalerObservability.applications[]` |
| External scaler list | `ServeAutoscalerObservability.external_scalers[]` |
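Since the envelope is a structured model, a snapshot serializes cleanly to JSON for CLI or API output; a sketch with a plain dict standing in for the model (the `version` value and field contents here are placeholders, not values defined by this PR):

```python
import json

# Hypothetical envelope dict with the fields from the table above.
snapshot = {
    "timestamp_s": 1755615930.0,
    "version": "v1",          # placeholder value
    "deployments": [],
    "applications": [],
    "external_scalers": [],
}
payload = json.dumps(snapshot)
assert json.loads(payload)["timestamp_s"] == 1755615930.0
```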
## Related issue number
#55834
## Checks
- [x] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
---------
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>