[Serve][1/N] Add autoscaler observability core API schema #55919
zcin merged 18 commits into ray-project:master
Conversation
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request introduces a new schema for Serve Autoscaler Observability, which is a great addition for enhancing visibility into the autoscaling process. The Pydantic models are well-defined and cover the requirements outlined in the design specification. The code is clean and follows best practices. I have one suggestion to further improve the schema's robustness and clarity for application-level observability by defining a more specific model for deployment summaries.
zcin left a comment:
@nadongjun thanks for the contribution!
By the way, lint is failing because the new public APIs in `schema.py` need to be documented.
abrarsheikh left a comment:
I suggest keeping the application scaler and external scaler out for now, since those features are not yet introduced in the system. Let's start with what currently exists and then incrementally add things.
Yes, it’ll keep things less complex. I’ve updated to only keep the deployment-level autoscaler for now.
python/ray/serve/schema.py (outdated):

    @PublicAPI(stability="alpha")
    class DeploymentAutoscalerView(BaseModel):
DeploymentAutoscalerView -> DeploymentAutoscalingDetail
@zcin do you have better suggestion here?
Force-pushed 439251f to e73aefa
Can anyone take a look and help merge?
    metrics: Optional[Dict[str, Any]] = Field(
        None, description="Aggregated metrics for this deployment."
    )
    metrics_health: AutoscalingMetricsHealth = Field(
how will we decide metrics health?
For now, metrics are collected internally, so metrics_health just defaults to HEALTHY.
Once external scalers or custom metric sources are introduced, their status will be reflected in metrics_health.
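This default behavior can be sketched with a stdlib-only stand-in (the non-healthy enum values and the helper function are assumptions for illustration; the real field is `metrics_health: AutoscalingMetricsHealth` on the Pydantic model):

```python
from enum import Enum


class AutoscalingMetricsHealth(str, Enum):
    # HEALTHY is the confirmed current default; the other values are
    # assumed placeholders for future external/custom metric sources.
    HEALTHY = "HEALTHY"
    DELAYED = "DELAYED"
    UNHEALTHY = "UNHEALTHY"


def current_metrics_health() -> AutoscalingMetricsHealth:
    # Metrics are collected internally today, so health is always HEALTHY.
    return AutoscalingMetricsHealth.HEALTHY


assert current_metrics_health() is AutoscalingMetricsHealth.HEALTHY
```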
## Why are these changes needed?
This PR adds the Serve Autoscaler Observability schema.
The schema defines structured models in `schema.py`, allowing `serve
status -v` to return detailed observability data for both deployments
and applications.
With these models, the examples in the design spec can now be expressed
as structured Pydantic objects. This lays the groundwork for integrating
the schema into controller logic and CLI output in follow-up PRs.
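The model shapes described below can be mirrored in a stdlib-only sketch (the real models are Pydantic `BaseModel`s marked `@PublicAPI(stability="alpha")` in `schema.py`; the dataclass types, defaults, and the `"UPSCALING"` status string here are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ScalingDecision:
    # Field names taken from the tables in this PR; exact types assumed.
    timestamp_s: float
    from_replicas: int
    to_replicas: int
    reason: str
    policy: Optional[str] = None


@dataclass
class DeploymentAutoscalerView:
    current_replicas: int
    target_replicas: int
    min_replicas: int
    max_replicas: int
    scaling_status: str  # the real schema uses a ScalingStatus enum
    decisions: List[ScalingDecision] = field(default_factory=list)
    metrics: Optional[Dict[str, Any]] = None
    errors: List[str] = field(default_factory=list)


# Mirrors the "deployment_default_policy" example snapshot below.
view = DeploymentAutoscalerView(
    current_replicas=3,
    target_replicas=5,
    min_replicas=1,
    max_replicas=10,
    scaling_status="UPSCALING",
    metrics={"queued_requests": 12},
)
assert view.target_replicas - view.current_replicas == 2
```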
### Example: Deployment snapshot
```sh
======== Serve Autoscaler status: 2025-08-19T15:05:30Z ========
Deployment status
---------------------------------------------------------------
deployment_default_policy:
  Current replicas: 3
  Target replicas: 5
  Replicas allowed: min=1, max=10
  Scaling status: scaling up
  Scaling decisions:
    2025-08-19T14:00:00Z - scaled down from 5 -> 3 (low traffic)
    2025-08-19T15:05:00Z - scaled up from 3 -> 5 (12 requests queued)
  Policy: Default (queue-length based)
  Metrics (look_back_period_s=30):
    queued_requests: 12
  Metric collection: delayed (last update 30s ago)
  Errors: (none)
```
| Deployment spec requirement | Schema / fields |
|---|---|
| Current / Target replicas / Replicas allowed | `DeploymentAutoscalerView.current_replicas`, `target_replicas`, `min_replicas`, `max_replicas` |
| Scaling status (up/down/stable) | `DeploymentAutoscalerView.scaling_status` (`ScalingStatus`) |
| Scaling decisions (timestamp, from→to, reason) | `ScalingDecision.timestamp_s`, `from_replicas`, `to_replicas`, `reason`, `source`, `policy`, `metrics` |
| Policy | `ScalingDecision.policy` |
| Metrics (lookback, queued_requests, etc.) | `DeploymentAutoscalerView.metrics`, `lookback_period_s` |
| Metric collection state | `DeploymentAutoscalerView.metrics_health` (`MetricsHealth`) |
| Errors | `DeploymentAutoscalerView.errors` |
| Webhook history | `ExternalScalerView.webhook_history[]` (`WebhookEvent`) |
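A `ScalingDecision` row maps naturally onto the one-line CLI format in the snapshot above; a sketch of a hypothetical formatter (the function itself is not part of this PR):

```python
from datetime import datetime, timezone


def format_decision(timestamp_s: float, from_replicas: int,
                    to_replicas: int, reason: str) -> str:
    # Hypothetical helper rendering one "Scaling decisions" line of
    # `serve status -v` from ScalingDecision-style fields.
    ts = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
    direction = "up" if to_replicas > from_replicas else "down"
    return (f"{ts.strftime('%Y-%m-%dT%H:%M:%SZ')} - scaled {direction} "
            f"from {from_replicas} -> {to_replicas} ({reason})")


line = format_decision(1755615900.0, 3, 5, "12 requests queued")
assert line.endswith("scaled up from 3 -> 5 (12 requests queued)")
```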
---
### Example: Application snapshot
```sh
======== Serve Autoscaler status: 2025-08-20T10:00:00Z ========
Application status
---------------------------------------------------------------
application_default_policy:
  Scaling status: scaling up
  Policy: Custom (example_application_policy)
  Scaling decisions:
    2025-08-20T09:55:00Z - scaled up frontend: 2 -> 4, backend: 4 -> 6 (total_requests=200)
  Metrics (look_back_period_s=45):
    total_requests: 200
  Errors: (none)
  Deployments:
    frontend:
      Current replicas: 4
      Target replicas: 4
      Replicas allowed: min=1, max=10
    backend:
      Current replicas: 6
      Target replicas: 6
      Replicas allowed: min=2, max=20
```
| Application spec requirement | Schema / fields |
|---|---|
| Application name | `ApplicationAutoscalerView.application` |
| Application scaling status | `ApplicationAutoscalerView.scaling_status` |
| Application policy | `ApplicationAutoscalerView.policy` |
| Scaling decisions | `ApplicationAutoscalerView.decisions[]` |
| Metrics / lookback | `ApplicationAutoscalerView.metrics`, `lookback_period_s` |
| Errors | `ApplicationAutoscalerView.errors` |
| Deployment summaries (current/target/min/max) | `ApplicationAutoscalerView.deployments` (`{dep: {current, target}}`) |
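The `{dep: {current, target}}` summary shape can be illustrated with a plain dict mirroring the application snapshot above (field values copied from the example; this is data shape only, not the Pydantic model):

```python
# Hypothetical application-level summary matching the table above.
app_view = {
    "application": "application_default_policy",
    "scaling_status": "scaling up",
    "policy": "example_application_policy",
    "metrics": {"total_requests": 200},
    "errors": [],
    "deployments": {
        "frontend": {"current": 4, "target": 4},
        "backend": {"current": 6, "target": 6},
    },
}
assert app_view["deployments"]["frontend"]["current"] == 4
```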
---
### Snapshot envelope
At the top level, every snapshot is wrapped in:
| Snapshot field | Schema / fields |
|---|---|
| Timestamp | `ServeAutoscalerObservability.timestamp_s` |
| Version | `ServeAutoscalerObservability.version` |
| Deployment list | `ServeAutoscalerObservability.deployments[]` |
| Application list | `ServeAutoscalerObservability.applications[]` |
| External scaler list | `ServeAutoscalerObservability.external_scalers[]` |
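Since the envelope is a structured model, a snapshot serializes cleanly to JSON for CLI or API output; a sketch with a plain dict standing in for the model (the `version` value and field contents here are placeholders, not values defined by this PR):

```python
import json

# Hypothetical envelope dict with the fields from the table above.
snapshot = {
    "timestamp_s": 1755615930.0,
    "version": "v1",          # placeholder value
    "deployments": [],
    "applications": [],
    "external_scalers": [],
}
payload = json.dumps(snapshot)
assert json.loads(payload)["timestamp_s"] == 1755615930.0
```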
## Related issue number
#55834
## Checks
- [x] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
---------
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>