
[Serve][1/N] Add autoscaler observability core API schema #55919

Merged
zcin merged 18 commits into ray-project:master from nadongjun:serve-obsv-schema
Sep 3, 2025

Conversation

@nadongjun
Contributor

Why are these changes needed?

This PR adds the Serve Autoscaler Observability schema.
The schema defines structured models in `schema.py`, allowing `serve status -v` to return detailed observability data for both deployments and applications.

With these models, the examples in the design spec can now be expressed as structured Pydantic objects. This lays the groundwork for integrating the schema into controller logic and CLI output in follow-up PRs.

Example: Deployment snapshot

```sh
======== Serve Autoscaler status: 2025-08-19T15:05:30Z ========
Deployment status
---------------------------------------------------------------
deployment_default_policy:
    Current replicas: 3
    Target replicas: 5
    Replicas allowed: min=1, max=10
    Scaling status: scaling up
    Scaling decisions:
        2025-08-19T14:00:00Z - scaled down from 5 -> 3 (low traffic)
        2025-08-19T15:05:00Z - scaled up from 3 -> 5 (12 requests queued)
    Policy: Default (queue-length based)
    Metrics (look_back_period_s=30):
        queued_requests: 12
    Metric collection: delayed (last update 30s ago)
    Errors: (none)
```
| Deployment spec requirement | Schema / fields |
|---|---|
| Current / Target replicas / Replicas allowed | `DeploymentAutoscalerView.current_replicas`, `target_replicas`, `min_replicas`, `max_replicas` |
| Scaling status (up/down/stable) | `DeploymentAutoscalerView.scaling_status` (`ScalingStatus`) |
| Scaling decisions (timestamp, from→to, reason) | `ScalingDecision.timestamp_s`, `from_replicas`, `to_replicas`, `reason`, `source`, `policy`, `metrics` |
| Policy | `ScalingDecision.policy` |
| Metrics (lookback, queued_requests, etc.) | `DeploymentAutoscalerView.metrics`, `lookback_period_s` |
| Metric collection state | `DeploymentAutoscalerView.metrics_health` (`MetricsHealth`) |
| Errors | `DeploymentAutoscalerView.errors` |
| Webhook history | `ExternalScalerView.webhook_history[]` (`WebhookEvent`) |
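For illustration, the deployment-level view above can be sketched in plain Python. The PR defines these as Pydantic `BaseModel`s; this dependency-free sketch uses `dataclasses` instead, and the concrete `ScalingStatus` values are assumptions based on the "up/down/stable" wording, not the PR's actual enum members:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class ScalingStatus(str, Enum):
    # Assumed values, mirroring the "up/down/stable" wording in the table.
    UPSCALING = "UPSCALING"
    DOWNSCALING = "DOWNSCALING"
    STABLE = "STABLE"


@dataclass
class ScalingDecision:
    # One scaling event: when it happened, the replica transition, and why.
    timestamp_s: float
    from_replicas: int
    to_replicas: int
    reason: str
    source: Optional[str] = None
    policy: Optional[str] = None
    metrics: Optional[Dict[str, Any]] = None


@dataclass
class DeploymentAutoscalerView:
    # Field names taken from the "Schema / fields" column above.
    current_replicas: int
    target_replicas: int
    min_replicas: int
    max_replicas: int
    scaling_status: ScalingStatus
    decisions: List[ScalingDecision] = field(default_factory=list)
    metrics: Optional[Dict[str, Any]] = None
    lookback_period_s: Optional[float] = None
    errors: List[str] = field(default_factory=list)


# Reconstructing the example snapshot above as a structured object:
view = DeploymentAutoscalerView(
    current_replicas=3,
    target_replicas=5,
    min_replicas=1,
    max_replicas=10,
    scaling_status=ScalingStatus.UPSCALING,
    metrics={"queued_requests": 12},
    lookback_period_s=30,
)
print(view.scaling_status.value)  # UPSCALING
```

In the real schema, Pydantic gives validation and `.dict()`/`.json()` serialization on top of this same shape.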

Example: Application snapshot

```sh
======== Serve Autoscaler status: 2025-08-20T10:00:00Z ========
Application status
---------------------------------------------------------------
application_default_policy:
    Scaling status: scaling up
    Policy: Custom (example_application_policy)
    Scaling decisions:
        2025-08-20T09:55:00Z - scaled up frontend: 2 -> 4, backend: 4 -> 6 (total_requests=200)
    Metrics (look_back_period_s=45):
        total_requests: 200
    Errors: (none)

Deployments:
    frontend:
        Current replicas: 4
        Target replicas: 4
        Replicas allowed: min=1, max=10
    backend:
        Current replicas: 6
        Target replicas: 6
        Replicas allowed: min=2, max=20
```
| Application spec requirement | Schema / fields |
|---|---|
| Application name | `ApplicationAutoscalerView.application` |
| Application scaling status | `ApplicationAutoscalerView.scaling_status` |
| Application policy | `ApplicationAutoscalerView.policy` |
| Scaling decisions | `ApplicationAutoscalerView.decisions[]` |
| Metrics / lookback | `ApplicationAutoscalerView.metrics`, `lookback_period_s` |
| Errors | `ApplicationAutoscalerView.errors` |
| Deployment summaries (current/target/min/max) | `ApplicationAutoscalerView.deployments` (`{dep: {current, target}}`) |

Snapshot envelope

At the top level, every snapshot is wrapped in:

| Snapshot field | Schema / fields |
|---|---|
| Timestamp | `ServeAutoscalerObservability.timestamp_s` |
| Version | `ServeAutoscalerObservability.version` |
| Deployment list | `ServeAutoscalerObservability.deployments[]` |
| Application list | `ServeAutoscalerObservability.applications[]` |
| External scaler list | `ServeAutoscalerObservability.external_scalers[]` |
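A minimal sketch of how such an envelope could be assembled and serialized, using only the field names from the table. The actual models are Pydantic (where `.dict()`/`.json()` would apply); this stand-alone sketch uses a dataclass plus `json`, and the `"v1"` version string and nested dict layout are hypothetical:

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ServeAutoscalerObservability:
    # Top-level snapshot envelope; field names from the table above.
    timestamp_s: float
    version: str
    deployments: List[Dict[str, Any]] = field(default_factory=list)
    applications: List[Dict[str, Any]] = field(default_factory=list)
    external_scalers: List[Dict[str, Any]] = field(default_factory=list)


snapshot = ServeAutoscalerObservability(
    timestamp_s=time.time(),
    version="v1",  # hypothetical version string
    deployments=[
        {"deployment": "frontend", "current_replicas": 4, "target_replicas": 4},
    ],
)

# Serialize the whole snapshot, e.g. for CLI or API output.
print(json.dumps(asdict(snapshot), indent=2))
```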

Related issue number

#55834

Checks

- [x] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun nadongjun requested a review from a team as a code owner August 25, 2025 23:48
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new schema for Serve Autoscaler Observability, which is a great addition for enhancing visibility into the autoscaling process. The Pydantic models are well-defined and cover the requirements outlined in the design specification. The code is clean and follows best practices. I have one suggestion to further improve the schema's robustness and clarity for application-level observability by defining a more specific model for deployment summaries.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling), and community-contribution (Contributed by the community) labels Aug 26, 2025
Contributor

@zcin zcin left a comment


@nadongjun thanks for the contribution!

btw lint is failing because the new public apis in schema.py need to be documented

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun nadongjun requested a review from a team as a code owner August 27, 2025 00:27
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Contributor

@abrarsheikh abrarsheikh left a comment


I suggest keeping application scaler and external scaler out for now since those features are not yet introduced in the system. Let's start with what currently exists and then incrementally add things.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun
Contributor Author

I suggest keeping application scaler and external scaler out for now since those features are not yet introduced in the system. Let's start with what currently exists and then incrementally add things.

Yes, it’ll keep things less complex. I’ve updated to only keep the deployment-level autoscaler for now.



```python
@PublicAPI(stability="alpha")
class DeploymentAutoscalerView(BaseModel):
```
Contributor


`DeploymentAutoscalerView` -> `DeploymentAutoscalingDetail`

@zcin do you have better suggestion here?

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Contributor

@abrarsheikh abrarsheikh left a comment


Some nits; lgtm

@abrarsheikh abrarsheikh added the go (add ONLY when ready to merge, run all tests) label Aug 30, 2025
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun
Contributor Author

Can anyone take a look and help merge?

Contributor

@zcin zcin left a comment


nits, mostly lgtm

```python
metrics: Optional[Dict[str, Any]] = Field(
    None, description="Aggregated metrics for this deployment."
)
metrics_health: AutoscalingMetricsHealth = Field(
```
Contributor


how will we decide metrics health?

Contributor Author


For now, metrics are collected internally, so metrics_health just defaults to HEALTHY.

Once external scalers or custom metric sources are introduced, their status will be reflected in metrics_health.
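To make the "defaults to HEALTHY now, reflects source status later" idea concrete, here is a hedged sketch of how a health state could eventually be derived from metric staleness. The `MetricsHealth` values and the thresholds are assumptions for illustration, not the PR's logic (which, per the comment above, currently just defaults to healthy):

```python
from enum import Enum


class MetricsHealth(str, Enum):
    # Assumed states; the example snapshot shows "delayed (last update 30s ago)".
    HEALTHY = "HEALTHY"
    DELAYED = "DELAYED"
    UNAVAILABLE = "UNAVAILABLE"


def metrics_health(
    last_update_s: float, now_s: float, delay_threshold_s: float = 10.0
) -> MetricsHealth:
    """Classify metric freshness by the age of the last update (illustrative only)."""
    age = now_s - last_update_s
    if age < delay_threshold_s:
        return MetricsHealth.HEALTHY
    if age < 6 * delay_threshold_s:
        return MetricsHealth.DELAYED
    return MetricsHealth.UNAVAILABLE


# The "delayed (last update 30s ago)" line in the deployment example would map to:
print(metrics_health(last_update_s=0.0, now_s=30.0).value)  # DELAYED
```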

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@zcin zcin merged commit 9ab68b0 into ray-project:master Sep 3, 2025
5 checks passed
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025