Description
This issue tracks the implementation of advanced observability for the Serve Autoscaler,
as proposed in #41135 (comment) and specified in detail in this design document.
The goal is to make it easier to debug scaling behavior by exposing structured logs, metrics, and detailed CLI outputs (serve status -v).
This work depends on the ongoing implementation of the Serve custom autoscaler (deployment-level, application-level, and external scaler).
Each observability feature builds on top of the corresponding autoscaler logic, so the sub-issues should be tackled in order: Skeleton -> Deployment -> Application -> External -> Docs.
Sub-issues
- serve status -v (#55834)
Use case
The current serve status command only shows basic information such as replica counts and health.
As custom autoscaling (deployment-level, application-level, external scalers) becomes available, users need more detailed visibility to understand why scaling decisions are made.
serve status -v will let users:
- See scaling decisions and the policies/metrics that triggered them.
- Check metrics freshness (normal vs. delayed).
- Understand errors or abnormal events during autoscaler operation.
- Track application-level scaling when multiple deployments scale together.
- Debug external scaler behavior, e.g. webhook response codes and delivery history.
This extended visibility is essential for debugging complex autoscaling behavior and building confidence in custom scaling logic.
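The fields the bullets above ask for can be pictured as a small structured record. A minimal sketch, assuming hypothetical names throughout (none of these classes or fields are part of the Ray Serve API; they only illustrate the kind of data `serve status -v` would surface):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScalingDecision:
    timestamp: str       # e.g. "2025-08-19T15:05:00Z"
    old_replicas: int
    new_replicas: int
    reason: str          # e.g. "12 requests queued"

@dataclass
class DeploymentAutoscalerStatus:
    # Hypothetical per-deployment record mirroring the verbose output fields.
    name: str
    current_replicas: int
    target_replicas: int
    min_replicas: int
    max_replicas: int
    scaling_status: str                      # "scaling up" / "scaling down" / "stable"
    policy: str                              # e.g. "Default (queue-length based)"
    decisions: list[ScalingDecision] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
    look_back_period_s: Optional[int] = None
    last_metric_update_s_ago: Optional[float] = None
    errors: list[str] = field(default_factory=list)

    def summary(self) -> str:
        # One-line digest in the spirit of the verbose output.
        direction = "->" if self.target_replicas != self.current_replicas else "="
        return (f"{self.name}: {self.current_replicas} {direction} "
                f"{self.target_replicas} ({self.scaling_status})")

status = DeploymentAutoscalerStatus(
    name="deployment_default_policy", current_replicas=3, target_replicas=5,
    min_replicas=1, max_replicas=10, scaling_status="scaling up",
    policy="Default (queue-length based)")
print(status.summary())  # deployment_default_policy: 3 -> 5 (scaling up)
```

An application-level status would hold one such record per deployment, plus its own decisions and metrics, matching Example 4 below.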
Example Output (from RFC)
```
$ serve status -v
```

Example 1: Deployment using Default Autoscaling Policy (queue-length based)

```
======== Serve Autoscaler status: 2025-08-19T15:05:30Z ========
Deployment status
---------------------------------------------------------------
deployment_default_policy:
  Current replicas: 3
  Target replicas: 5
  Replicas allowed: min=1, max=10
  Scaling status: scaling up
  Scaling decisions:
    2025-08-19T14:00:00Z - scaled down from 5 -> 3 (low traffic)
    2025-08-19T15:05:00Z - scaled up from 3 -> 5 (12 requests queued)
  Policy: Default (queue-length based)
  Metrics (look_back_period_s=30):
    queued_requests: 12
  Metric collection: delayed (last update 30s ago)
  Errors: (none)
```

Example 2: Deployment using a Custom Autoscaling Policy (latency-based)

```
======== Serve Autoscaler status: 2025-08-19T12:10:00Z ========
Deployment status
---------------------------------------------------------------
deployment_custom_latency_policy:
  Current replicas: 8
  Target replicas: 8
  Replicas allowed: min=1, max=20
  Scaling status: stable
  Scaling decisions:
    2025-08-19T11:30:00Z - scaled up from 2 -> 4 (cpu_usage_percent 85% > 80%)
    2025-08-19T11:50:00Z - scaled up from 4 -> 8 (latency_p95_ms 450ms > 300ms)
  Policy: Custom (my_custom_policy)
  Metrics (look_back_period_s=60):
    latency_p95_ms: 450.0
    cpu_usage_percent: 62.5
  Metric collection: healthy (last update 5s ago)
  Errors:
    2025-08-19T12:05:00Z - PolicyError: Exception in user policy (ZeroDivisionError) - scaling skipped
```

Example 3: Deployment using an External Webhook Scaler

```
======== Serve Autoscaler status: 2025-08-19T04:12:00Z ========
Deployment status
---------------------------------------------------------------
deployment_webhook_policy:
  Current replicas: 5
  Target replicas: 3
  Replicas allowed: min=0, max=10
  Scaling status: scaling down
  Scaling decisions:
    2025-08-19T03:59:00Z - scaled up from 3 -> 5 (external scaler: cpu_usage_percent 92% > 90%)
    2025-08-19T04:10:00Z - scaled down from 5 -> 3 (external scaler: cpu_usage_percent 5% < 10%)
  Policy: External (external scaler)
  Metrics: n/a (decisions made externally)
  Metric collection: healthy (last update 2s ago)
  Webhook history:
    2025-08-19T03:59:01Z - scale up to 5 replicas (200 OK)
    2025-08-19T04:10:01Z - scale down to 3 replicas (500 ERROR)
  Errors: (none)
```

Example 4: Application using a Custom Application-Level Policy

```
======== Serve Autoscaler status: 2025-08-20T10:00:00Z ========
Application status
---------------------------------------------------------------
application_default_policy:
  Scaling status: scaling up
  Policy: Custom (example_application_policy)
  Scaling decisions:
    2025-08-20T09:55:00Z - scaled up frontend: 2 -> 4, backend: 4 -> 6 (total_requests=200)
  Metrics (look_back_period_s=45):
    total_requests: 200
  Errors: (none)
  Deployments:
    frontend:
      Current replicas: 4
      Target replicas: 4
      Replicas allowed: min=1, max=10
    backend:
      Current replicas: 6
      Target replicas: 6
      Replicas allowed: min=2, max=20
```
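The `Metric collection: healthy/delayed` lines in the examples imply some freshness rule. A minimal sketch of one plausible rule, assuming (purely for illustration, not from the RFC) that metrics count as delayed once the last update is at least one full look-back window old:

```python
def classify_metric_freshness(last_update_s_ago: float,
                              look_back_period_s: float) -> str:
    """Render a 'Metric collection' status line.

    Assumed rule: metrics are "delayed" once the most recent update is
    at least as old as the look-back window, otherwise "healthy".
    """
    state = "delayed" if last_update_s_ago >= look_back_period_s else "healthy"
    return f"{state} (last update {last_update_s_ago:.0f}s ago)"

# Consistent with the example outputs above:
print(classify_metric_freshness(30, 30))  # delayed (last update 30s ago)
print(classify_metric_freshness(5, 60))   # healthy (last update 5s ago)
```

Whatever threshold the implementation settles on, surfacing it alongside `look_back_period_s` keeps the verbose output self-explanatory.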