[Serve][Autoscaler] Add Skeleton CLI and Backend API for serve status -v #55834

@nadongjun

Description

Sub-issue of #55833

Add the skeleton for serve status -v and its backend API.

This should define the CLI entrypoint, JSON schema, and text renderer with placeholder values.
Later work will populate these sections with real data.
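As a rough illustration of what such a skeleton could look like, here is a minimal sketch of a placeholder schema with a JSON dump and a text renderer. Everything in it is an assumption made for illustration: the names VerboseStatus, to_json, render_text, and PLACEHOLDER, the schema_version field, and the three section names (taken from the deployment/application/external metrics mentioned under "Use case") are hypothetical and not part of Ray Serve. Wiring to the actual CLI flag is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical marker for sections that later work would fill with real data.
PLACEHOLDER = "<not implemented>"


@dataclass
class VerboseStatus:
    """Hypothetical schema for verbose status; every section is a placeholder."""

    schema_version: int = 1
    applications: Dict[str, str] = field(
        default_factory=lambda: {"<app>": PLACEHOLDER}
    )
    deployments: Dict[str, str] = field(
        default_factory=lambda: {"<deployment>": PLACEHOLDER}
    )
    external_metrics: Dict[str, str] = field(
        default_factory=lambda: {"<metric>": PLACEHOLDER}
    )


def to_json(status: VerboseStatus) -> str:
    """Serialize the status to JSON, as a backend API response might."""
    return json.dumps(asdict(status), indent=2)


def render_text(status: VerboseStatus) -> str:
    """Render each dict-valued section as an indented text block."""
    lines: List[str] = []
    for section, entries in asdict(status).items():
        if not isinstance(entries, dict):
            continue  # skip scalar fields such as schema_version
        lines.append(f"{section}:")
        for key, value in entries.items():
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_text(VerboseStatus()))
```

Keeping the schema and the renderer separate would let later contributors add a section by extending the dataclass without touching the rendering loop.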

For reference, the implementation will build on:

  • Event summarizer:

    def summary(self) -> List[str]:
        """Generate the aggregated log summary of all added events."""
        with self.lock:
            out = []
            for template, quantity in self.events_by_key.items():
                out.append(template.format(quantity))
            out.extend(self.messages_to_send)
        return out

  • CLI integration:

    @cli.command(
        short_help="Get the current status of all Serve applications on the cluster.",
        help=(
            "Prints status information about all applications on the cluster.\n\n"
            "An application may be:\n\n"
            "- NOT_STARTED: the application does not exist.\n"
            "- DEPLOYING: the deployments in the application are still deploying and "
            "haven't reached the target number of replicas.\n"
            "- RUNNING: all deployments are healthy.\n"
            "- DEPLOY_FAILED: the application failed to deploy or reach a running state.\n"
            "- DELETING: the application is being deleted, and the deployments in the "
            "application are being teared down.\n\n"
            "The deployments within each application may be:\n\n"
            "- HEALTHY: all replicas are acting normally and passing their health checks.\n"
            "- UNHEALTHY: at least one replica is not acting normally and may not be "
            "passing its health check.\n"
            "- UPDATING: the deployment is updating."
        ),
    )
    @click.option(
        "--address",
        "-a",
        default=os.environ.get("RAY_DASHBOARD_ADDRESS", "http://localhost:8265"),
        required=False,
        type=str,
        help=RAY_DASHBOARD_ADDRESS_HELP_STR,
    )
    @click.option(
        "--name",
        "-n",
        default=None,
        required=False,
        type=str,
        help=(
            "Name of an application. If set, this will display only the status of the "
            "specified application."
        ),
    )
    def status(address: str, name: Optional[str]):
        warn_if_agent_address_set()
        serve_details = ServeInstanceDetails(
            **ServeSubmissionClient(address).get_serve_details()
        )
        status = asdict(serve_details._get_status())
        # Ensure multi-line strings in app_status is dumped/printed correctly
        yaml.SafeDumper.add_representer(str, str_presenter)
        if name is None:
            print(
                yaml.safe_dump(
                    # Ensure exception traceback in app_status are printed correctly
                    process_dict_for_yaml_dump(status),
                    default_flow_style=False,
                    sort_keys=False,
                ),
                end="",
            )
        else:
            if name not in serve_details.applications:
                cli_logger.error(f'Application "{name}" does not exist.')
            else:
                print(
                    yaml.safe_dump(
                        # Ensure exception tracebacks in app_status are printed correctly
                        process_dict_for_yaml_dump(status["applications"][name]),
                        default_flow_style=False,
                        sort_keys=False,
                    ),
                    end="",
                )
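To make the aggregation pattern in the event summarizer excerpt concrete, here is a small self-contained stand-in. It is a sketch, not Ray's class: MiniEventSummarizer, add, and add_message are hypothetical names, and summing quantities per template is an assumed aggregation rule. Only the attributes used by summary (lock, events_by_key, messages_to_send) come from the excerpt above.

```python
import threading
from typing import Dict, List


class MiniEventSummarizer:
    """Hypothetical stand-in exposing only what summary() needs."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        # Maps a format template such as "Adding {} node(s)." to a count.
        self.events_by_key: Dict[str, int] = {}
        self.messages_to_send: List[str] = []

    def add(self, template: str, quantity: int) -> None:
        """Accumulate a quantity under a template (assumed rule: sum)."""
        with self.lock:
            current = self.events_by_key.get(template, 0)
            self.events_by_key[template] = current + quantity

    def add_message(self, message: str) -> None:
        """Queue a one-off message that is emitted without aggregation."""
        with self.lock:
            self.messages_to_send.append(message)

    def summary(self) -> List[str]:
        """Return aggregated templated events followed by one-off messages."""
        with self.lock:
            out = [t.format(q) for t, q in self.events_by_key.items()]
            out.extend(self.messages_to_send)
        return out


if __name__ == "__main__":
    summarizer = MiniEventSummarizer()
    summarizer.add("Adding {} node(s).", 2)
    summarizer.add("Adding {} node(s).", 3)
    summarizer.add_message("Resized the cluster.")
    for line in summarizer.summary():
        print(line)
```

A verbose status view could reuse this shape to collapse repeated events into one line per template instead of printing each occurrence.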

Use case

Provides a visible CLI entrypoint early on, so other contributors can hook in deployment/application/external metrics without conflicts.

Metadata

Labels

  • P1: Issue that should be fixed within a few weeks
  • community-backlog
  • enhancement: Request for new feature and/or capability
  • serve: Ray Serve Related Issue
  • usability
