feat(api): Server-Sent Events (SSE) streaming endpoint '/events' and '/snapshot' JSON #193

Description

@inureyes

Summary

Add two HTTP endpoints to all-smi api mode on top of the existing axum server:

  1. GET /events — Server-Sent Events stream. Emits a JSON payload per collection cycle. Targets embedded dashboards and Tauri/Electron apps that want live updates without polling /metrics.
  2. GET /snapshot — One-shot JSON response. Same schema as a single SSE frame. Convenient HTTP pair for the all-smi snapshot CLI.

Both reuse the exact JSON schema shared by the snapshot subcommand and the record format (schema: 1).

Motivation

The current /metrics Prometheus endpoint serves its purpose for time-series scraping, but embedded UIs need lower latency and easier client code. Implementing SSE on the already-present axum stack is cheap (SSE is just text/event-stream with a keep-alive) and broadens the set of integrations (custom React/Vue/Lit dashboards, Tauri desktop apps, Grafana panel plugins that prefer JSON, simple debugging with curl -N). Keeping the JSON schema identical to snapshot and record means one format, three delivery channels.

Current state

  • axum + tower-http in Cargo.toml.
  • src/api/server.rs wires the /metrics route.
  • Collection loop in src/api/mod.rs / src/api/server.rs polls readers every --interval seconds.
  • GpuInfo, CpuInfo, etc. all implement Serialize — no new derives needed.

Proposed design

Endpoints

GET /snapshot

  • Content-Type: application/json.
  • Body: the JSON frame produced by the snapshot subcommand (a single object with schema, timestamp, gpus, cpus, …).
  • Query params:
    • ?include=gpu,cpu,memory,chassis,process,storage (default gpu,cpu,memory,chassis).
    • ?pretty=1 (default off for HTTP).
  • Caching: Cache-Control: no-store.
  • Response time bound: must complete within one collection cycle + 200 ms.
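
The include filter above could be parsed along these lines. This is a minimal std-only sketch, not the project's code: the Section enum and parse_include helper are hypothetical names, and rejecting unknown section names (rather than ignoring them) is an assumption the proposal does not settle.

```rust
use std::collections::BTreeSet;

/// Sections named in the `?include=` list of the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Section {
    Gpu,
    Cpu,
    Memory,
    Chassis,
    Process,
    Storage,
}

impl Section {
    fn from_name(name: &str) -> Option<Self> {
        match name {
            "gpu" => Some(Self::Gpu),
            "cpu" => Some(Self::Cpu),
            "memory" => Some(Self::Memory),
            "chassis" => Some(Self::Chassis),
            "process" => Some(Self::Process),
            "storage" => Some(Self::Storage),
            _ => None,
        }
    }
}

/// Hypothetical helper: turn the raw `include` value into a set of sections.
/// `None` (parameter absent) yields the documented default gpu,cpu,memory,chassis.
/// Assumption: an unknown name is an error; the issue does not specify this.
pub fn parse_include(raw: Option<&str>) -> Result<BTreeSet<Section>, String> {
    let raw = match raw {
        Some(value) => value,
        None => {
            return Ok(BTreeSet::from([
                Section::Gpu,
                Section::Cpu,
                Section::Memory,
                Section::Chassis,
            ]))
        }
    };

    let mut sections = BTreeSet::new();
    // Tolerate stray spaces and empty items such as "gpu,,cpu".
    for name in raw.split(',').map(str::trim).filter(|n| !n.is_empty()) {
        match Section::from_name(name) {
            Some(section) => {
                sections.insert(section);
            }
            None => return Err(format!("unknown include section: {name}")),
        }
    }
    Ok(sections)
}
```

Because the default set leaves out Process, the expensive process section is only collected when a client asks for it explicitly.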

GET /events

  • Content-Type: text/event-stream.
  • Emits one event per collection cycle:
    event: snapshot
    id: 42
    data: {"schema":1,"timestamp":"...","gpus":[...], "cpus":[...], ...}
    
    
  • Keep-alive: emit : keep-alive\n\n comment every 30 s (matching HTTP2_KEEPALIVE_SECS config).
  • Query params:
    • ?include=... — same semantics as /snapshot.
    • ?throttle=N — emit at most every N seconds (cannot be smaller than the collection interval).
    • ?heartbeat=N — override heartbeat interval.
  • Last-Event-ID header support: if client reconnects with a known ID, we simply resume with the next live frame (no replay from history — all-smi doesn't store history).
  • CORS: respect the existing tower-http::cors configuration; GET + Accept: text/event-stream must be allowed.
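
For reference, the wire format of one frame and of the keep-alive comment can be sketched with plain string building. The function names here are hypothetical; in the real handler axum's sse module would be expected to produce this framing, so the sketch only illustrates what goes over the socket.

```rust
/// Hypothetical helper: render one `snapshot` event in SSE wire format.
/// `json` is the already-serialized frame.
pub fn format_snapshot_event(id: u64, json: &str) -> String {
    let mut out = String::new();
    out.push_str("event: snapshot\n");
    out.push_str(&format!("id: {id}\n"));
    // A multi-line payload (for example pretty-printed JSON) needs one
    // `data:` field per line; the client rejoins them with '\n'.
    for line in json.lines() {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    // A blank line terminates the event.
    out.push('\n');
    out
}

/// Lines starting with ':' are comments; clients ignore them, but they
/// keep idle connections from being closed by intermediaries.
pub fn keep_alive_comment() -> &'static str {
    ": keep-alive\n\n"
}
```

Compact JSON has no raw newlines, so the per-line loop normally runs once; it matters only if a pretty-printed payload is ever streamed.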

Broadcast architecture

  • Single tokio::sync::broadcast::channel<Arc<SnapshotFrame>> with buffer 16.
  • The existing API collection task sends one Arc<SnapshotFrame> per tick.
  • Each SSE client is a receiver. Lagging receivers get the oldest frames dropped (broadcast semantics): in that case, emit an event: lag\ndata: {"dropped": N}\n\n event and continue.
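  • A client disconnect is observed when the server drops that client's response stream; the broadcast receiver is dropped with it, which releases the client's resources. The receiver itself only reports closure (recv() returning RecvError::Closed, or a wrapping stream's next() returning None) when the sender side is dropped.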
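
The lag behaviour described above can be illustrated with a small std-only model. This is not tokio's implementation: Ring, Cursor, Recv, and lag_event are hypothetical names that only mimic the documented semantics, where a slow receiver learns how many frames it missed and then continues from the oldest frame still buffered.

```rust
use std::collections::VecDeque;

/// Outcome of one receive attempt in this model.
#[derive(Debug, PartialEq)]
pub enum Recv<T> {
    Frame(T),
    Lagged(u64), // number of frames this receiver missed
    Empty,       // nothing new yet
}

/// Bounded buffer shared by all receivers; keeps only the newest `capacity` frames.
pub struct Ring<T> {
    capacity: usize,
    frames: VecDeque<T>,
    next_seq: u64, // sequence number the next sent frame will get
}

impl<T: Clone> Ring<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, frames: VecDeque::with_capacity(capacity), next_seq: 0 }
    }

    /// Never blocks: when full, the oldest frame is discarded.
    pub fn send(&mut self, frame: T) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
        self.next_seq += 1;
    }

    fn oldest_seq(&self) -> u64 {
        self.next_seq - self.frames.len() as u64
    }
}

/// Per-client read position. A new cursor only sees frames sent after it was created.
pub struct Cursor {
    next: u64,
}

impl Cursor {
    pub fn new<T: Clone>(ring: &Ring<T>) -> Self {
        Cursor { next: ring.next_seq }
    }

    pub fn recv<T: Clone>(&mut self, ring: &Ring<T>) -> Recv<T> {
        let oldest = ring.oldest_seq();
        if self.next < oldest {
            // Frames were overwritten before this client read them.
            let dropped = oldest - self.next;
            self.next = oldest;
            return Recv::Lagged(dropped);
        }
        if self.next == ring.next_seq {
            return Recv::Empty;
        }
        let index = (self.next - oldest) as usize;
        self.next += 1;
        Recv::Frame(ring.frames[index].clone())
    }
}

/// The lag notification in the wire format given in the proposal.
pub fn lag_event(dropped: u64) -> String {
    format!("event: lag\ndata: {{\"dropped\": {dropped}}}\n\n")
}
```

The key property for the acceptance criteria is that send never waits on a receiver, so a stalled client costs at most the fixed buffer and cannot delay the collection loop.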

Schema

Identical to the snapshot CLI JSON schema, so the serialization code must be shared. Place a SnapshotFrame struct in src/common/snapshot.rs (or re-export it from the snapshot module) so that both subcommands consume the same type.

Implementation plan

Files to add / modify:

  • src/api/server.rs — add routes /events and /snapshot; spawn a singleton collection task that broadcasts SnapshotFrames into a broadcast::Sender.
  • New src/api/handlers/events.rs — SSE handler implementing keep-alive, backpressure, lag notification, and the include filter.
  • New src/api/handlers/snapshot.rs — one-shot JSON handler. Reads the last broadcast frame; if stale beyond 2×interval, forces a fresh collection.
  • src/api/mod.rs — export the shared SnapshotFrame state.
  • Shared module src/common/snapshot.rs: SnapshotFrame and FrameBuilder::new().with_include(...).build(). Used by the snapshot CLI and by these endpoints.
  • Cargo.toml: axum feature sse if not already enabled; tokio-stream if needed.
  • Example client in examples/sse_client.html — small HTML+JS using EventSource.
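
The staleness rule for the one-shot handler (reuse the last broadcast frame unless it is older than 2×interval) reduces to a small decision function. A minimal sketch, assuming hypothetical names SnapshotSource and choose_source, and assuming that a missing frame (nothing broadcast yet) also forces a collection:

```rust
use std::time::Duration;

/// Where the one-shot handler should take its frame from.
#[derive(Debug, PartialEq)]
pub enum SnapshotSource {
    Cached,
    FreshCollection,
}

/// `frame_age` is `None` when no frame has been broadcast yet.
/// A frame counts as stale once its age exceeds twice the collection interval.
pub fn choose_source(frame_age: Option<Duration>, interval: Duration) -> SnapshotSource {
    // saturating_mul avoids a panic on overflow for absurdly large intervals.
    let limit = interval.saturating_mul(2);
    match frame_age {
        Some(age) if age <= limit => SnapshotSource::Cached,
        _ => SnapshotSource::FreshCollection,
    }
}
```

With a 3 s interval, a frame aged exactly 6 s is still served from cache, while anything older triggers a fresh collection.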

Acceptance criteria

  • curl -N http://localhost:9090/events streams JSON events at the configured interval.
  • curl -N 'http://localhost:9090/events?include=gpu' streams events with only the gpus key populated.
  • curl http://localhost:9090/snapshot returns a single JSON object matching the snapshot schema.
  • curl 'http://localhost:9090/snapshot?include=cpu,memory' returns a subset of keys.
  • Keep-alive comments are emitted every ~30 s when no event has been sent for that long (very unlikely at normal intervals, but possible in edge cases).
  • 50+ concurrent SSE clients do not slow the collection loop (collection tick jitter stays within ±20 ms; measure with a synthetic test).
  • Slow client: the server does not buffer unbounded memory. Lagging receivers see a lag event and resume.
  • Disconnecting a client releases its resources immediately (verified via lsof or tokio-console).
  • Unix Domain Socket transport (already supported for /metrics) also works for /events and /snapshot.
  • examples/sse_client.html opens in a browser and shows live updates.
  • cargo test integration test spins up an api server on a random port, connects an EventSource client, asserts at least 3 frames within 5 seconds.
  • README gains a "Streaming (SSE)" subsection under the API section; API.md documents both endpoints.
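
For the ±20 ms jitter criterion, the synthetic test needs a way to turn recorded tick times into a single number. One possible helper, sketched under the assumption that the test records each tick as an offset from test start; max_tick_jitter is a hypothetical name:

```rust
use std::time::Duration;

/// Largest absolute deviation of the spacing between consecutive ticks
/// from the nominal interval. Returns `None` with fewer than two ticks.
/// `ticks` are offsets from the start of the test, in ascending order.
pub fn max_tick_jitter(ticks: &[Duration], interval: Duration) -> Option<Duration> {
    ticks
        .windows(2)
        .map(|pair| {
            let spacing = pair[1].saturating_sub(pair[0]);
            if spacing > interval {
                spacing - interval
            } else {
                interval - spacing
            }
        })
        .max()
}
```

The test would then run once with no clients and once with 50+ connected clients, and assert that the returned value stays at or below 20 ms in both runs.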

Edge cases & non-goals

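  • SSE is a long-lived streaming response. Reverse proxies (nginx, haproxy) may buffer it and delay events. Document that nginx users should set proxy_buffering off for these routes, and emit the X-Accel-Buffering: no response header from our side so that nginx disables buffering for the stream even without that setting.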
  • Heartbeat interval must be smaller than typical proxy idle timeouts (default 30 s should be safe; allow override).
  • The processes section is expensive, so populate it only when requested via ?include=process. This keeps the default /events payload cheap.
  • Non-goal: WebSocket transport. SSE is simpler, one-way, and sufficient.
  • Non-goal: HTTP/2 server push. SSE works fine over both HTTP/1.1 and HTTP/2.
  • Non-goal: historical replay of missed frames. Clients missing a frame get the next live one.

Soft dependency

  • Shares SnapshotFrame with the snapshot CLI (dependency) and the record format. Land the shared schema first; ship these three features on a common foundation.
