Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(KR-P2-HEALTH-PANEL): health rollup admin UI (R4.1 §9.7 subsignals) against stub endpoint#54

Merged
rafe-walker merged 1 commit into
mainfrom
feat/kora-KR-P2-HEALTH-PANEL
May 21, 2026
Merged

feat(KR-P2-HEALTH-PANEL): health rollup admin UI (R4.1 §9.7 subsignals) against stub endpoint#54
rafe-walker merged 1 commit into
mainfrom
feat/kora-KR-P2-HEALTH-PANEL

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

Summary

Read-only operator-side aggregator for "is Kora healthy across all measurable axes". Pairs with future KR-P2-L (kora.health.probe emitter + subsignal collectors); v1 returns an all-fresh hardcoded stub so the panel ships ahead of runtime wire-in.

7th admin panel in the proven stub-first pattern (predecessors: #26 OPS, #31 SEA, #37 CONTROL, #45 BOOT, #49 COST, #52 CAP).

Endpoint (kora_cli/web_server.py)

GET /api/health-rollup — returns:

field semantics
overall / control_plane / worker health enum per axis (R4.1 §9.7: "distinguish control plane healthy from worker healthy")
stopped_reason non-null only when overall ∈ {stopped, outage} — lets operators tell "intentionally stopped" from "outage" at a glance
subsignals all 8 R4.1 §9.7 axes (see below); each carries a status enum + axis-specific fields
stub true (flips with KR-P2-L)

Enums — top-level: healthy | degraded | stopped | outage; per-subsignal: fresh | stale | missing | degraded.

8 R4.1 §9.7 subsignals:
last_successful_write, claim_state, credit_burn, breaker_state, auth_validity_window, dispatch_reachable, last_heartbeat, escalation_watcher_liveness

No POST/PUT/DELETE — alerting integration is cockpit-side (IsoKron-team's lane).

Frontend (web/)

  • pages/HealthRollupPage.tsx
    • P6 banner (R4.1 §9.7 P6) — red top-of-page banner appears only when escalation_watcher_liveness.status === "stale"; explicit "use manual L4" instruction so operator knows automated escalation is unavailable. Rendered above every other element on the page so it's the first thing the operator sees during a real outage.
    • Overall health card — large uppercase status, tone-coded icon (CheckCircle/AlertTriangle/PauseCircle/AlertOctagon), control_plane + worker pills side-by-side, stopped_reason surfaced when present. Inline warning when overall is stopped/outage with no reason reported.
    • Subsignals grid — 8 cards in R4.1 §9.7 reading order (not alphabetical — operators learn the layout; re-ordering would shift visual cognitive load). Each card shows axis name + status icon + status pill + axis-specific detail rows (value / threshold / elapsed / remaining derived from whichever fields the subsignal carries). Defensive fallback "missing" card if the runtime ever omits a canonical subsignal.
  • lib/api.ts — 2 type aliases (HealthStatus + SubsignalStatus) + 2 interfaces (Subsignal as open-shape union covering different field combos per R4.1 §9.7; HealthRollupResponse).
  • App.tsx/health-rollup route + nav entry (HeartPulse icon) between /operational-state and /boot-status.

Flip-over plan

When KR-P2-L lands and HealthRollupHolder.current() exists, swap get_health_rollup body to project from that and drop the stub flag. The page itself doesn't change.

Test plan

  • tests/kora_cli/test_web_server_health_rollup.py — 9/9 green, covers all 8 §4 scenarios + contract guards (8-subsignal contract pinned; stopped_reason non-empty when set; escalation_watcher_liveness.status field always present so the P6 banner can't be silently suppressed by schema drift)
  • test_web_server_{capabilities,cost_state,boot_status,kora_control,sea_tickets,cron_profiles,host_header,mcp,gateway_identity}.py — 90/90 still green (99 total)
  • npx tsc -b on web/ — clean
  • npx vite build on web/ — clean
  • Manual smoke: navigate to /health-rollup, verify STUB banner renders, verify all 8 subsignal cards render in R4.1 reading order with fresh status, verify P6 banner does NOT show (current stub has escalation_watcher_liveness fresh); manual-swap the stub to status: "stale" to verify the P6 red banner appears at the top of page

Dependency notes

  • KR-P2-L (future CC#1 bucket) builds kora.health.probe synthetic + 8 subsignal collectors
  • HealthRollupHolder.current() is the accessor that flips this from stub to real

🤖 Generated with Claude Code

…s) against stub endpoint

Read-only operator-side aggregator for "is Kora healthy across all
measurable axes". Pairs with future KR-P2-L (kora.health.probe
emitter + subsignal collectors); v1 returns an all-fresh hardcoded
stub so the panel ships ahead of runtime wire-in. 7th admin panel
in the proven stub-first pattern.

Backend (kora_cli/web_server.py):
  GET /api/health-rollup — read-only stub. Returns:
    * overall / control_plane / worker — separate health enums
      (R4.1 §9.7: "distinguish control plane healthy from worker
      healthy"). Enum: healthy | degraded | stopped | outage.
    * stopped_reason — non-null only when overall ∈ {stopped, outage};
      lets operators tell "intentionally stopped" from "outage" at
      a glance without guessing.
    * subsignals — all 8 R4.1 §9.7 axes:
        last_successful_write, claim_state, credit_burn,
        breaker_state, auth_validity_window, dispatch_reachable,
        last_heartbeat, escalation_watcher_liveness
      Per-subsignal status: fresh | stale | missing | degraded.
  No POST/PUT/DELETE; alerting integration is cockpit-side.

Frontend:
  - HealthRollupPage.tsx —
    * **P6 banner** (R4.1 §9.7 P6): red top-of-page banner appears
      ONLY when escalation_watcher_liveness.status == "stale";
      explicit "use manual L4" instruction so operator knows
      automated escalation is unavailable. Surfaced above every
      other element on the page so it's the first thing the
      operator sees during a real outage.
    * Overall health card: large uppercase status, tone-coded
      icon (CheckCircle/AlertTriangle/PauseCircle/AlertOctagon),
      control_plane + worker pills side-by-side, stopped_reason
      surfaced when present. Inline warning when overall is
      stopped/outage with no reason.
    * Subsignals grid: 8 cards in R4.1 §9.7 reading order (not
      alphabetical — pinned because operators learn the layout
      and re-ordering on alpha would shift visual cognitive load).
      Each card: status icon + label + status pill + per-subsignal
      detail rows (value/threshold/elapsed/remaining derived from
      whichever fields the subsignal carries). Defensive fallback
      "missing" card if the runtime ever omits a canonical
      subsignal.
  - api.ts — 2 type aliases (HealthStatus + SubsignalStatus) + 2
    interfaces (Subsignal as open-shape union covering all the
    different field combinations per R4.1 §9.7; HealthRollupResponse).
  - App.tsx — /health-rollup route + nav entry (HeartPulse icon)
    between /operational-state and /boot-status.

Tests: tests/kora_cli/test_web_server_health_rollup.py — 9 tests
covering all 8 §4 scenarios + 1 contract guard (stopped_reason
non-empty when set). Pinned the 8-subsignal contract so a future
runtime flip dropping one surfaces at the endpoint layer rather
than as a silent "missing" fallback. 99/99 across the full
admin-panel test suite pass. tsc -b + vite build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant