This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(KR-P2-DR-PANEL): DR / substrate-epoch admin UI (R4.1 §9.8) against stub endpoint #58

Merged
rafe-walker merged 1 commit into main from feat/kora-KR-P2-DR-PANEL on May 21, 2026

Conversation

@rafe-walker

Owner

Summary

Operator-side view of disaster recovery + substrate epoch state. This is what Joshua checks when a PITR (point-in-time recovery) happens: the panel confirms that gate 3b fired, that the epoch mismatch was detected, and that Kora paused safely, then walks the operator through the post-PITR `substrate_epoch` bump runbook.

Operationally critical when DR happens; lives quietly when it doesn't — the red DR alert is suppressed in the default clean-state render so the panel is unobtrusive in normal operation. Pairs with CC#1's in-flight KR-P2-M (DR/epoch consumer).

8th admin panel in the proven stub-first pattern (predecessors: #26 OPS, #31 SEA, #37 CONTROL, #45 BOOT, #49 COST, #52 CAP, #54 HEALTH).

Endpoint (kora_cli/web_server.py)

`GET /api/dr-state` — read-only stub. Returns:

| field | semantics |
| --- | --- |
| `current` | substrate_epoch + kora_known_epoch + match_status + last_check_at + kora_paused_substrate |
| `epoch_history` | every advancement of Kora's known epoch with source (boot-success / dr-recovery / operator-bump) |
| `recent_dr_events` | `kora.dr.observed` payloads with epoch jump + discard counts + cleared-by |
| `runbook_pending` | derived flag; FE renders red top-of-page alert when True |
| `stub` | true |

Enums: `match_status` ∈ `{clean, mismatch_detected, pending_runbook, unknown}`; `source` ∈ `{boot-success, dr-recovery, operator-bump}`.
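For orientation, here is a minimal sketch of what a clean-state payload with these fields and enums could look like. It is not the code in `kora_cli/web_server.py`: the function names, the `epoch` key inside `epoch_history`, and all sample values are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Enum values as listed in the PR description.
MATCH_STATUSES = {"clean", "mismatch_detected", "pending_runbook", "unknown"}
EPOCH_SOURCES = {"boot-success", "dr-recovery", "operator-bump"}


def stub_dr_state() -> dict:
    """Build a clean-state payload (hypothetical helper, sample values only)."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "current": {
            "substrate_epoch": 3,
            "kora_known_epoch": 3,
            "match_status": "clean",
            "last_check_at": now,
            "kora_paused_substrate": False,
        },
        "epoch_history": [
            # "epoch" is an assumed key name; "source" and "observed_at"
            # are the names used in the PR text.
            {"epoch": 3, "source": "operator-bump", "observed_at": now},
        ],
        "recent_dr_events": [],
        "runbook_pending": False,
        "stub": True,
    }


def enum_errors(payload: dict) -> list[str]:
    """Return one message per enum value that falls outside the allowed sets."""
    errors = []
    status = payload["current"]["match_status"]
    if status not in MATCH_STATUSES:
        errors.append(f"bad match_status: {status!r}")
    for entry in payload["epoch_history"]:
        if entry["source"] not in EPOCH_SOURCES:
            errors.append(f"bad source: {entry['source']!r}")
    return errors
```

Keeping the enum sets as module-level constants lets both the stub and any later real projection be checked against the same definitions.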

No POST/PUT/DELETE — the post-PITR `substrate_epoch` bump is an OS-level operator action (Fly secret + `flyctl restart`). This panel SURFACES the need + points at the runbook docs; it does not execute the runbook.

Frontend (web/)

  • `pages/DRStatePage.tsx`
    • DR alert banner — red top-of-page, ONLY renders when `runbook_pending: true`. Spells out the post-PITR steps and links to `kora_docs/15_status_and_roadmap/dr_runbook.md` (pending KR-P2-M). Default render (clean state) hides the banner entirely.
    • Current epoch card — side-by-side big-number readout: `substrate_epoch` (green when match, red when substrate advanced) + `kora_known_epoch` (green match, amber catching-up). Match-status pill with tone + `PAUSED{substrate}` pill when applicable + last-check timestamp.
    • Epoch history table — sorted epoch desc; per-entry source badge + Kora's catch-up timestamp ("not yet" when source registered but Kora hasn't caught up).
    • Recent DR events table — collapsible; default-expanded when `runbook_pending` OR epoch history shows a non-monotonic jump (a prior DR happened). Shows epoch jump, discard counts, cleared-by status with check/X icon.
  • `lib/api.ts` — `DRMatchStatus` + `EpochSource` type aliases + 4 interfaces (`DRCurrent`, `EpochHistoryEntry`, `DRObservedEvent`, `DRStateResponse`).
  • `App.tsx` — `/dr-state` route + nav entry (ShieldAlert icon) between `/boot-status` and `/cost-state`.
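The banner and tone rules described for `DRStatePage.tsx` can be sketched as two small pure functions. The real page is TSX; this Python version only illustrates the decision logic. The function names and the `"neutral"` tone are assumptions, and the case where Kora's epoch is ahead of the substrate is not described in this PR.

```python
def show_dr_banner(payload: dict) -> bool:
    """The red banner renders only when runbook_pending is explicitly true."""
    return payload.get("runbook_pending") is True


def epoch_tones(substrate_epoch: int, kora_known_epoch: int) -> dict:
    """Pick a tone for each big-number readout on the current epoch card."""
    if substrate_epoch == kora_known_epoch:
        # Match: both readouts are green.
        return {"substrate_epoch": "green", "kora_known_epoch": "green"}
    if substrate_epoch > kora_known_epoch:
        # Substrate advanced: substrate is red, Kora is amber (catching up).
        return {"substrate_epoch": "red", "kora_known_epoch": "amber"}
    # Kora ahead of the substrate is not covered by the PR text;
    # "neutral" is a placeholder assumption.
    return {"substrate_epoch": "neutral", "kora_known_epoch": "neutral"}
```

For example, `epoch_tones(5, 3)` yields a red substrate readout and an amber Kora readout, while a missing `runbook_pending` key keeps the banner hidden.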

Flip-over plan

When KR-P2-M lands and `DREpochState.current()` + `.recent_dr_events()` accessors exist, swap `get_dr_state` body to project from those and drop the `stub` flag. The page itself doesn't change.
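A rough sketch of how that swap might look, under several assumptions: the `get_dr_state` signature shown here is hypothetical, the PR does not say where `epoch_history` comes from after the flip (so it is passed in as a plain argument), and deriving `runbook_pending` from `match_status` is only one rule that is consistent with the invariants in the test plan. Only `current()` and `recent_dr_events()` are names taken from the PR.

```python
import copy
from typing import Optional, Protocol


class DREpochStateLike(Protocol):
    """The two accessors named in the PR; everything else here is assumed."""

    def current(self) -> dict: ...
    def recent_dr_events(self) -> list: ...


# Sample clean-state stub payload (values are illustrative).
_STUB = {
    "current": {
        "substrate_epoch": 1,
        "kora_known_epoch": 1,
        "match_status": "clean",
        "last_check_at": None,
        "kora_paused_substrate": False,
    },
    "epoch_history": [],
    "recent_dr_events": [],
    "runbook_pending": False,
    "stub": True,
}


def get_dr_state(state: Optional[DREpochStateLike] = None,
                 epoch_history: Optional[list] = None) -> dict:
    """Return the stub until a real state object is supplied."""
    if state is None:
        return copy.deepcopy(_STUB)
    current = state.current()
    return {
        "current": current,
        "epoch_history": list(epoch_history or []),
        "recent_dr_events": state.recent_dr_events(),
        # Assumed derivation, not confirmed by the PR.
        "runbook_pending": current["match_status"]
        in {"mismatch_detected", "pending_runbook"},
        # No "stub" key once the payload is projected from real state.
    }


class FakeState:
    """Stand-in for the future DREpochState, used only to exercise the sketch."""

    def current(self) -> dict:
        return {
            "substrate_epoch": 4,
            "kora_known_epoch": 3,
            "match_status": "pending_runbook",
            "last_check_at": None,
            "kora_paused_substrate": True,
        }

    def recent_dr_events(self) -> list:
        return []
```

Because both branches return the same top-level keys apart from `stub`, the frontend types would not need to change, which matches the claim that the page itself stays as-is.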

Test plan

  • `tests/kora_cli/test_web_server_dr_state.py` — 8/8 green, covers all 7 §4 scenarios + extra invariant guards:
    • `runbook_pending` ⇒ `match_status != "clean"` (avoid false DR alert on healthy systems)
    • `kora_paused_substrate` ⇒ `match_status ∈ {mismatch_detected, pending_runbook}` (don't hold the pause without an active mismatch)
    • epoch sequence monotonic when sorted by `observed_at` (PITR can jump by >1 but never decrease)
    • DR event `to_epoch >= from_epoch`
    • DR event `cleared_at` and `cleared_by` always paired (both null or both set)
  • `test_web_server_{health_rollup,capabilities,cost_state,boot_status,kora_control,sea_tickets,cron_profiles,host_header,mcp,gateway_identity}.py` — 99/99 still green (107 total across 11 suites)
  • `npx tsc -b` on `web/` — clean
  • `npx vite build` on `web/` — clean
  • Manual smoke: navigate to `/dr-state` and verify the default render is QUIET (clean state, no red banner, both epoch readouts green); then manually swap the stub to `runbook_pending: true` + `match_status: "mismatch_detected"` + `kora_paused_substrate: true` and verify that the red DR alert appears at the top and the per-field tones change correctly
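The invariant guards listed in the test plan can be expressed as a single checker over the response payload. This is a sketch, not the contents of `test_web_server_dr_state.py`: the function name and the `epoch` key in `epoch_history` are assumptions, and "monotonic" is read here as non-decreasing.

```python
def dr_state_violations(payload: dict) -> list[str]:
    """Return a message for each invariant from the test plan that is broken."""
    problems = []
    current = payload["current"]
    status = current["match_status"]

    # runbook_pending implies match_status != "clean".
    if payload["runbook_pending"] and status == "clean":
        problems.append("runbook_pending set while match_status is clean")

    # kora_paused_substrate implies an active mismatch.
    if current["kora_paused_substrate"] and status not in {
        "mismatch_detected",
        "pending_runbook",
    }:
        problems.append("paused without an active mismatch")

    # Epochs may jump by more than 1 but must never decrease over time.
    history = sorted(payload["epoch_history"], key=lambda e: e["observed_at"])
    epochs = [e["epoch"] for e in history]  # "epoch" is an assumed key name
    if any(later < earlier for earlier, later in zip(epochs, epochs[1:])):
        problems.append("epoch decreased over time")

    for event in payload["recent_dr_events"]:
        if event["to_epoch"] < event["from_epoch"]:
            problems.append("DR event moves the epoch backwards")
        # cleared_at and cleared_by must be both null or both set.
        if (event["cleared_at"] is None) != (event["cleared_by"] is None):
            problems.append("cleared_at and cleared_by are not paired")

    return problems
```

Returning a list of messages rather than raising on the first failure makes it easy to assert `dr_state_violations(payload) == []` in a test and still see every broken rule at once.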

Dependency notes

  • KR-P2-M (CC#1, in flight) wires the runtime-side DR consumer
  • `DREpochState.current()` + `.recent_dr_events()` are the accessors that flip this from stub to real

🤖 Generated with Claude Code

…st stub endpoint

Operator-side view of disaster recovery + substrate epoch state.
What Joshua checks when a PITR happens — confirms gate 3b fired,
the epoch mismatch was detected, Kora paused safely, and walks
the operator through the post-PITR substrate_epoch bump runbook.

Operationally critical when DR happens; lives quietly when it
doesn't (the red DR alert is suppressed in the default render so
the panel is unobtrusive in normal operation).

8th admin panel in the proven stub-first pattern.

Backend (kora_cli/web_server.py):
  GET /api/dr-state — read-only stub. Returns:
    * current: substrate_epoch + kora_known_epoch + match_status
      (clean / mismatch_detected / pending_runbook / unknown) +
      last_check_at + kora_paused_substrate
    * epoch_history: every advancement of Kora's known epoch with
      source enum (boot-success / dr-recovery / operator-bump)
    * recent_dr_events: kora.dr.observed payloads with from_epoch
      → to_epoch + discarded_operation_ids + discarded_ledger_rows
      + cleared_at + cleared_by
    * runbook_pending: derived flag; FE renders red top-of-page
      alert when True
  No POST/PUT/DELETE — the post-PITR substrate_epoch bump is an
  OS-level operator action (Fly secret + flyctl restart). This panel
  SURFACES the need + links to docs; it does not execute the
  runbook.

Frontend:
  - DRStatePage.tsx —
    * DR alert banner: red top-of-page, ONLY renders when
      runbook_pending is True. Spells out the post-PITR steps and
      points at kora_docs/15_status_and_roadmap/dr_runbook.md
      (pending KR-P2-M). Default render (clean state) hides the
      banner entirely so the panel is quiet in normal operation.
    * Current epoch card: side-by-side substrate_epoch +
      kora_known_epoch big-number readout (green when match, red
      when substrate has advanced past kora, amber when kora is
      catching up). MatchStatus pill + PAUSED{substrate} pill +
      last-check timestamp.
    * Epoch history table: sorted epoch desc, shows source badge.
    * Recent DR events table: collapsible (default-expanded when
      runbook_pending OR epoch history has a non-monotonic jump
      indicating a prior DR — operator gets context immediately).
      Shows epoch jump, discard counts, cleared-by status.
  - api.ts — DRMatchStatus + EpochSource type aliases + 4
    interfaces (DRCurrent, EpochHistoryEntry, DRObservedEvent,
    DRStateResponse).
  - App.tsx — /dr-state route + nav entry (ShieldAlert icon)
    between /boot-status and /cost-state.

Tests: tests/kora_cli/test_web_server_dr_state.py — 8 tests
covering all 7 §4 scenarios + extra invariant guards:
  * runbook_pending ⇒ match_status != "clean" (avoid false DR
    alert on healthy systems)
  * kora_paused_substrate ⇒ match_status ∈ {mismatch_detected,
    pending_runbook} (don't hold the pause without an active
    mismatch)
  * epoch sequence monotonic when sorted by observed_at (PITR
    can jump by >1 but never decrease)
  * DR event to_epoch >= from_epoch
  * DR event cleared_at and cleared_by always paired
107/107 across the full 11-suite admin-panel test set pass.
tsc -b + vite build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>