This repository was archived by the owner on May 26, 2026. It is now read-only.

docs(KR-P2-RUNBOOKS-AUTHOR): DR + token-rotation runbooks (R4.1 §12 closure) #86

Merged: rafe-walker merged 1 commit into main from feat/kora-KR-P2-RUNBOOKS-AUTHOR on May 22, 2026

@rafe-walker

Owner

Summary

Authors the two operator runbooks the R4.1 §12 readiness checklist
pinned as outstanding. Closes 2 §12 items.

  • kora_docs/15_status_and_roadmap/dr_runbook.md (~162 lines) —
    post-PITR substrate_epoch bump per R4.1 §9.8 P1+P5.
  • kora_docs/15_status_and_roadmap/token_rotation_runbook.md
    (~191 lines) — unified KORA_SERVICE_TOKEN (wsk_*) +
    CLAUDE_CODE_OAUTH_TOKEN rotation per R2 §5.

No code changes; pure docs.

DR runbook structure

  1. When applies: substrate rewound (PITR / restore / explicit DB
    reset); Kora's kora_known_epoch ahead of substrate_epoch.
  2. Symptoms: /api/operational-state PAUSED + degradation_reasons:["substrate"],
    /api/dr-state match_status: mismatch_detected, gate 3b fails in
    /api/boot-status, kora.dr.observed event in chain log.
  3. Pre-procedure checks: confirm rewind, confirm operator actor
    class, confirm one Fly machine.
  4. Bump procedure: read substrate_epoch + kora_known_epoch
    via read SECDEFs; loop bump_substrate_epoch('<OPERATOR_UUID>'::uuid)
    until substrate_epoch > kora_known_epoch (each call is +1).
  5. Clear PAUSED: issue_kora_control with p_level=0,
    p_kind='reset' (operator-only; the SECDEF rejects actor_kind='kora'
    and rejects level=0 from non-operators).
  6. Verify gate 3b: /api/dr-state clean, latest boot READY with
    gate 3b pass, kora.dr.observed fires on the failed boot only
    (audit) and not on the recovered boot.
  7. Failure modes: under-counted bump, re-pause loop (PAUSED-clear
    listener write failed), wrong actor class for reset, higher-level
    open command preempts reset, KoraKnownEpochMonotonicViolation
    escalation (substrate timeline moved backwards beyond
    kora_known_epoch; not a routine PITR).
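
The bump loop in step 4 can be sketched as below. This is a minimal illustration, not the runbook's actual tooling: the three callables and the `max_bumps` safety cap are hypothetical stand-ins for the read SECDEFs and `bump_substrate_epoch`, and `FakeSubstrate` exists only so the sketch runs without a database.

```python
from typing import Callable


def bump_until_ahead(
    read_substrate_epoch: Callable[[], int],
    read_kora_known_epoch: Callable[[], int],
    bump_once: Callable[[], None],
    max_bumps: int = 100,  # hypothetical safety cap, not from the runbook
) -> int:
    """Bump until substrate_epoch > kora_known_epoch; return the bump count."""
    known = read_kora_known_epoch()
    bumps = 0
    # Re-read substrate_epoch on every pass: each bump is +1, so an
    # under-counted loop would leave the mismatch in place.
    while read_substrate_epoch() <= known:
        if bumps >= max_bumps:
            raise RuntimeError("bump cap reached; stop and investigate")
        bump_once()
        bumps += 1
    return bumps


class FakeSubstrate:
    """In-memory stand-in for the substrate, for illustration only."""

    def __init__(self, substrate_epoch: int, kora_known_epoch: int) -> None:
        self.substrate_epoch = substrate_epoch
        self.kora_known_epoch = kora_known_epoch

    def read_substrate_epoch(self) -> int:
        return self.substrate_epoch

    def read_kora_known_epoch(self) -> int:
        return self.kora_known_epoch

    def bump(self) -> None:
        self.substrate_epoch += 1


if __name__ == "__main__":
    fake = FakeSubstrate(substrate_epoch=3, kora_known_epoch=7)
    count = bump_until_ahead(
        fake.read_substrate_epoch, fake.read_kora_known_epoch, fake.bump
    )
    print(count, fake.substrate_epoch)
```

Re-reading the epoch inside the loop, rather than computing the delta once, is one way to guard against the "under-counted bump" failure mode listed in step 7.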

Token rotation runbook structure

  1. Credentials covered: table with Doppler project, cadence,
    current expiry per token.
  2. When applies: proactive (T-30/14/7/1 day alerts) +
    reactive symptoms per token (wsk_* lapse: gate 4 fail / kora.boot.failed
    / STOPPED; OAuth lapse: 401/403 at inference / /api/health
    worker: degraded / tickets in failed_retryable).
  3. wsk_* procedure: mint via substrate-team → doppler update
    kora-runtime-substrate → flyctl secrets import (preserves rest
    of project's secrets) → restart → verify boot gates 4/6/10 PASS
    via /api/boot-status → revoke OLD only after verified.
  4. OAuth procedure: claude setup-token → doppler update
    kora-runtime-anthropic → flyctl import + restart → verify via
    inference probe (boot gates don't check OAuth; needs a real
    Anthropic call) → OAuth expires automatically (no revoke).
  5. Joshua's preference: explicit-overlap rotation (mint → verify
    → revoke), per user_joshua memory.
  6. T-30/14/7/1 alerts: operator-workstation cron template per
    token.
  7. Failure modes: stale Doppler download, wrong scope on new
    wsk_*, trailing whitespace in OAuth, simultaneous-rotation revert
    via doppler activity + version-pinned restore.
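
The T-30/14/7/1 alert logic from steps 2 and 6 can be sketched as a small date check. This is an illustrative sketch only: the function names are hypothetical, it assumes the cron job runs once per day, and it does not reproduce the cron template in the runbook itself.

```python
from datetime import date
from typing import Optional

# Alert thresholds named in the runbook summary: 30, 14, 7 and 1 days out.
ALERT_THRESHOLDS_DAYS = (30, 14, 7, 1)


def days_until_expiry(expiry: date, today: date) -> int:
    """Whole days from today to the expiry date (negative once lapsed)."""
    return (expiry - today).days


def alert_due(expiry: date, today: date) -> Optional[int]:
    """Return the threshold that falls on today, or None if no alert is due.

    Assumes a once-daily run, so each threshold matches exactly one day.
    """
    remaining = days_until_expiry(expiry, today)
    return remaining if remaining in ALERT_THRESHOLDS_DAYS else None


if __name__ == "__main__":
    # Example expiry date, used here purely as sample input.
    expiry = date(2026, 8, 18)
    threshold = alert_due(expiry, date.today())
    if threshold is not None:
        print(f"token expires in {threshold} day(s); start rotation")
```

An exact-match check keeps the alert to one message per threshold; a job that can miss days would need a "remaining <= threshold and not yet alerted" check instead.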

R4.1 §12 closure claim

Two §12 readiness items close with this PR:

  • "DR runbook mandates the post-restore substrate_epoch bump as a
    non-negotiable step, with explicit post-restore verification that
    gate 3b fired on Kora's next boot." → covered by dr_runbook.md.
  • "The two tokens' rotations are documented as one unified runbook (a
    wsk-token lapse while OAuth is valid burns inference with zero
    durable output)." → covered by token_rotation_runbook.md.
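
The post-restore verification named in the first item can be sketched as a check over two boot records. This is a hedged illustration: `BootRecord` and its fields are a hypothetical shape invented for the example, not the response format of /api/boot-status or the chain log.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BootRecord:
    """Hypothetical summary of one boot; field names are assumptions."""

    status: str              # e.g. "READY" or "PAUSED"
    gate_3b_passed: bool     # outcome of gate 3b on this boot
    dr_observed_event: bool  # whether kora.dr.observed fired on this boot


def recovery_problems(failed_boot: BootRecord, recovered_boot: BootRecord) -> List[str]:
    """Return the verification problems found; an empty list means clean."""
    problems: List[str] = []
    # The audit event should exist on the boot that detected the mismatch.
    if not failed_boot.dr_observed_event:
        problems.append("kora.dr.observed missing on the failed boot")
    # The boot after the bump should be READY with gate 3b passing.
    if recovered_boot.status != "READY":
        problems.append(f"recovered boot is {recovered_boot.status}, not READY")
    if not recovered_boot.gate_3b_passed:
        problems.append("gate 3b did not pass on the recovered boot")
    # The event must not fire again once the epochs match.
    if recovered_boot.dr_observed_event:
        problems.append("kora.dr.observed fired again on the recovered boot")
    return problems


if __name__ == "__main__":
    failed = BootRecord(status="PAUSED", gate_3b_passed=False, dr_observed_event=True)
    recovered = BootRecord(status="READY", gate_3b_passed=True, dr_observed_event=False)
    print(recovery_problems(failed, recovered))
```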

The path kora_docs/15_status_and_roadmap/dr_runbook.md matches the
hardcoded reference in web/src/pages/DRStatePage.tsx:377 ("See
kora_docs/15_status_and_roadmap/dr_runbook.md (pending KR-P2-M)").
The pending tag remains in CC#2's surface — once this PR merges, the
panel reference becomes a live link.

No Rule-3 ASKs raised

Investigation phase verified all referenced helpers + SECDEFs exist
on main:

  • agent/dr_handler.py (gate 3b) + agent/dr_writer.py (writers) +
    plugins/memory/isokron/dr_epoch.py (read/write helpers + the
    KoraKnownEpochMonotonicViolation exception class).
  • packages/db/migrations/0100_kora_dr_epoch_substrate.sql (read
    SECDEFs + bump_substrate_epoch).
  • packages/db/migrations/0090_kora_control_secdefs.sql
    (issue_kora_control, operator-only level=0).
  • docs/deploy-fly-io.md (3-Doppler-project layout + OAuth rotation
    reference the new runbook extends).

No gaps; no STOPs.

Test plan

  • Mechanical: gh pr checks passes (docs-only — no CI gates
    should regress).
  • Path verification: DRStatePage.tsx link target resolves once
    this lands.
  • PM review: confirm structure + R4.1 §12 closure claim.

🤖 Generated with Claude Code

…losure)

Authors the two operator runbooks the R4.1 §12 readiness checklist
pinned as outstanding:

- kora_docs/15_status_and_roadmap/dr_runbook.md (~140 lines) — the
  post-PITR substrate_epoch bump procedure per R4.1 §9.8 P1+P5. Walks
  the operator through reading substrate_epoch + kora_known_epoch via
  the read SECDEFs, looping bump_substrate_epoch (operator-only,
  +1/call) until the delta clears, then issuing a level=0 kora_control
  reset to supersede PAUSED{substrate}. Verifies via /api/dr-state +
  /api/boot-status + /api/operational-state + chain log. Documents
  KoraKnownEpochMonotonicViolation escalation as a separate failure
  mode (signals an unbumped backward jump beyond the kora_known_epoch
  checkpoint — not routine).

- kora_docs/15_status_and_roadmap/token_rotation_runbook.md (~190
  lines) — unified KORA_SERVICE_TOKEN (wsk_*, ~quarterly, current
  expiry 2026-08-18) + CLAUDE_CODE_OAUTH_TOKEN (~yearly) procedure per
  R2 §5. Both rotations follow mint→KR-7-smoke→revoke shape, but
  exercise DIFFERENT verification paths (wsk_* hits boot gate 4 + 6 +
  10; OAuth needs an actual inference probe since the boot gates don't
  check it). Includes T-30/14/7/1 day operator cron template + four
  failure-mode pre-mortems. Explicit guard: NEVER rotate both
  simultaneously (R2 §5 blast-radius isolation premise breaks).
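
The mint → verify → revoke ordering and the never-rotate-both guard can be sketched as below. This is a minimal sketch under stated assumptions: `RotationGuard`, `RotationError`, and the callables are hypothetical names for illustration, and the real steps are operator actions (Doppler, flyctl, probes) rather than Python functions.

```python
from typing import Callable, List, Optional, Set


class RotationError(Exception):
    """Raised when a rotation must stop before the old credential is revoked."""


class RotationGuard:
    """Runs one explicit-overlap rotation at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self._active: Set[str] = set()

    def rotate(
        self,
        token_name: str,
        mint_new: Callable[[], None],
        verify_new: Callable[[], bool],
        revoke_old: Optional[Callable[[], None]] = None,
    ) -> List[str]:
        # Refuse to start while any other rotation is still in flight.
        if self._active:
            busy = sorted(self._active)[0]
            raise RotationError(f"{busy} rotation in progress; not starting {token_name}")
        self._active.add(token_name)
        try:
            steps: List[str] = []
            mint_new()
            steps.append("mint")
            # The old credential stays valid until the new one is verified.
            if not verify_new():
                raise RotationError(f"{token_name}: new credential failed verification")
            steps.append("verify")
            # OAuth expires on its own, so revoke_old may be omitted.
            if revoke_old is not None:
                revoke_old()
                steps.append("revoke")
            return steps
        finally:
            self._active.discard(token_name)


if __name__ == "__main__":
    guard = RotationGuard()
    print(guard.rotate("wsk", lambda: None, lambda: True, lambda: None))
    print(guard.rotate("oauth", lambda: None, lambda: True))
```

Keeping the revoke step behind a successful verify means a failed rotation leaves the old credential active, which matches the "revoke OLD only after verified" ordering in the PR description.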

DRStatePage.tsx:377 already hardcodes the dr_runbook path
(kora_docs/15_status_and_roadmap/dr_runbook.md — KR-P2-M's pending
reference). This bucket lands the file there.

No code changes; pure docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker merged commit 185bfff into main on May 22, 2026
rafe-walker deleted the feat/kora-KR-P2-RUNBOOKS-AUTHOR branch on May 22, 2026 02:39
rafe-walker added a commit that referenced this pull request May 22, 2026
… + truthful 11-source footer (#95)

v1 row 1 (Health hero + 6 cards) untouched. Row 2 adds 4 cards in 4-col lg grid: Capabilities (total_tools + total_caps + escalating-badge for 17 unmapped C2 mirror groups) / Charter (mid-truncated revision_id + rules_hash + loaded_at + 'rules pending' fallback badge) / Recent events (last 5 chain events, stripped kora. prefix for density) / Runbooks (available vs pending counts + most-recently-modified).

isStubbed loosened to LoadStatus<unknown> with property-check (caps + runbooks responses don't carry stub field). ALL_SOURCES array consolidates 11 sources for anyStubbed banner and footer aggregate. Footer reports truthful 11-source count (PM bucket said 13; corrected — dashboard fetches 11 endpoint-backed sources; matches DIAG-BUNDLE's allowlist).

150/151 admin-panel tests pass; 1 stale assertion from PR #86 vendoring dr_runbook.md auto-improving PR #84's 'unauthored' assertion — flagged for trivial follow-up.
rafe-walker added a commit that referenced this pull request May 22, 2026
…on (#97)

PR #86 vendored dr_runbook.md + token_rotation_runbook.md; PR #84's test_get_runbook_content_404_for_unauthored_manifest_entry asserted dr_runbook as unauthored — stale. Swapped to kora_dna (still placeholder) + updated docstring explaining auto-improvement history. 151/151 admin-panel suite back to green.