This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-CC1-POLISH-AND-STABILITY-MEGABUCKET — test stability + promote CLI + alert fallback + envelope auto-approve#201

Merged
rafe-walker merged 1 commit into
feature/phase2-upgrades from
feat/kora-KR-CC1-POLISH-AND-STABILITY-MEGABUCKET
May 24, 2026


@rafe-walker

Owner

Summary

CC#1 pivot to polish + stability after 10 substantial product buckets. Per [[feedback-local-first-upstream-after]]: pre-prod system needs battle-testing before externalization, and polish IS the battle-testing.

Per-deliverable status

| ID | Deliverable | Status | Notes |
| --- | --- | --- | --- |
| A | Wider-suite test stability | ✅ deps-side resolved; pre-existing failures documented + bucketed | Audit confirmed: missing-deps were env-side (CC#1 dev env not synced), not code-side. pyproject.toml had aiosmtplib / prompt_toolkit / aioimaplib / fire / openai already declared. Collection now 7008/7008 (was 7004 + 4 errors). The 94 remaining failures are pre-existing — split into a separate bucket (recommended below). |
| B | kora promote operator CLI | ✅ shipped | 4 subcommands (status / run-once / history / pending) across all 6 promotion loops. New module kora_cli/promote_cli.py; lazy cycle imports keep --help fast. |
| C | Alert investigation fallback DM polish | ✅ shipped | The mechanism shipped in #197; KR-CC1-POLISH improves wording (header carries channel + alert_id; footer is explicit "Kora will not retry — act manually") + adds explicit dm_status="engine_unavailable_fallback" assertion test. |
| D | Probe-fix-envelope auto-approve (low-risk) | ✅ shipped | 2 new envs (default OFF). New blast_radius_level field + heuristic. Auto-approves only known-narrow envelopes (currently fly restart_unhealthy_machine), after 1h wait window. Two-tier gating preserved: auto-approve adds to vocabulary; per-probe ENABLED env still required for runtime execution. |

Full-suite test count (now green-ish)

Run via env -u ANTHROPIC_API_KEY -u OPENAI_API_KEY -u CLAUDE_CODE_OAUTH_TOKEN -u ANTHROPIC_AUTH_TOKEN TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0 python3.11 -m pytest tests/kora_cli -n 4 --timeout=30 --timeout-method=signal:

6905 passed, 94 failed, 10 skipped, 2 warnings in 83.35s

The 94 failures are pre-existing (verified by git stash + re-run on the merge commit before this PR — same failures, same count). Breakdown:

| File / area | Count | Root cause |
| --- | --- | --- |
| reasoning/test_anthropic_engine*.py | ~53 | CC#3's #196 daemon Phase 1 default-flipped gateway-route-through; tests call real Anthropic API via the route-through path even though they pass a mock client. Test infrastructure needs route-through-mock fixtures. |
| test_backup.py / test_config.py / test_web_server*.py / test_cron.py | ~30 | HERMES_HOME → KORA_HOME migration tests still encoding legacy expectations (mention of hermes_test paths, config.yaml defaults). |
| test_kanban*.py | ~4 | Separate flaky area. |
| Other (1-2 each) | ~7 | Per-file isolation issues. |
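
For anyone reproducing the full-suite count from a script rather than a shell, the hermetic invocation quoted above can be approximated in Python. This is a minimal sketch, not code from the repository: `hermetic_env` and `pytest_command` are hypothetical helper names, and only the variable names and pytest flags are taken from the command line in this description.

```python
import os

# Credential variables removed by the `env -u ...` invocation quoted above.
SCRUBBED_VARS = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "CLAUDE_CODE_OAUTH_TOKEN",
    "ANTHROPIC_AUTH_TOKEN",
)

# Deterministic settings pinned by the same invocation.
PINNED_VARS = {"TZ": "UTC", "LANG": "C.UTF-8", "PYTHONHASHSEED": "0"}


def hermetic_env(base=None):
    """Copy `base` (default: os.environ), drop credentials, apply the pins."""
    env = dict(os.environ if base is None else base)
    for name in SCRUBBED_VARS:
        env.pop(name, None)  # equivalent of `env -u NAME`
    env.update(PINNED_VARS)
    return env


def pytest_command():
    """Argument vector matching the pytest flags quoted in the description."""
    return [
        "python3.11", "-m", "pytest", "tests/kora_cli",
        "-n", "4", "--timeout=30", "--timeout-method=signal",
    ]
```

A caller would pass both to `subprocess.run(pytest_command(), env=hermetic_env())`. Scrubbing the credentials matters here because, per the breakdown above, some tests reach the real Anthropic API through the route-through path when a key is present.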

CLI command examples

# Per-loop counts + last activity across all 6 loops
$ kora promote status | jq '.loops[] | {loop, store_layout, counts}'

# JSON dump of currently-pending phrasebook proposals (highest-confidence first)
$ kora promote pending phrasebook | jq '.pending[] | {proposal_id, confidence}'

# Last 7 days of audit rows for the router-tuning loop
$ kora promote history router_tuning --days 7 | jq '.rows[].seam' | sort | uniq -c

# Invoke one cycle of the email-intent loop ad-hoc (returns the summary dict)
$ kora promote run-once email_intent | jq '.summary | {proposals_generated, proposals_persisted, duration_ms}'

# snapshot_expand has no pending/ — surface that with a structured error
$ kora promote pending snapshot_expand
{
  "error": "snapshot_expand has no pending/ status — this loop is audit-only by default (auto-apply persists to applied/ directly). Use ``kora promote history snapshot_expand`` to view recent activity."
}
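
The `history` example above narrows rows by seam with jq. The commit message for Deliverable B describes the server-side filter as two rules: keep rows whose seam belongs to the loop's own vocabulary, and keep rows on a shared seam (such as `promotion.approved`) only when `caller_session_id` starts with `promotion:<loop>:`. The sketch below illustrates that logic under assumptions: `filter_history`, the `ts` field name, and the ISO-8601 timestamp format are hypothetical; only `seam`, `caller_session_id`, and the prefix shape come from this PR.

```python
from datetime import datetime, timedelta, timezone


def filter_history(rows, loop, loop_seams, shared_seams, days, now=None):
    """Return the audit rows that belong to `loop` within the last `days` days.

    rows: iterable of dicts with "ts" (ISO-8601, assumed), "seam", and
          optionally "caller_session_id".
    loop_seams: seam names emitted only by this loop.
    shared_seams: seam names emitted by several loops.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    prefix = f"promotion:{loop}:"
    kept = []
    for row in rows:
        if datetime.fromisoformat(row["ts"]) < cutoff:
            continue  # older than the requested window
        seam = row["seam"]
        if seam in loop_seams:
            kept.append(row)  # unambiguous: only this loop emits the seam
        elif seam in shared_seams and row.get("caller_session_id", "").startswith(prefix):
            kept.append(row)  # shared seam, scoped by the session-id prefix
    return kept
```

Without the prefix check, a `promotion.approved` row written by the phrasebook loop would appear in the router-tuning history, which is the cross-loop leak the new filtering tests guard against.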

Probe-fix-envelope auto-approve — sample sequence

# Opt in (operator decision)
$ export KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK=true
$ export KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS=1

# A low-risk proposal is generated (fly machine_down cluster)
# → 1h later the next cycle's auto-approve sweep moves it to approved/
$ kora promote status | jq '.loops[] | select(.loop=="probe_fix_envelopes")'
{ "counts": { "pending": 0, "approved": 1, ... } }

# Audit trail
$ kora promote history probe_fix_envelopes | jq '.rows[] | select(.seam=="promotion.probe_envelope_action_auto_approved").details | {proposal_id, blast_radius_level, auto_approve_wait_hours, auto_approved_at}'
{ "proposal_id": "...", "blast_radius_level": "low", "auto_approve_wait_hours": 1.0, "auto_approved_at": "2026-05-24T13:00:00Z" }

# CRITICAL: actual fix execution STILL requires the per-probe ENABLED env
$ export KORA_PROBE_AUTOFIX_FLY_ENABLED=true   # operator confirms runtime authorization separately
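
The two-tier gate in the sequence above can be summarized as a small predicate. This is an illustrative sketch only: the helper names and the set of accepted truthy strings are assumptions, while the two environment variable names follow the ones used in this PR.

```python
import os

AUTO_APPROVE_ENV = "KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK"


def _env_flag(name, env=None):
    """Read a boolean flag; anything other than a truthy string counts as off."""
    env = os.environ if env is None else env
    # Assumption: the real parser may accept a different set of truthy values.
    return env.get(name, "false").strip().lower() in {"1", "true", "yes"}


def auto_approve_enabled(env=None):
    """Tier 1 switch: may low-risk proposals be moved to approved/ automatically?"""
    return _env_flag(AUTO_APPROVE_ENV, env)


def probe_autofix_enabled(probe_name, env=None):
    """Tier 2 switch: per-probe runtime authorization."""
    return _env_flag(f"KORA_PROBE_AUTOFIX_{probe_name.upper()}_ENABLED", env)


def may_execute_fix(probe_name, envelope_approved, env=None):
    """A fix runs only if the envelope is approved AND the probe is enabled."""
    return bool(envelope_approved) and probe_autofix_enabled(probe_name, env)
```

Note that `auto_approve_enabled` never appears inside `may_execute_fix`: auto-approval only changes how an envelope reaches the approved state, and execution still depends on the separate per-probe flag.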

CC#1 next-dispatch recommendation

Per the bucket spec's hint: multi-tenant operator-distribution prep. Once CC#3's NousResearch#426 Option C identity-as-plugin lands, the natural CC#1 follow-on is the back-end groundwork for "Kora-per-operator":

  1. Per-tenant cost ladder — extract the singleton CostStateHolder to a per-tenant accessor; get_cost_holder(tenant_id). Migration story: existing call sites pass an implicit "default" tenant.

  2. Per-tenant config isolation — KORA_HOME resolution becomes tenant-scoped (${KORA_HOME_ROOT}/${tenant_id}/...). Listener startup pulls per-tenant config; cross-tenant accidental reads return None rather than the wrong tenant's state.

  3. Per-tenant audit JSONL — extend emit_audit with an implicit / explicit tenant_id field; one audit file per tenant; CC#2's cockpit gets a tenant-picker.

  4. Pre-existing test stability bucket (recommended title: KR-TEST-STABILITY-ROUTE-THROUGH-MOCKS-AND-HERMES-HOME-MIGRATION) — fixes the 94 remaining failures so future PRs ship genuinely green. Should land BEFORE the multi-tenant work since multi-tenant adds substantial new test surface.

  5. CLI completion + man page — extend the operator surface beyond kora promote. kora alerts / kora probe / kora audit for symmetric CLI ergonomics; complement (not replace) the cockpit.

The pre-existing-test-stability bucket (4) is the highest-priority of these — every CC#1 PR for the next 5 buckets will have to navigate this same "green except for the pre-existing 94" caveat until it's resolved.
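
To make items 1 and 2 of the list above concrete, here is one possible shape for the per-tenant accessor and the tenant-scoped home lookup. Nothing here exists in the codebase yet: the `CostStateHolder` stand-in, its `spent_usd` field, `tenant_home`, and `read_tenant_file` are all hypothetical, and only `get_cost_holder(tenant_id)`, the implicit "default" tenant, and the root/tenant directory layout come from the recommendation.

```python
from dataclasses import dataclass
from pathlib import Path

DEFAULT_TENANT = "default"


@dataclass
class CostStateHolder:
    """Stand-in for the existing singleton; the real class has other fields."""
    spent_usd: float = 0.0


_holders: dict[str, CostStateHolder] = {}


def get_cost_holder(tenant_id: str = DEFAULT_TENANT) -> CostStateHolder:
    """Per-tenant accessor; existing call sites keep working via the default."""
    if tenant_id not in _holders:
        _holders[tenant_id] = CostStateHolder()
    return _holders[tenant_id]


def tenant_home(root: Path, tenant_id: str) -> Path:
    """Tenant-scoped home directory: <root>/<tenant_id>."""
    return root / tenant_id


def read_tenant_file(root: Path, tenant_id: str, relative: str):
    """Read a file under the tenant's home, or return None.

    Paths that resolve outside the tenant's own directory yield None,
    so an accidental cross-tenant read never returns another tenant's state.
    """
    home = tenant_home(root, tenant_id).resolve()
    target = (home / relative).resolve()
    if not target.is_relative_to(home) or not target.is_file():
        return None
    return target.read_text()
```

Returning `None` on a cross-tenant path mirrors the stated goal of item 2; whether a missing file and an out-of-scope path should be distinguishable to callers is a design question this sketch leaves open.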

Test plan

  • 12 new tests for probe-fix-envelope auto-approve sweep
  • 13 new tests for kora promote CLI commands
  • 2 new tests for alert fallback DM polish
  • All 41 new tests pass; all relevant existing tests still pass
  • Full suite under hermetic env: 6905 pass / 94 fail / 10 skipped (failures all pre-existing — verified by git stash)
  • Smoke-tested CLI subcommands end-to-end via python3.11 -c 'sys.argv=...; main()' — JSON output validates with jq

🤖 Generated with Claude Code

… promote CLI + alert fallback + envelope auto-approve

Deliverable A — Wider-suite test stability:
* Audited missing-deps blocker: all the deps CC#1 has been
  flagging across the last several PRs (aiosmtplib, prompt_toolkit,
  aioimaplib, fire, openai) are ALREADY declared in
  ``pyproject.toml`` core / dev extras. The gap was env-side, not
  code-side — CC#1's dev environment hadn't been re-synced after
  the dep additions across recent buckets.
* Resolution: ``uv pip install -e ".[all,dev]"`` (already
  documented in CONTRIBUTING.md). With the deps installed:
  - Collection: 7008 tests (was 7004 collected + 4 collection
    errors on un-synced env)
  - Full suite under hermetic env: **6905 passed / 94 failed /
    10 skipped** in 83s
  - Sub-suite excluding reasoning + gateway: **6605 passed /
    24 failed / 10 skipped** in 80s
* Note: the 94 remaining failures are PRE-EXISTING (verified by
  ``git stash`` + re-run on the merge commit before this PR).
  They cluster in:
  - ``reasoning/test_anthropic_engine*.py`` (~53) — gateway-route-
    through mock isolation issue introduced by CC#3's #196
    daemon Phase 1 default-flip; tests call real Anthropic API
    via the route-through path even though they pass a mock
    client.
  - ``test_backup.py`` / ``test_config.py`` / ``test_cron.py`` /
    ``test_web_server.py`` (~30) — HERMES_HOME → KORA_HOME
    migration tests stamping legacy expectations.
  - ``test_kanban*.py`` (~4) — separate flaky area.
  Per the bucket spec STOP-ASK §4: these need a separate
  stabilization bucket; this one completes with the deps-side
  resolved. PR description carries the recommended bucket title.

Deliverable B — ``kora promote`` operator CLI commands:
* New module ``kora_cli/promote_cli.py`` with 4 subcommands:
  - ``kora promote status`` — per-loop pending / approved /
    rejected / expired counts + last activity timestamp across
    all 6 loops.
  - ``kora promote run-once <loop>`` — invoke one cycle ad-hoc;
    returns the loop's cycle summary dict.
  - ``kora promote history <loop> [--days N]`` — recent audit
    rows for the loop, filtered to the per-loop seam vocabulary
    and (where the seam is shared like ``promotion.approved``)
    scoped via the ``promotion:<loop>:`` caller_session_id prefix.
  - ``kora promote pending <loop>`` — JSON dump of currently-
    pending proposals; ordered highest-confidence first.
* Snapshot-expand's audit-only layout (no pending/approved/
  rejected) gets special-cased — ``pending`` errors with a clear
  redirect to ``history``; ``status`` surfaces applied-record
  counts only.
* Registered under main.py's existing subparser pattern; cycle
  imports are LAZY so ``kora promote --help`` doesn't pull the
  clustering / pricing chain.
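
The lazy-import arrangement described above can be sketched with the standard library alone. This is not the contents of ``kora_cli/promote_cli.py``: `lazy_handler`, `build_parser`, and the `CYCLE_MODULE` name are hypothetical, and the sketch only shows why building the parser (and therefore ``--help``) does not import the heavy modules.

```python
import argparse
import importlib

# Hypothetical module path standing in for the real cycle implementation.
CYCLE_MODULE = "kora_cli.promote_cycles_example"


def lazy_handler(module_name, func_name):
    """Return a callable that imports `module_name` only when invoked."""
    def run(args):
        module = importlib.import_module(module_name)  # deferred import
        return getattr(module, func_name)(args)
    return run


def build_parser():
    """Build the `kora promote ...` parser without importing any cycle code."""
    parser = argparse.ArgumentParser(prog="kora")
    commands = parser.add_subparsers(dest="command", required=True)
    promote = commands.add_parser("promote")
    sub = promote.add_subparsers(dest="subcommand", required=True)

    status = sub.add_parser("status")
    status.set_defaults(handler=lazy_handler(CYCLE_MODULE, "status"))

    for name in ("run-once", "pending"):
        cmd = sub.add_parser(name)
        cmd.add_argument("loop")
        cmd.set_defaults(handler=lazy_handler(CYCLE_MODULE, name.replace("-", "_")))

    history = sub.add_parser("history")
    history.add_argument("loop")
    history.add_argument("--days", type=int, default=None)
    history.set_defaults(handler=lazy_handler(CYCLE_MODULE, "history"))
    return parser
```

A caller would run `args = build_parser().parse_args()` and then `args.handler(args)`; the import cost is paid only on that second step, so argument errors and help output stay fast.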

Deliverable C — Alert investigation fallback DM polish:
* The mechanism shipped in #197 (fallback DM on reasoning failure
  via ``append_outbound_log_entry`` + dm_status=
  ``engine_unavailable_fallback`` in the audit). KR-CC1-POLISH
  improves the wording:
  - Header now surfaces ``category (severity): alert id N
    (via {channel})``
  - Footer is explicit: "Kora is unavailable to investigate this
    alert ... Review the alerts panel and act manually — Kora
    will not retry this investigation."
* New test asserts dm_status exactly ==
  ``"engine_unavailable_fallback"`` on the engine-None path; CC#2's
  KR-FE-ALERT-INVESTIGATIONS-VIEWER reads that enum value.
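
As a rough illustration of the header shape and the status value asserted by the new test, consider the sketch below. The function names and the returned dict layout are assumptions; the header template and the ``engine_unavailable_fallback`` value are taken from the bullets above, and the footer is passed in by the caller because its full wording is elided here.

```python
# Status value the audit row carries on the engine-None path (from this PR).
FALLBACK_DM_STATUS = "engine_unavailable_fallback"


def fallback_header(category, severity, alert_id, channel):
    """Format the header as: category (severity): alert id N (via channel)."""
    return f"{category} ({severity}): alert id {alert_id} (via {channel})"


def build_fallback_dm(category, severity, alert_id, channel, footer):
    """Assemble the fallback DM text plus the status recorded alongside it."""
    text = "\n\n".join([fallback_header(category, severity, alert_id, channel), footer])
    return {"text": text, "dm_status": FALLBACK_DM_STATUS}
```

Keeping the status in a single constant is one way to make sure the sender and any consumer that matches on the enum value cannot drift apart.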

Deliverable D — Probe-fix-envelope auto-approve (low-risk):
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK``
  (default ``false``; operator opts in).
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS``
  (default ``1.0``).
* New module ``kora_cli/promote/probe_fix_envelopes/auto_approve.py``
  with ``run_auto_approve_sweep`` — runs at end of each
  probe-fix-envelope cycle, walks pending proposals, auto-
  approves those whose ``blast_radius_level == "low"`` AND have
  been pending ≥ wait_hours. Approval = transition to
  ``approved/`` + emit ``promotion.probe_envelope_action_auto_approved``
  audit row.
* New ``ProbeEnvelopeProposal.blast_radius_level`` field
  (``"low" | "medium" | "high"``); default ``"high"`` preserves
  the existing operator-must-review posture. Backwards-compat:
  legacy on-disk payloads without the field rehydrate to
  ``"high"`` via the new ``proposal_from_dict`` helper.
* Heuristic in ``_derive_blast_radius_level`` returns ``"low"``
  ONLY for known-narrow envelope patterns
  (``_KNOWN_LOW_RISK_PATTERNS`` — currently the fly
  restart_unhealthy_machine envelope's (probe, issue_category)
  pairs). Everything else defaults to ``"high"``. The heuristic
  intentionally undershoots — false-low classifications would
  let proposals slip through to operator's envelope without
  review.
* CRITICAL — two-tier gating preserved (documented inline +
  in the auto_approve module docstring):
  1. Auto-approve → "this is in our envelope vocabulary"
  2. Per-probe ``KORA_PROBE_AUTOFIX_<NAME>_ENABLED`` →
     "Kora is permitted to invoke it at runtime"
  The auto-approve flag DOES NOT cause Kora to execute the
  fix — only adds it to the vocabulary; operator still enables
  the per-probe ENABLED env separately to authorize execution.
* New audit seam ``promotion.probe_envelope_action_auto_approved``
  extending SeamName Literal with the auto_approve_wait_hours +
  auto_approved_at fields for operator timeline reconstruction.
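
The sweep rules above (disabled by default, ``"low"`` only, minimum pending time, legacy payloads rehydrating to ``"high"``) can be sketched as follows. This is a simplified model, not the shipped module: the proposal fields other than ``blast_radius_level``, the ``created_at`` name, and the single ``("fly", "machine_down")`` pattern are assumptions chosen for illustration, and the real sweep also moves files to ``approved/`` and emits an audit row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed example entry; the real set holds the (probe, issue_category)
# pairs of the fly restart_unhealthy_machine envelope.
_KNOWN_LOW_RISK_PATTERNS = {("fly", "machine_down")}


@dataclass
class ProbeEnvelopeProposal:
    proposal_id: str
    probe: str
    issue_category: str
    created_at: datetime
    blast_radius_level: str = "high"  # default keeps operator review mandatory


def proposal_from_dict(payload):
    """Rehydrate a proposal; payloads without the new field become "high"."""
    return ProbeEnvelopeProposal(
        proposal_id=payload["proposal_id"],
        probe=payload["probe"],
        issue_category=payload["issue_category"],
        created_at=datetime.fromisoformat(payload["created_at"]),
        blast_radius_level=payload.get("blast_radius_level", "high"),
    )


def derive_blast_radius_level(probe, issue_category):
    """Return "low" only for known-narrow patterns; everything else is "high"."""
    return "low" if (probe, issue_category) in _KNOWN_LOW_RISK_PATTERNS else "high"


def run_auto_approve_sweep(pending, *, enabled, wait_hours, now):
    """Select proposals eligible for auto-approval (selection step only)."""
    if not enabled:
        return []  # flag is off by default, so nothing is approved
    cutoff = now - timedelta(hours=wait_hours)
    return [
        p for p in pending
        if p.blast_radius_level == "low" and p.created_at <= cutoff
    ]
```

Because the default level is ``"high"`` at both the dataclass and the rehydration layer, a proposal has to be positively classified as low-risk to be eligible; a missing or unknown value can never widen what gets auto-approved.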

Tests:
* 12 new tests for the probe-fix-envelope auto-approve sweep
  (heuristic / fixture-backed rehydration / sweep disabled by
  default / wait-window enforcement / high-risk never approves /
  audit payload shape).
* 13 new tests for ``kora promote`` CLI commands (status / pending
  / history / run-once dispatch / loop allowlist / snapshot_expand
  special-case error / cross-loop history caller_session_id
  filtering).
* 2 new tests for alert fallback DM polish (wording assertion +
  dm_status enum assertion).
* All 41 new tests pass; all relevant existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker force-pushed the feat/kora-KR-CC1-POLISH-AND-STABILITY-MEGABUCKET branch from 25dfdba to 58c5a61 May 24, 2026 09:06
@rafe-walker rafe-walker merged commit 9371d82 into feature/phase2-upgrades May 24, 2026
2 of 4 checks passed
@rafe-walker rafe-walker deleted the feat/kora-KR-CC1-POLISH-AND-STABILITY-MEGABUCKET branch May 24, 2026 09:06