This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-CC1-POLISH-AND-STABILITY-MEGABUCKET — test stability + promote CLI + alert fallback + envelope auto-approve#201
Merged
rafe-walker merged 1 commit on May 24, 2026
Deliverable A — Wider-suite test stability:
* Audited missing-deps blocker: all the deps CC#1 has been
flagging across the last several PRs (aiosmtplib, prompt_toolkit,
aioimaplib, fire, openai) are ALREADY declared in
``pyproject.toml`` core / dev extras. The gap was env-side, not
code-side — CC#1's dev environment hadn't been re-synced after
the dep additions across recent buckets.
* Resolution: ``uv pip install -e ".[all,dev]"`` (already
documented in CONTRIBUTING.md). With the deps installed:
- Collection: 7008 tests (was 7004 collected + 4 collection
errors on un-synced env)
- Full suite under hermetic env: **6905 passed / 94 failed /
10 skipped** in 83s
- Sub-suite excluding reasoning + gateway: **6605 passed /
24 failed / 10 skipped** in 80s
* Note: the 94 remaining failures are PRE-EXISTING (verified by
``git stash`` + re-run on the merge commit before this PR).
They cluster in:
- ``reasoning/test_anthropic_engine*.py`` (~53) — gateway-route-
through mock isolation issue introduced by CC#3's #196
daemon Phase 1 default-flip; tests call real Anthropic API
via the route-through path even though they pass a mock
client.
- ``test_backup.py`` / ``test_config.py`` / ``test_cron.py`` /
``test_web_server.py`` (~30) — HERMES_HOME → KORA_HOME
migration tests stamping legacy expectations.
- ``test_kanban*.py`` (~4) — separate flaky area.
Per the bucket spec STOP-ASK §4: these need a separate
stabilization bucket; this one completes with the deps-side
resolved. PR description carries the recommended bucket title.
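The env-side gap described in Deliverable A can be caught before test collection. The following is a minimal sketch, not code from this PR: ``missing_modules`` and ``hermetic_env`` are hypothetical helper names, the module list is taken from the first bullet above, and the variable names are the ones the PR description's run command unsets or pins.

```python
import importlib.util
import os

# Import names of the dependencies listed in Deliverable A.
REQUIRED_MODULES = ["aiosmtplib", "prompt_toolkit", "aioimaplib", "fire", "openai"]

# Credential variables that the PR's run command removes with `env -u`.
CREDENTIAL_VARS = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "CLAUDE_CODE_OAUTH_TOKEN",
    "ANTHROPIC_AUTH_TOKEN",
)


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def hermetic_env(base=None):
    """Copy an environment, drop credentials, and pin TZ / LANG / hash seed."""
    env = dict(os.environ if base is None else base)
    for var in CREDENTIAL_VARS:
        env.pop(var, None)
    env.update({"TZ": "UTC", "LANG": "C.UTF-8", "PYTHONHASHSEED": "0"})
    return env


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("un-synced environment, missing:", ", ".join(missing))
    else:
        print("all listed dependencies are importable")
```

Passing the result of ``hermetic_env()`` as ``env=`` to ``subprocess.run`` would approximate the ``env -u ...`` prefix without touching the parent shell.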
Deliverable B — ``kora promote`` operator CLI commands:
* New module ``kora_cli/promote_cli.py`` with 4 subcommands:
- ``kora promote status`` — per-loop pending / approved /
rejected / expired counts + last activity timestamp across
all 6 loops.
- ``kora promote run-once <loop>`` — invoke one cycle ad-hoc;
returns the loop's cycle summary dict.
- ``kora promote history <loop> [--days N]`` — recent audit
rows for the loop, filtered to the per-loop seam vocabulary
and (where the seam is shared like ``promotion.approved``)
scoped via the ``promotion:<loop>:`` caller_session_id prefix.
- ``kora promote pending <loop>`` — JSON dump of currently-
pending proposals; ordered highest-confidence first.
* Snapshot-expand's audit-only layout (no pending/approved/
rejected) gets special-cased — ``pending`` errors with a clear
redirect to ``history``; ``status`` surfaces applied-record
counts only.
* Registered under main.py's existing subparser pattern; cycle
imports are LAZY so ``kora promote --help`` doesn't pull the
clustering / pricing chain.
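The subparser layout and the lazy-import pattern from Deliverable B can be sketched as below. This is an illustration under assumptions, not the contents of ``kora_cli/promote_cli.py``: the handler bodies are placeholders, ``json`` stands in for the heavy cycle module, the ``--days`` default of 7 is invented, and ``snapshot_expand`` is assumed to be the loop identifier for the audit-only loop.

```python
import argparse
import importlib

# Loop with an audit-only layout (identifier assumed from the test list).
AUDIT_ONLY_LOOPS = {"snapshot_expand"}


def _cmd_run_once(args):
    # Lazy import: the cycle module loads only when the command actually runs,
    # so `--help` never pays for it. `json` is a stand-in module name.
    cycle = importlib.import_module("json")
    return {"loop": args.loop, "cycle_module": cycle.__name__}


def _cmd_pending(args):
    # Audit-only loops have no pending proposals; redirect to `history`.
    if args.loop in AUDIT_ONLY_LOOPS:
        raise SystemExit(f"{args.loop} has no pending proposals; use 'history' instead")
    return {"loop": args.loop, "pending": []}


def build_parser():
    parser = argparse.ArgumentParser(prog="kora")
    sub = parser.add_subparsers(dest="command", required=True)
    promote = sub.add_parser("promote").add_subparsers(dest="action", required=True)

    promote.add_parser("status").set_defaults(func=lambda a: {"loops": {}})

    run_once = promote.add_parser("run-once")
    run_once.add_argument("loop")
    run_once.set_defaults(func=_cmd_run_once)

    history = promote.add_parser("history")
    history.add_argument("loop")
    history.add_argument("--days", type=int, default=7)  # default is an assumption
    history.set_defaults(func=lambda a: {"loop": a.loop, "days": a.days})

    pending = promote.add_parser("pending")
    pending.add_argument("loop")
    pending.set_defaults(func=_cmd_pending)
    return parser
```

Keeping the import inside the handler rather than at module top level is what lets argument parsing and help output stay independent of the clustering / pricing chain.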
Deliverable C — Alert investigation fallback DM polish:
* The mechanism shipped in #197 (fallback DM on reasoning failure
via ``append_outbound_log_entry`` + dm_status=
``engine_unavailable_fallback`` in the audit). KR-CC1-POLISH
improves the wording:
- Header now surfaces ``category (severity): alert id N
(via {channel})``
- Footer is explicit: "Kora is unavailable to investigate this
alert ... Review the alerts panel and act manually — Kora
will not retry this investigation."
* New test asserts dm_status exactly ==
``"engine_unavailable_fallback"`` on the engine-None path; CC#2's
KR-FE-ALERT-INVESTIGATIONS-VIEWER reads that enum value.
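The header shape and the ``dm_status`` contract from Deliverable C can be illustrated as follows. This is a hedged sketch: ``fallback_header`` and ``build_fallback`` are hypothetical names, and the footer is left out because its full wording is elided in the description above.

```python
# Enum value asserted by the new test and read by the viewer.
DM_STATUS_FALLBACK = "engine_unavailable_fallback"


def fallback_header(category, severity, alert_id, channel):
    """Render `category (severity): alert id N (via {channel})`."""
    return f"{category} ({severity}): alert id {alert_id} (via {channel})"


def build_fallback(engine, category, severity, alert_id, channel):
    """Return (header, dm_status) on the engine-None path, else (None, None)."""
    if engine is not None:
        # Normal investigation path; out of scope for this sketch.
        return None, None
    return fallback_header(category, severity, alert_id, channel), DM_STATUS_FALLBACK
```

Keeping the status string in one constant makes an exact-equality assertion, like the one the new test performs, straightforward to write.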
Deliverable D — Probe-fix-envelope auto-approve (low-risk):
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK``
(default ``false``; operator opts in).
* New env: ``KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS``
(default ``1.0``).
* New module ``kora_cli/promote/probe_fix_envelopes/auto_approve.py``
with ``run_auto_approve_sweep`` — runs at end of each
probe-fix-envelope cycle, walks pending proposals, auto-
approves those whose ``blast_radius_level == "low"`` AND have
been pending ≥ wait_hours. Approval = transition to
``approved/`` + emit ``promotion.probe_envelope_action_auto_approved``
audit row.
* New ``ProbeEnvelopeProposal.blast_radius_level`` field
(``"low" | "medium" | "high"``); default ``"high"`` preserves
the existing operator-must-review posture. Backwards-compat:
legacy on-disk payloads without the field rehydrate to
``"high"`` via the new ``proposal_from_dict`` helper.
* Heuristic in ``_derive_blast_radius_level`` returns ``"low"``
ONLY for known-narrow envelope patterns
(``_KNOWN_LOW_RISK_PATTERNS`` — currently the fly
restart_unhealthy_machine envelope's (probe, issue_category)
pairs). Everything else defaults to ``"high"``. The heuristic
intentionally undershoots: a false-low classification would
let a proposal slip into the operator's envelope without
review.
* CRITICAL — two-tier gating preserved (documented inline +
in the auto_approve module docstring):
1. Auto-approve → "this is in our envelope vocabulary"
2. Per-probe ``KORA_PROBE_AUTOFIX_<NAME>_ENABLED`` →
"Kora is permitted to invoke it at runtime"
The auto-approve flag DOES NOT cause Kora to execute the
fix — only adds it to the vocabulary; operator still enables
the per-probe ENABLED env separately to authorize execution.
* New audit seam ``promotion.probe_envelope_action_auto_approved``
extends the SeamName Literal; its payload carries the
auto_approve_wait_hours + auto_approved_at fields for operator
timeline reconstruction.
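The rehydration default and the sweep's selection rule from Deliverable D can be sketched as below. This is an illustration under assumptions rather than the module's code: ``proposal_id`` and ``created_at`` are hypothetical fields, the function signatures are invented, and the sketch only selects proposals (the real sweep also transitions them to ``approved/`` and emits the audit row).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ProbeEnvelopeProposal:
    proposal_id: str                  # hypothetical field
    created_at: datetime              # hypothetical: when the proposal became pending
    blast_radius_level: str = "high"  # default keeps the operator-must-review posture


def proposal_from_dict(payload):
    """Rehydrate a payload; legacy payloads without the field become "high"."""
    return ProbeEnvelopeProposal(
        proposal_id=payload["proposal_id"],
        created_at=datetime.fromisoformat(payload["created_at"]),
        blast_radius_level=payload.get("blast_radius_level", "high"),
    )


def run_auto_approve_sweep(pending, *, enabled, wait_hours, now):
    """Return the pending proposals that qualify for auto-approval."""
    if not enabled:  # the opt-in flag defaults to false
        return []
    cutoff = now - timedelta(hours=wait_hours)
    return [
        p for p in pending
        if p.blast_radius_level == "low" and p.created_at <= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
    legacy = proposal_from_dict(
        {"proposal_id": "legacy", "created_at": "2026-05-24T01:00:00+00:00"}
    )
    # A legacy payload rehydrates as "high", so the sweep never selects it.
    print(run_auto_approve_sweep([legacy], enabled=True, wait_hours=1.0, now=now))
```

Because both conditions are conjunctive and the default level is ``"high"``, any payload the heuristic has not explicitly classified stays in the pending queue for operator review.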
Tests:
* 12 new tests for the probe-fix-envelope auto-approve sweep
(heuristic / fixture-backed rehydration / sweep disabled by
default / wait-window enforcement / high-risk never approves /
audit payload shape).
* 13 new tests for ``kora promote`` CLI commands (status / pending
/ history / run-once dispatch / loop allowlist / snapshot_expand
special-case error / cross-loop history caller_session_id
filtering).
* 2 new tests for alert fallback DM polish (wording assertion +
dm_status enum assertion).
* All 41 new tests pass; all relevant existing tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
25dfdba to
58c5a61
Summary
CC#1 pivot to polish + stability after 10 substantial product buckets. Per
[[feedback-local-first-upstream-after]]: a pre-prod system needs battle-testing before externalization, and polish IS the battle-testing.

Per-deliverable status

* A (wider-suite test stability): ``pyproject.toml`` had aiosmtplib / prompt_toolkit / aioimaplib / fire / openai already declared. Collection now 7008/7008 (was 7004 + 4 errors). The 94 remaining failures are pre-existing and are split into a separate bucket (recommended below).
* B (``kora promote`` operator CLI): ``kora_cli/promote_cli.py``; lazy cycle imports keep ``--help`` fast.
* C (alert fallback DM polish): ``dm_status="engine_unavailable_fallback"`` assertion test.
* D (probe-fix-envelope auto-approve): ``blast_radius_level`` field + heuristic. Auto-approves only known-narrow envelopes (currently fly restart_unhealthy_machine), after a 1h wait window. Two-tier gating preserved: auto-approve adds to vocabulary; per-probe ENABLED env still required for runtime execution.

Full-suite test count (now green-ish)
Run via
``env -u ANTHROPIC_API_KEY -u OPENAI_API_KEY -u CLAUDE_CODE_OAUTH_TOKEN -u ANTHROPIC_AUTH_TOKEN TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0 python3.11 -m pytest tests/kora_cli -n 4 --timeout=30 --timeout-method=signal``.

The 94 failures are pre-existing (verified by ``git stash`` + re-run on the merge commit before this PR; same failures, same count). Breakdown:

* ``reasoning/test_anthropic_engine*.py``
* ``test_backup.py`` / ``test_config.py`` / ``test_web_server*.py`` / ``test_cron.py`` (… ``hermes_test`` paths, ``config.yaml`` defaults)
* ``test_kanban*.py``

CLI command examples
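The four subcommand shapes listed under Deliverable B can be spelled out as argument vectors. This is only a sketch: ``example_loop`` is a hypothetical loop name, ``--days 7`` is an arbitrary value, and the code builds and prints the command lines rather than running them.

```python
import shlex

LOOP = "example_loop"  # hypothetical loop name, for illustration only

COMMANDS = [
    ["kora", "promote", "status"],
    ["kora", "promote", "run-once", LOOP],
    ["kora", "promote", "history", LOOP, "--days", "7"],
    ["kora", "promote", "pending", LOOP],
]


def render(commands):
    """Join each argument vector into a shell-safe command line."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    for line in render(COMMANDS):
        print(line)
```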
Probe-fix-envelope auto-approve — sample sequence
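The two-tier gating can be walked through as a small sequence. This is a sketch under stated assumptions: the truthy-value parsing in ``flag_enabled`` is invented, ``FLY`` is a hypothetical probe name for the ``KORA_PROBE_AUTOFIX_<NAME>_ENABLED`` pattern, and the function names are illustrative.

```python
def flag_enabled(env, name):
    """Treat "1" / "true" / "yes" as enabled (an assumed parsing rule)."""
    return env.get(name, "false").strip().lower() in {"1", "true", "yes"}


def in_vocabulary(env, blast_radius_level, hours_pending):
    """Tier 1: auto-approve adds a low-risk proposal to the envelope vocabulary."""
    if not flag_enabled(env, "KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK"):
        return False
    wait = float(env.get("KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_WAIT_HOURS", "1.0"))
    return blast_radius_level == "low" and hours_pending >= wait


def may_execute(env, probe_name, approved):
    """Tier 2: execution additionally needs the per-probe ENABLED variable."""
    return approved and flag_enabled(env, f"KORA_PROBE_AUTOFIX_{probe_name}_ENABLED")


if __name__ == "__main__":
    env = {"KORA_PROMOTE_PROBE_FIX_AUTO_APPROVE_LOW_RISK": "true"}
    approved = in_vocabulary(env, "low", hours_pending=1.5)
    print("approved into vocabulary:", approved)
    print("may execute:", may_execute(env, "FLY", approved))
    env["KORA_PROBE_AUTOFIX_FLY_ENABLED"] = "true"
    print("may execute after operator opt-in:", may_execute(env, "FLY", approved))
```

The point of the split is that the first flag alone never leads to execution; the operator has to set the per-probe variable as a separate, deliberate step.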
CC#1 next-dispatch recommendation
Per the bucket spec's hint: multi-tenant operator-distribution prep. Once CC#3's NousResearch#426 Option C identity-as-plugin lands, the natural CC#1 follow-on is the back-end groundwork for "Kora-per-operator":
1. Per-tenant cost ladder: extract the singleton ``CostStateHolder`` to a per-tenant accessor, ``get_cost_holder(tenant_id)``. Migration story: existing call sites pass an implicit ``"default"`` tenant.
2. Per-tenant config isolation: ``KORA_HOME`` resolution becomes tenant-scoped (``${KORA_HOME_ROOT}/${tenant_id}/...``). Listener startup pulls per-tenant config; cross-tenant accidental reads return ``None`` rather than the wrong tenant's state.
3. Per-tenant audit JSONL: extend ``emit_audit`` with an implicit / explicit ``tenant_id`` field; one audit file per tenant; CC#2's cockpit gets a tenant-picker.
4. Pre-existing test stability bucket (recommended title: ``KR-TEST-STABILITY-ROUTE-THROUGH-MOCKS-AND-HERMES-HOME-MIGRATION``): fixes the 94 remaining failures so future PRs ship genuinely green. Should land BEFORE the multi-tenant work since multi-tenant adds substantial new test surface.
5. CLI completion + man page: extend the operator surface beyond ``kora promote``. ``kora alerts`` / ``kora probe`` / ``kora audit`` for symmetric CLI ergonomics; complement (not replace) the cockpit.

The pre-existing-test-stability bucket (4) is the highest-priority of these: every CC#1 PR for the next 5 buckets will have to navigate this same "green except for the pre-existing 94" caveat until it's resolved.
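Items 1 and 2 of the recommendation can be pictured with a small sketch. None of this exists yet, so everything here is hypothetical: the ``spent_usd`` field, the registry dict, the ``tenant_home`` helper, and the validation rule are illustrative choices, and only the ``get_cost_holder(tenant_id)`` name and the implicit ``"default"`` tenant come from the text above.

```python
from dataclasses import dataclass
from pathlib import Path

DEFAULT_TENANT = "default"  # implicit tenant for existing call sites


@dataclass
class CostStateHolder:
    spent_usd: float = 0.0  # hypothetical field; the real holder's state is not shown


_holders = {}  # tenant_id -> CostStateHolder


def get_cost_holder(tenant_id=DEFAULT_TENANT):
    """Return the holder for a tenant, creating it on first use."""
    if tenant_id not in _holders:
        _holders[tenant_id] = CostStateHolder()
    return _holders[tenant_id]


def tenant_home(root, tenant_id=DEFAULT_TENANT):
    """Resolve a tenant-scoped home directory under a shared root."""
    if not tenant_id or "/" in tenant_id or tenant_id in {".", ".."}:
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return Path(root) / tenant_id
```

Defaulting ``tenant_id`` keeps today's single-tenant call sites working unchanged, while rejecting path-like tenant ids is one way to keep one tenant from resolving into another tenant's directory.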
Test plan
* ``kora promote`` CLI commands: ``python3.11 -c 'sys.argv=...; main()'``; JSON output validates with ``jq``

🤖 Generated with Claude Code