This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-PROMOTE-LOOPS-COMPLETION-MEGABUCKET — 3 loops + debounce + audit batching#193

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-PROMOTE-LOOPS-COMPLETION-MEGABUCKET
May 24, 2026

@rafe-walker

Owner

Summary

CC#1 megabucket completing the promotion-loops product direction. Lands the 3 remaining promotion loops (router-tuning / tool-trimming / probe-fix-envelopes), upgrades probe debounce with consecutive-failure buffering (closes the queued #163 follow-on), and ships R3-4 #9 audit batching — the final cheap-substrate item.

After this lands: all 5 promotion loops live; R3-4 complete.

Per-deliverable status

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| A | KR-PROMOTE-ROUTER-TUNING (loop 3) | ✅ shipped | Observer reads cost_telemetry rolling_24h; proposer emits tighten_review when escalation_rate ≥ threshold. loosen_review is deferred because it needs a new audit row for operator /opus overrides. Endpoints + audit seam promotion.router_trigger_proposed. |
| B | KR-PROMOTE-TOOL-TRIMMING (loop 4) | ✅ shipped | Observer tallies reasoning.tool_called per (route, tool_name); proposer identifies tools that a given route never invoked. v1 uses union-across-routes as the "available tools" set (documented swap-in seam). Endpoints + audit seam promotion.tool_trim_proposed. |
| C | KR-PROMOTE-PROBE-FIX-ENVELOPES (loop 5) | ✅ shipped | Observer clusters probe.investigation_completed by (probe, issue_category); proposer emits envelope-action suggestions with conservative "operator must review" blast-radius. Auto-apply HARDCODED FALSE per spec §2. STOP-ASK §4 mitigation: approve endpoint transitions status + emits audit ONLY; operator manually scaffolds approved envelopes into probes/fix_envelopes.py (proposal payload carries the suggested FixEnvelope shape for mechanical copy-paste). |
| D | KR-PROBE-DEBOUNCE consecutive-failure | ✅ shipped | New env KORA_PROBE_DEBOUNCE_CONSECUTIVE_REQUIRED (default 2). Buffer state (probe, category) → (count, first_seen_at). Critical+bypass still skips the buffer. required=1 restores PR #166 behavior. |
| E | KR-CHEAP-AUDIT-BATCHING (R3-4 #9) | ✅ shipped | Queue + background daemon thread; flush at size (default 100) or time (default 5s); atexit drain on shutdown. Per-emit interface unchanged. BATCH_SIZE=0 preserves legacy sync path (used by the test conftest globally to keep existing emit-then-read suites sync). |

Plus a shared store helper at kora_cli/promote/_shared/proposal_store.py so the 3 new loops don't each duplicate the phrasebook store's per-status-dir layout.

Sample proposal JSON — one per new loop

Loop 3 (router-tuning) — promotion.router_trigger_proposed

{
  "proposal_id": "p-router-001",
  "route": "slack_dm",
  "calls_count": 100,
  "escalation_count": 60,
  "escalation_rate": 0.6,
  "cost_estimate_usd_total": 1.2,
  "recommendation_kind": "tighten_review",
  "rationale": "Route 'slack_dm' escalated to Opus on 60/100 calls (60.0%) in the rolling 24h window. Per-call escalations cost a full Opus turn on top of the original Haiku turn. Operator review of the escalation trigger pattern for this route is recommended; spend so far: $1.2000.",
  "confidence": 1.0,
  "created_at": "2026-05-24T08:00:00Z",
  "status": "pending",
  "review_notes": ""
}
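
The router-tuning sample above is consistent with a simple threshold check over per-route counters. The sketch below is illustrative only: `RouteCounters`, `propose_tighten_review`, and the 0.5 default threshold are hypothetical names and values, since the PR states only that the proposer emits `tighten_review` when escalation_rate ≥ threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteCounters:
    """Hypothetical shape for one route's rolling-24h telemetry counters."""
    route: str
    calls_count: int
    escalation_count: int


def propose_tighten_review(
    counters: RouteCounters,
    threshold: float = 0.5,  # assumed default; the real threshold is tunable
) -> Optional[dict]:
    """Return a pending proposal when the escalation rate meets the threshold."""
    if counters.calls_count <= 0:
        return None  # no traffic, so there is no rate to evaluate
    rate = counters.escalation_count / counters.calls_count
    if rate < threshold:
        return None
    return {
        "route": counters.route,
        "calls_count": counters.calls_count,
        "escalation_count": counters.escalation_count,
        "escalation_rate": round(rate, 4),
        "recommendation_kind": "tighten_review",
        "status": "pending",
    }


# Mirrors the counters in the sample proposal: 60 escalations over 100 calls.
proposal = propose_tighten_review(RouteCounters("slack_dm", 100, 60))
```

Returning `None` below the threshold keeps the proposer side-effect free, so the caller decides whether to persist a proposal and emit the audit row.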

Loop 4 (tool-trimming) — promotion.tool_trim_proposed

{
  "proposal_id": "p-tool-001",
  "route": "email_inbound",
  "unused_tools": ["kora__attempt_probe_autofix", "kora__send_email_to_operator"],
  "total_calls_for_route": 42,
  "observation_window_days": 30,
  "confidence": 1.0,
  "created_at": "2026-05-24T09:00:00Z",
  "status": "pending",
  "review_notes": ""
}
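
The v1 "union-across-routes" rule for tool trimming amounts to a set difference per route. This is a rough sketch under that assumption, not the shipped proposer; `unused_tools_by_route` and the `kora__search` tool name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def unused_tools_by_route(calls: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each route to the tools it never invoked.

    `calls` holds (route, tool_name) pairs, e.g. from reasoning.tool_called rows.
    The "available tools" set is approximated as the union of tools seen on any
    route, so a tool that no route ever called cannot be detected here.
    """
    used: Dict[str, Set[str]] = defaultdict(set)
    for route, tool in calls:
        used[route].add(tool)

    available: Set[str] = set().union(*used.values())
    return {
        route: sorted(available - tools)
        for route, tools in used.items()
        if available - tools  # skip routes that used every available tool
    }


calls = [
    ("email_inbound", "kora__search"),  # hypothetical tool name
    ("slack_dm", "kora__search"),
    ("slack_dm", "kora__send_email_to_operator"),
]
trim_candidates = unused_tools_by_route(calls)
```

A route with no recorded calls never appears in the result, which is one reason a per-route registered-tool projection would be a more complete source for the "available" set.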

Loop 5 (probe-fix-envelopes) — promotion.probe_envelope_action_proposed

{
  "proposal_id": "p-envelope-001",
  "probe": "fly",
  "issue_category": "machine_down",
  "fix_name_suggestion": "proposed_fly_machine_down",
  "cluster_size": 4,
  "sample_caller_session_ids": ["probe:fly:machine_down:1", "probe:fly:machine_down:2", "probe:fly:machine_down:3"],
  "recurring_recommendation_text": "Restart the fly machine to recover.",
  "blast_radius_summary": "operator must review — proposed envelope action has not been classified for production-mutation risk; treat as broad-impact by default until operator narrows the scope",
  "confidence": 0.6667,
  "created_at": "2026-05-24T10:00:00Z",
  "status": "pending",
  "review_notes": ""
}
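
The clustering step behind a proposal like the one above can be sketched as a group-by on (probe, issue_category). The row field names `autofix_attempted` and `caller_session_id`, the `min_cluster_size` and `max_samples` defaults, and the function name are assumptions for illustration; the confidence score is omitted because the PR does not say how it is computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Conservative default quoted from the PR text; the rest is illustrative.
DEFAULT_BLAST_RADIUS = "operator must review"


def cluster_investigations(
    rows: Iterable[dict],
    min_cluster_size: int = 3,  # hypothetical threshold
    max_samples: int = 3,       # hypothetical cap on sampled session ids
) -> List[dict]:
    """Group investigation rows by (probe, issue_category) into proposals."""
    clusters: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for row in rows:
        if row.get("autofix_attempted"):  # hypothetical field name
            continue  # rows that already triggered an autofix are skipped
        clusters[(row["probe"], row["issue_category"])].append(row)

    proposals: List[dict] = []
    for (probe, category), members in sorted(clusters.items()):
        if len(members) < min_cluster_size:
            continue
        proposals.append({
            "probe": probe,
            "issue_category": category,
            "fix_name_suggestion": f"proposed_{probe}_{category}",
            "cluster_size": len(members),
            "sample_caller_session_ids": [
                m["caller_session_id"] for m in members[:max_samples]
            ],
            "blast_radius_summary": DEFAULT_BLAST_RADIUS,
            "status": "pending",
        })
    return proposals
```

Nothing in this sketch applies a fix; it only produces records for operator review, in line with the hardcoded no-auto-apply rule.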

Combined daily cost — all 5 promotion loops

| Loop | Per-cycle cost | Daily |
|---|---|---|
| 1: phrasebook (#186) | ≤$0.005 (Haiku synthesis, ≤5 proposals × ~$0.001) | ≤$0.005 |
| 2: snapshot-expand (#190) | $0 (fully lexical) | $0 |
| 3: router-tuning (this PR) | $0 (telemetry counter math) | $0 |
| 4: tool-trimming (this PR) | $0 (audit-log scan + set diff) | $0 |
| 5: probe-fix-envelopes (this PR) | $0 (audit-log scan + lexical cluster) | $0 |
| Combined | | ≤$0.005/day |

Well inside the [[feedback-promotion-loops-self-improving-subsystems]] $0.01-0.05/day target. Loops 2 through 5 are all $0, including the three added in this PR; phrasebook is the only loop with any LLM cost, capped at $0.005/day.

STOP-ASK posture

  • ✅ Router-tuning quality-diff data: not available today (documented in observer); v1 ships with the data we DO have (calls + escalation counts) and surfaces "review this route" recommendations rather than auto-tuning. No PM ask needed.
  • ✅ Probe-fix-envelope codegen fragility: mitigated via store-only + manual-scaffold approval path; fix_envelopes.py is NEVER mutated by the loop or its endpoints.
  • ✅ Audit batching lifecycle: background daemon thread starts lazily on first emit; atexit drains on shutdown; works for both daemon (long-lived) and CLI (short-lived) processes.
  • ✅ Daily cost: all 3 new loops are $0; combined daily ceiling still ≤$0.005.

CC#2 follow-on recommendation

CC#2's PromotionReviewPage (delivered alongside #186/#190) handles the phrasebook + snapshot-expand loop shapes. The 3 new loops add 3 distinct proposal shapes that the panel needs to render. Recommended follow-on:

  1. Extend the panel's loop-discriminator to handle the 3 new endpoint groups (/api/promotions/{router-tuning,tool-trimming,probe-envelopes}/...) alongside the existing phrasebook group. A loop-type tab strip is cleaner than nested tabs per-loop.
  2. Per-loop card renderers — each loop's proposal shape needs its own card layout:
    • Router-tuning: route + escalation_rate badge + rationale + Approve/Reject
    • Tool-trimming: route + collapsible list of unused_tools + total_calls + Approve/Reject
    • Probe-envelopes: probe + category + fix_name_suggestion + blast_radius_summary rendered as a prominent warning + recurring_recommendation_text + Approve/Reject. Bigger visual treatment matches the HIGH-RISK posture.
  3. Status enum drift-guard pin — the panel's PROMOTION_STATUS_VALUES constant should pin against _PROMOTION_STATUS_VALUES in kora_cli/web_server.py via a snapshot test; same drift-guard rule the phrasebook panel established.
  4. Pending count rollup — sidebar nav badge can sum pending counts across all 4 endpoint groups so operator sees "8 proposals waiting" without drilling into per-loop tabs.

Recommended bucket title: KR-FE-PROMOTION-REVIEW-PANEL-EXTEND — small to medium bucket; the loop-discriminator + 3 card renderers are mostly mechanical given the existing panel scaffolding.

Test plan

  • 32 new tests across the 3 new promote loops + shared store
  • 6 new tests for KR-PROBE-DEBOUNCE consecutive-failure paths (first-failure-buffered, second-failure-dispatches, critical-bypass, independent-pairs, required=1 backward-compat, reset clears buffer)
  • 6 new tests for KR-CHEAP-AUDIT-BATCHING (size-trigger, time-trigger via background thread, per-path grouping, shutdown-drain, opt-out path)
  • tests/kora_cli/conftest.py autouse fixture defaults to BATCH_SIZE=0 so legacy suites keep sync semantics
  • 582 pass across snapshot / promote / reasoning / probes / audit / hermes plugin suites locally. Remaining failures across the full suite are missing local dev-deps (prompt_toolkit, aiosmtplib, etc.) unrelated to this PR.

🤖 Generated with Claude Code

…ce + audit batching

Deliverable A — KR-PROMOTE-ROUTER-TUNING (loop 3):
* ``kora_cli/promote/router_tuning/`` — observer reads
  cost_telemetry.snapshot()'s rolling_24h per-route counters;
  proposer surfaces routes whose Haiku→Opus escalation rate
  crosses a tunable threshold for operator-review. v1 ships
  ``tighten_review`` recommendations only — the ``loosen_review``
  signal needs a new audit row for operator /opus overrides that
  doesn't exist yet (documented in observer + proposer).
* Audit seam ``promotion.router_trigger_proposed``.
* Endpoints ``GET /api/promotions/router-tuning/pending`` +
  ``POST .../{id}/approve|reject``.
* Default auto-apply OFF — operator scaffolds trigger-pattern
  changes manually from the proposal rationale.

Deliverable B — KR-PROMOTE-TOOL-TRIMMING (loop 4):
* ``kora_cli/promote/tool_trimming/`` — observer tallies
  ``reasoning.tool_called`` audit rows by (route, tool_name) over
  a 30-day window; proposer identifies tools that a given route
  never invoked and proposes adding them to that route's drop-list.
* v1 uses union-across-routes as the "available tools" set
  (cheaper than wiring a per-route registered-tool projection;
  documented as a clean future seam).
* Audit seam ``promotion.tool_trim_proposed``.
* Endpoints ``/api/promotions/tool-trimming/...``.
* Default auto-apply OFF — enforcement of the drop-list is
  deferred to the future KR-PLUGIN-TOOL-DESC-TRIM bucket which
  will wire the ``pre_tool_list_finalized`` hook to honor it.

Deliverable C — KR-PROMOTE-PROBE-FIX-ENVELOPES (loop 5):
* ``kora_cli/promote/probe_fix_envelopes/`` — observer reads
  ``probe.investigation_completed`` rows, skips those that already
  triggered an autofix attempt; clusters by (probe, issue_category);
  proposer emits new envelope-action proposals with conservative
  "operator must review" blast-radius default.
* Audit seam ``promotion.probe_envelope_action_proposed``.
* Endpoints ``/api/promotions/probe-envelopes/...``.
* Auto-apply HARDCODED FALSE (no env to flip) per spec §2
  deliverable C — high-risk; never auto-apply. STOP-ASK §4
  mitigation: approve endpoint transitions status + emits audit
  ONLY; operator manually scaffolds approved envelopes into
  ``probes/fix_envelopes.py``. Proposal payload carries the
  suggested FixEnvelope shape verbatim for mechanical copy-paste.

Shared store helper:
* ``kora_cli/promote/_shared/proposal_store.py`` — generic
  pending/approved/rejected/expired file-backed store factored
  out so the 3 new loops don't each duplicate the phrasebook
  store's per-status-dir layout. Snapshot_expand + phrasebook
  keep their bespoke stores unchanged.
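
For orientation, a file-backed store with one directory per status could look roughly like the sketch below. It is not the contents of `proposal_store.py`: the `ProposalStore` class, its method names, and the one-JSON-file-per-proposal layout are assumptions chosen to match the pending/approved/rejected/expired description.

```python
import json
from pathlib import Path
from typing import List

STATUSES = ("pending", "approved", "rejected", "expired")


class ProposalStore:
    """Hypothetical store: <root>/<status>/<proposal_id>.json per proposal."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        for status in STATUSES:
            (self.root / status).mkdir(parents=True, exist_ok=True)

    def _path(self, status: str, proposal_id: str) -> Path:
        return self.root / status / f"{proposal_id}.json"

    def save_pending(self, proposal: dict) -> Path:
        """Write a new proposal into the pending directory."""
        record = {**proposal, "status": "pending"}
        path = self._path("pending", record["proposal_id"])
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
        return path

    def list_by_status(self, status: str) -> List[dict]:
        """Load every proposal currently stored under one status."""
        files = sorted((self.root / status).glob("*.json"))
        return [json.loads(f.read_text(encoding="utf-8")) for f in files]

    def transition(self, proposal_id: str, new_status: str, review_notes: str = "") -> dict:
        """Move a pending proposal into approved, rejected, or expired."""
        if new_status not in STATUSES or new_status == "pending":
            raise ValueError(f"invalid target status: {new_status!r}")
        src = self._path("pending", proposal_id)
        record = json.loads(src.read_text(encoding="utf-8"))
        record.update(status=new_status, review_notes=review_notes)
        self._path(new_status, proposal_id).write_text(
            json.dumps(record, indent=2), encoding="utf-8"
        )
        src.unlink()  # remove the pending copy only after the new one is written
        return record
```

Keeping status in the directory name means a "pending" listing is a single directory scan, and an approve or reject is a write followed by a delete.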

Deliverable D — KR-PROBE-DEBOUNCE consecutive-failure buffering:
* Closes the queued #163 follow-on. New env
  ``KORA_PROBE_DEBOUNCE_CONSECUTIVE_REQUIRED`` (default 2).
* Buffer state (probe, category) → (count, first_seen_at)
  alongside the existing flat-window debounce map. Single wakes
  return buffered_skipped; second wake within the existing
  debounce window dispatches; window-expiry resets count.
* Critical severity + ``KORA_PROBE_DEBOUNCE_BYPASS_CRITICAL=true``
  still bypasses the buffer (same operator opt-in env as the
  flat-window bypass).
* Backward compat: ``required=1`` restores PR #166 behavior.
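
The buffering rules in Deliverable D can be sketched as a small state machine keyed by (probe, category). This is a simplified illustration, not the shipped code: the class name, the 300-second window, the severity strings, and clearing the buffer after a dispatch are all assumptions, and the existing flat-window debounce map is left out.

```python
import time
from typing import Callable, Dict, Tuple


class ConsecutiveFailureBuffer:
    """Hypothetical buffer: dispatch only after `required` wakes in one window."""

    def __init__(
        self,
        required: int = 2,              # mirrors the documented default
        window_seconds: float = 300.0,  # assumed window length
        bypass_critical: bool = False,  # stands in for the bypass env flag
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.required = required
        self.window_seconds = window_seconds
        self.bypass_critical = bypass_critical
        self.clock = clock
        # (probe, category) -> (count, first_seen_at)
        self._state: Dict[Tuple[str, str], Tuple[int, float]] = {}

    def observe(self, probe: str, category: str, severity: str = "warning") -> str:
        """Return "dispatched" or "buffered_skipped" for one wake."""
        if self.required <= 1:
            return "dispatched"  # required=1 disables buffering entirely
        if severity == "critical" and self.bypass_critical:
            return "dispatched"  # operator opted in to critical bypass
        key = (probe, category)
        now = self.clock()
        count, first_seen = self._state.get(key, (0, now))
        if now - first_seen > self.window_seconds:
            count, first_seen = 0, now  # window expired, so start counting again
        count += 1
        if count >= self.required:
            self._state.pop(key, None)  # assumption: clear the buffer on dispatch
            return "dispatched"
        self._state[key] = (count, first_seen)
        return "buffered_skipped"
```

Injecting `clock` makes the window-expiry path testable without sleeping, for example by passing a lambda that reads a mutable fake time.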

Deliverable E — KR-CHEAP-AUDIT-BATCHING (R3-4 #9):
* ``emit_audit`` now queues to a module-level batch; background
  daemon thread flushes every ``KORA_AUDIT_FLUSH_INTERVAL_SECONDS``
  (default 5s) and on size hit ``KORA_AUDIT_BATCH_SIZE`` (default
  100), whichever first.
* Per-emit interface UNCHANGED — callers still call ``emit_audit``
  synchronously; queue + flusher are internal.
* atexit handler drains pending events on process shutdown so
  short-lived CLI invocations don't lose audit rows.
* BATCH_SIZE=0 explicit opt-out preserves the legacy per-emit
  write path — used by the test conftest globally so existing
  emit-then-read tests stay sync without per-file fixtures.
* Tests cover: size-trigger, time-trigger (background thread),
  per-path grouping, shutdown-drain, opt-out path.
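
A stripped-down version of the queue-plus-flusher pattern described for Deliverable E might look like the following. It is a sketch under assumptions: `AuditBatcher` and its `sink` callback are hypothetical stand-ins for the module-level batch inside `emit_audit`, per-path grouping is omitted, and the size-triggered flush runs on the calling thread for simplicity.

```python
import atexit
import threading
import time
from typing import Callable, List, Optional


class AuditBatcher:
    """Hypothetical batcher: flush on size, on a timer, and at process exit."""

    def __init__(
        self,
        sink: Callable[[List[dict]], None],  # receives one batch per flush
        batch_size: int = 100,               # mirrors the documented default
        flush_interval: float = 5.0,         # mirrors the documented default
    ) -> None:
        self._sink = sink
        self._batch_size = batch_size
        self._interval = flush_interval
        self._lock = threading.Lock()
        self._pending: List[dict] = []
        self._thread: Optional[threading.Thread] = None
        atexit.register(self.flush)  # drain whatever is queued at shutdown

    def emit(self, event: dict) -> None:
        if self._batch_size <= 0:
            self._sink([event])  # opt-out: legacy synchronous write per emit
            return
        with self._lock:
            self._pending.append(event)
            full = len(self._pending) >= self._batch_size
            if self._thread is None:  # start the flusher lazily on first emit
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()
        if full:
            self.flush()

    def flush(self) -> None:
        """Swap out the pending list under the lock, then write outside it."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._sink(batch)

    def _run(self) -> None:
        while True:
            time.sleep(self._interval)
            self.flush()
```

Because the flusher is a daemon thread it never blocks interpreter exit, which is why the `atexit` drain matters for short-lived CLI runs. Tests that emit and then immediately read would construct this with `batch_size=0` to keep the synchronous path.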

Tests:
* 32 new tests across the 3 new promote loops + shared store.
* 6 new tests for KR-PROBE-DEBOUNCE consecutive-failure paths.
* 6 new tests for KR-CHEAP-AUDIT-BATCHING.
* ``tests/kora_cli/conftest.py`` adds an autouse fixture that
  defaults to BATCH_SIZE=0 so the existing emit-then-read suites
  (wake_consumer, phrasebook, snapshot_expand, hermes_plugin
  audit) keep sync semantics. Dedicated batching tests opt back
  in via their own fixture.
* 582 pass across snapshot / promote / reasoning / probes /
  audit / hermes plugin suites; remaining failures are missing
  local dev-deps (prompt_toolkit, aiosmtplib, etc.) unrelated
  to this PR.

After this lands: all 5 promotion loops live; KR-CHEAP-AUDIT-
BATCHING completes R3-4 cheap-substrate items; KR-PROBE-DEBOUNCE
upgrade closes the queued #163 follow-on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>