This repository was archived by the owner on May 26, 2026. It is now read-only.

KR-PROMOTE-PHRASEBOOK-FOUNDATION-MEGABUCKET — first promotion loop end-to-end #186

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-PROMOTE-PHRASEBOOK-FOUNDATION-MEGABUCKET on May 24, 2026

Conversation

@rafe-walker

Owner

Summary

First concrete implementation of the promotion-loop pattern per feedback-promotion-loops-self-improving-subsystems. Six deliverables shipped as a single PR (batched per feedback-batch-bigger-buckets):

| # | Deliverable | Module |
|---|---|---|
| A | Shared text-clustering utility | kora_cli/clustering/text_similarity.py |
| B | Reasoning DM observation collector | kora_cli/promote/phrasebook/observer.py |
| C | Promotion proposal generator | kora_cli/promote/phrasebook/proposer.py |
| D | 3 new audit seams | kora_cli/audit/jsonl_sink.py |
| E | 3 backend endpoints | kora_cli/web_server.py |
| F | Cron registration | kora_cli/listeners/promote_phrasebook_listener.py |

After this lands, Kora observes recurring DM patterns at idle cadence (daily) and drafts phrasebook proposals via clustering; the operator reviews them in the cockpit (CC#2 follow-on KR-FE-PROMOTION-REVIEW-PANEL), and the audit log captures the loop end-to-end.

STOP-ASKs resolved inline

  • §4 "Haiku-based pseudo-embedding clusters poorly" (Deliverable A) — pre-empted by going lexical from the start (token-set + char-3-gram, L2-normalized). Deterministic, $0 per text, well-suited to short operator-DM clustering. Haiku is still used by the proposer where natural-language synthesis genuinely helps (reply-template generation), one optional call per proposal.
  • §4 "Pattern derivation generates pathological regexes" (Deliverable C) — mitigated by escaping all regex metacharacters in the derived alternation. The phrasebook editor's existing validator catches anything that still slips through at approve-time, so the worst case is a 422 on operator approval, not a runtime regex bomb.
  • §4 "Cadence conflict with operator working hours" (Deliverable F) — interval-based default (24h) rather than wall-clock cron. The bucket spec's "0 6 * * *" suggestion was superseded because the heartbeat scheduler is interval-based; a "fire at 6am UTC" anchor would need a separate watch-and-act task (modeled on cost_telemetry_listener's daily-reset) which is appropriate for v2 if the operator wants a deterministic UTC anchor. Promotion outputs go to a review queue, not real-time alerts, so the wall-clock anchor doesn't matter operationally for v1.
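
The lexical approach from the first bullet can be sketched as follows. This is a minimal illustration, assuming a sparse dict representation and a simple single-pass grouping; the function names `lexical_embed` and `cluster_greedy` and the 0.6 threshold are hypothetical and may differ from what `text_similarity.py` actually does.

```python
import math
import re
from collections import Counter


def lexical_embed(text: str) -> dict[str, float]:
    """Sparse, L2-normalized vector over the token set and character 3-grams."""
    lowered = text.lower()
    # Token set: each distinct word counts once.
    features = Counter(f"tok:{t}" for t in set(re.findall(r"[a-z0-9]+", lowered)))
    # Character 3-grams over the whitespace-collapsed text.
    squashed = " ".join(lowered.split())
    features.update(f"tri:{squashed[i:i + 3]}" for i in range(len(squashed) - 2))
    norm = math.sqrt(sum(v * v for v in features.values()))
    if norm == 0.0:
        return {}
    return {k: v / norm for k, v in features.items()}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors, which equals their cosine."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def cluster_greedy(texts: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Put each text into the first cluster whose seed is similar enough.

    Assumption: a single greedy pass against cluster seeds; the real
    cluster_by_similarity may merge clusters differently.
    """
    vectors = [lexical_embed(t) for t in texts]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Because the features are plain string counts, the same input always yields the same vector and no model call is involved, which is what makes the per-text cost zero.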

Sample proposal JSON

Per the proposer projection (/api/promotions/phrasebook/pending returns a list of these):

{
  "proposal_id": "4f3a7c2b-9d12-4e8f-a1b2-c3d4e5f6a7b8",
  "cluster_size": 6,
  "sample_questions": [
    "Burn is $42 today; 75% of budget used.",
    "Burn is $42 today; 75% of budget used.",
    "Burn is $42 today; 75% of budget used."
  ],
  "proposed_pattern": "(?i)(budget|burn|used)",
  "proposed_reply_template": "Burn is $42 today; 75% of budget used.",
  "proposed_category": "cost_query",
  "confidence": 0.94,
  "created_at": "2026-05-24T06:00:00Z",
  "status": "pending",
  "review_notes": "",
  "cluster_caller_session_ids": [
    "D1JOSH:1742300000.1", "D1JOSH:1742310000.1", "D1JOSH:1742320000.1",
    "D1JOSH:1742330000.1", "D1JOSH:1742340000.1", "D1JOSH:1742350000.1"
  ],
  "haiku_synthesized": false
}

In v1, sample_questions holds response excerpts rather than operator questions: the bucket spec clusters on replies only, so the operator's question text is not available. A future bucket can plumb in the inbound JSONL via the observer's inbound_lookup hook.

Sample cycle log

Synthetic 24-hour run: 200 observations read, 4 clusters found, 4 proposals generated and persisted, 1 pending proposal expired:

[kora.promote.phrasebook.cycle] cycle complete: {
  'enabled': True,
  'observations_read': 200,
  'clusters_found': 4,
  'proposals_generated': 4,
  'proposals_persisted': 4,
  'expired_count': 1,
  'total_synth_cost_usd': 0.0,
  'started_at': '2026-05-24T06:00:00Z',
  'duration_ms': 421
}

With Haiku reply-template synthesis enabled (operator opt-in for v2):

[kora.promote.phrasebook.cycle] cycle complete: {
  ...
  'proposals_generated': 4,
  'total_synth_cost_usd': 0.0042,
  'duration_ms': 1830
}

Estimated $/day cost ceiling

| Phase | Cost calc | Daily $ |
|---|---|---|
| Embedder (lexical) | $0/text × 200 = $0 | $0.00 |
| Proposer (no Haiku synth) | $0 | $0.00 |
| Proposer (with Haiku synth, opt-in for v2) | ~$0.001/proposal × ≤5 proposals | ≤$0.005 |
| Cache miss recovery (full re-warm) | lexical embedder still $0 | $0 |
| Daily ceiling (default v1) | | $0.00 |
| Daily ceiling (Haiku synth opt-in) | | ≤$0.005 |

Both ceilings are well under the spec target of $0.01-0.05/day, which leaves headroom for the future promotion loops the spec called out (snapshot-expand, router-trigger, tool-trimming, probe-fix-envelope) to share the same daily budget.

The lexical-default decision is the load-bearing one — it means the daily cost is essentially zero, leaving the entire budget for future loops that genuinely need LLM judgment (which the phrasebook proposer doesn't, since reply-template uses the dominant verbatim response by default).
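
The ceiling arithmetic above can be checked with a few lines. This is only a worked calculation using the figures quoted in the table; the constant and function names are hypothetical.

```python
# Figures taken from the cost table; names are illustrative only.
EMBED_COST_PER_TEXT_USD = 0.0          # lexical embedder
HAIKU_COST_PER_PROPOSAL_USD = 0.001    # "~$0.001/proposal"
MAX_PROPOSALS_PER_CYCLE = 5            # "≤5 proposals"
SPEC_TARGET_LOW_USD = 0.01             # lower edge of the $0.01-0.05/day target


def daily_ceiling_usd(observations: int, haiku_synth: bool) -> float:
    """Upper bound on one daily cycle's spend under the stated assumptions."""
    embed = EMBED_COST_PER_TEXT_USD * observations
    synth = HAIKU_COST_PER_PROPOSAL_USD * MAX_PROPOSALS_PER_CYCLE if haiku_synth else 0.0
    return embed + synth


default_v1 = daily_ceiling_usd(200, haiku_synth=False)
with_synth = daily_ceiling_usd(200, haiku_synth=True)
print(f"default v1: ${default_v1:.3f}, Haiku opt-in: ${with_synth:.3f}")
```

Even the opt-in ceiling stays at about half of the lower edge of the spec target.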

Env vars added

| Env | Default | Purpose |
|---|---|---|
| KORA_PROMOTE_PHRASEBOOK_ENABLED | true | Master kill-switch. False = cycle returns 0 proposals without reading observations. |
| KORA_PROMOTE_PHRASEBOOK_INTERVAL_SEC | 86400 | Heartbeat-scheduler interval (seconds). 86400 = once daily. |
| KORA_PROMOTE_PHRASEBOOK_OBSERVATION_WINDOW_DAYS | 7 | How far back the observer reads for clustering. |
| KORA_PROMOTE_PHRASEBOOK_MIN_CLUSTER_SIZE | 5 | Drop clusters smaller than this. |
| KORA_PROMOTE_PHRASEBOOK_EXPIRY_DAYS | 14 | Auto-expire pending proposals older than this. |
| KORA_PROMOTIONS_DIR | unset | Test override for the proposal store root (defaults to ${KORA_HOME}/promotions/). |
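
A minimal sketch of how these tunables and the kill-switch might be read. The env names and defaults come from the table; `PromoteConfig`, `load_config`, `run_cycle`, and the set of strings treated as "false" are assumptions for illustration, not the listener's actual code.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

_PREFIX = "KORA_PROMOTE_PHRASEBOOK_"


@dataclass(frozen=True)
class PromoteConfig:
    enabled: bool = True
    interval_sec: int = 86400
    observation_window_days: int = 7
    min_cluster_size: int = 5
    expiry_days: int = 14


def load_config(env: Optional[Mapping[str, str]] = None) -> PromoteConfig:
    """Read tunables from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = PromoteConfig()

    def as_int(suffix: str, default: int) -> int:
        raw = env.get(_PREFIX + suffix, "").strip()
        return int(raw) if raw else default

    raw_enabled = env.get(_PREFIX + "ENABLED", "").strip().lower()
    # Assumption: only these spellings switch the loop off.
    enabled = raw_enabled not in {"0", "false", "no", "off"} if raw_enabled else defaults.enabled
    return PromoteConfig(
        enabled=enabled,
        interval_sec=as_int("INTERVAL_SEC", defaults.interval_sec),
        observation_window_days=as_int("OBSERVATION_WINDOW_DAYS", defaults.observation_window_days),
        min_cluster_size=as_int("MIN_CLUSTER_SIZE", defaults.min_cluster_size),
        expiry_days=as_int("EXPIRY_DAYS", defaults.expiry_days),
    )


def run_cycle(
    config: PromoteConfig,
    read_observations: Callable[[], list],
    propose: Callable[[list], list],
) -> dict:
    """Kill-switch first: a disabled cycle never touches the observation log."""
    if not config.enabled:
        return {"enabled": False, "observations_read": 0, "proposals_generated": 0}
    observations = read_observations()
    proposals = propose(observations)
    return {
        "enabled": True,
        "observations_read": len(observations),
        "proposals_generated": len(proposals),
    }
```

Checking the switch before any I/O is what gives the "returns 0 proposals without reading observations" behavior described in the table.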

CC#2 follow-on recommendation: KR-FE-PROMOTION-REVIEW-PANEL (refined)

The endpoints + persisted proposal shape are stable. CC#2 builds:

  1. Sidebar nav count: the length of the GET /api/promotions/phrasebook/pending response is the badge.
  2. Review list — confidence-sorted (already done BE-side). Each row renders:
    • cluster_size + confidence badge
    • 3 sample_questions excerpts (operator recognizes the cluster)
    • proposed_pattern (operator-editable inline)
    • proposed_reply_template (operator-editable inline; haiku_synthesized=true rows get a "Kora wrote this" badge for extra-careful review)
    • proposed_category (operator-editable; dropdown of existing categories + free text)
  3. One-click approve / reject buttons → POST endpoints with optional override payload.
  4. Drift-guard pin: import _PROMOTION_STATUS_VALUES from BE endpoint response (status_values field of the list endpoint) into a FE constant; snapshot-test fails CI if the two drift.
  5. AuditPanelKit integration: render promotion.proposed / promotion.approved / promotion.rejected audit JSONL rows in a sidebar log so the operator can see the history (the in-flight #408 AuditPanelKit makes this nearly free once landed).
  6. Edit-before-approve flow: pre-fill the phrasebook editor (PR #177, feat(kora): KR-FE-PHRASEBOOK-EDITOR-AND-CRUD) with the proposed entry; the operator can tweak it and then approve through the same approve endpoint with an override payload.

The wire shape is forward-compatible: future fields (e.g., synth_cost_usd, cluster_caller_session_ids) are additive and the cockpit can ignore unknown keys.
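
One way a consumer could honor the "ignore unknown keys" contract is sketched below. It is written in Python for consistency with the backend, on the assumption that the cockpit would do the equivalent in its own language; `ProposalView` and `parse_proposal` are hypothetical names, and the field subset is taken from the sample proposal JSON.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProposalView:
    """The subset of proposal fields a review row needs."""
    proposal_id: str
    cluster_size: int
    proposed_pattern: str
    proposed_reply_template: str
    proposed_category: str
    confidence: float
    status: str


def parse_proposal(raw: dict) -> ProposalView:
    """Keep known fields, silently drop anything the view does not model."""
    known = {f.name for f in fields(ProposalView)}
    return ProposalView(**{k: v for k, v in raw.items() if k in known})


payload = json.loads("""
{
  "proposal_id": "4f3a7c2b-9d12-4e8f-a1b2-c3d4e5f6a7b8",
  "cluster_size": 6,
  "proposed_pattern": "(?i)(budget|burn|used)",
  "proposed_reply_template": "Burn is $42 today; 75% of budget used.",
  "proposed_category": "cost_query",
  "confidence": 0.94,
  "status": "pending",
  "some_future_field": 123
}
""")
view = parse_proposal(payload)
```

A missing required field still raises a `TypeError`, so additive changes pass through while breaking changes fail loudly.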

Test plan

  • 52 new tests across 4 files: clustering (11), observer (9), proposer (10), store + cycle + endpoints (22)
  • Regression: 432 passed across clustering + promote + audit + handlers + short_circuit + reasoning + probes
  • ruff check clean on all changed files

🤖 Generated with Claude Code

…tion loop end-to-end

First concrete implementation of the promotion-loop pattern per
``feedback-promotion-loops-self-improving-subsystems``. After this
lands: Kora observes recurring DM patterns, drafts phrasebook
proposals at idle cadence, operator approves via cockpit (CC#2
follow-on KR-FE-PROMOTION-REVIEW-PANEL), audit captures the loop
end-to-end. Six deliverables in one batched bucket — single PR.

Six deliverables
----------------

A. Shared text-clustering utility — ``kora_cli/clustering/``:
   * ``text_similarity.embed_texts`` — async embedder with disk
     cache. Lexical (token + char 3-gram, L2-normalized,
     deterministic, $0) rather than Haiku-based. Decision was
     anticipated by the bucket's STOP-ASK §4 alternative; the
     proposer still uses Haiku where natural-language synthesis
     genuinely helps (reply-template generation, one call per
     proposal).
   * ``cosine_similarity`` + ``cluster_by_similarity`` (greedy
     agglomerative).

B. Reasoning DM observation collector —
   ``kora_cli/promote/phrasebook/observer.py``:
   * Reads ``slack_dm_log.jsonl`` (handler-driven replies +
     wake-consumer replies both write here post-#184) →
     ``ReasoningObservation`` records.
   * Filters: route allow-list, drop short-circuit hits, drop
     missing-engine canned fallbacks, drop failed sends, time
     window via ``since``.
   * Per-call cost reused from
     ``agent.usage_pricing.estimate_usage_cost`` (same canonical
     pricing as ``cost_state_holder.record_inference``).

C. Promotion proposal generator —
   ``kora_cli/promote/phrasebook/proposer.py``:
   * Clusters observations, applies size + cohesion +
     answer-consistency thresholds.
   * Derives a conservative regex pattern (escaped alternation
     of top tokens — protects against the STOP-ASK §4 regex-
     pathology concern; phrasebook editor's validator catches
     anything that slips through at approve-time).
   * Reply template: dominant verbatim response OR optional
     Haiku synthesis (injectable for tests).
   * Confidence = (cohesion + consistency) / 2.

D. Three new audit seams in ``SeamName`` Literal:
   * ``promotion.proposed`` — per proposal at proposer time.
     ``synth_cost_usd`` field per row for cost telemetry.
   * ``promotion.approved`` — at endpoint time. Also emits
     ``phrasebook.updated`` with ``actor="kora_proposal_approved"``
     per PR #177 forward-compat.
   * ``promotion.rejected`` — with operator-supplied
     ``review_notes`` recorded verbatim (#182 precedent).

E. Three new backend endpoints in ``web_server.py``:
   * ``GET  /api/promotions/phrasebook/pending``
   * ``POST /api/promotions/phrasebook/{id}/approve`` —
     optional pattern_override / reply_template_override /
     category_override / review_notes payload.
   * ``POST /api/promotions/phrasebook/{id}/reject`` —
     ``{review_notes}`` payload.
   * Drift-guard pin ``_PROMOTION_STATUS_VALUES``; CC#2's
     KR-FE-PROMOTION-REVIEW-PANEL adds the symmetric FE
     constant.

F. Cron registration —
   ``kora_cli/listeners/promote_phrasebook_listener.py``:
   * Registers ``run_phrasebook_promotion_cycle`` via the
     heartbeat scheduler (interval-based, 86400s default).
   * Cron-string suggestion ("0 6 * * *") superseded by the
     interval shape since the heartbeat scheduler is
     interval-based; documented in the listener docstring.
   * Env tunables: ``KORA_PROMOTE_PHRASEBOOK_ENABLED`` (default
     true), ``KORA_PROMOTE_PHRASEBOOK_INTERVAL_SEC`` (86400),
     ``KORA_PROMOTE_PHRASEBOOK_OBSERVATION_WINDOW_DAYS`` (7),
     ``KORA_PROMOTE_PHRASEBOOK_MIN_CLUSTER_SIZE`` (5),
     ``KORA_PROMOTE_PHRASEBOOK_EXPIRY_DAYS`` (14).
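
The pattern derivation and confidence formula from deliverable C can be sketched as below. This is an illustration under assumptions: the tokenization, the stopword list, the top-3 cut-off, and the names ``derive_pattern`` and ``confidence`` are hypothetical, so the output will not necessarily match what ``proposer.py`` produces.

```python
import re
from collections import Counter

# Hypothetical stopword list; the real proposer may filter differently.
STOPWORDS = frozenset({"a", "and", "is", "of", "the", "to"})


def derive_pattern(texts: list[str], top_n: int = 3) -> str:
    """Build a case-insensitive alternation of the most common tokens.

    Every token goes through re.escape, so metacharacters such as "$"
    are matched literally instead of being interpreted by the regex engine.
    """
    counts: Counter[str] = Counter()
    for text in texts:
        tokens = {tok.strip(".,;:!?") for tok in text.lower().split()}
        counts.update(t for t in tokens if t and t not in STOPWORDS)
    # Sort by frequency, then alphabetically, so the result is deterministic.
    top = sorted(counts, key=lambda t: (-counts[t], t))[:top_n]
    alternation = "|".join(re.escape(t) for t in top)
    return f"(?i)({alternation})"


def confidence(cohesion: float, consistency: float) -> float:
    """Confidence = (cohesion + consistency) / 2, as stated above."""
    return (cohesion + consistency) / 2


pattern = derive_pattern(["Burn is $42 today."] * 3, top_n=2)
compiled = re.compile(pattern)
```

Without the escaping step, a token like ``$42`` would compile to an end-of-string anchor followed by ``42`` and never match, which is the kind of silent breakage the escaping guards against.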

Persistence
-----------
Proposals live at
``${KORA_HOME}/promotions/phrasebook/{pending,approved,rejected,expired}/<uuid>.json``.
Status transitions move the file atomically via os.replace. Files
+ audit JSONL together are the forensic-truth stream.
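
A minimal sketch of a status transition as a file move, assuming the directory layout above. The ``transition`` helper is hypothetical; it only moves the file and does not rewrite the ``status`` field inside the JSON, which the real store presumably also handles.

```python
import os
from pathlib import Path

STATUSES = ("pending", "approved", "rejected", "expired")


def transition(root: Path, proposal_id: str, src: str, dst: str) -> Path:
    """Move <root>/<src>/<id>.json to <root>/<dst>/<id>.json.

    os.replace is atomic when source and target are on the same filesystem,
    so a reader never observes a half-moved proposal.
    """
    if src not in STATUSES or dst not in STATUSES:
        raise ValueError(f"unknown status: {src!r} -> {dst!r}")
    source = root / src / f"{proposal_id}.json"
    target_dir = root / dst
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    os.replace(source, target)  # raises FileNotFoundError if the source is missing
    return target
```

Keeping all four status directories under one root keeps them on the same filesystem, which is the condition the atomicity guarantee depends on.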

Tests
-----
52 new tests across 4 files. ``ruff check`` clean. 432-test
regression set (clustering + promote + audit + handlers +
short_circuit + reasoning + probes) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>