KR-PROMOTE-PHRASEBOOK-FOUNDATION-MEGABUCKET — first promotion loop end-to-end by rafe-walker · Pull Request #186 · rafe-walker/kora

rafe-walker · 2026-05-24T05:40:24Z

Summary

First concrete implementation of the promotion-loop pattern per feedback-promotion-loops-self-improving-subsystems. Six deliverables shipped as a single PR (batched per feedback-batch-bigger-buckets):

| # | Deliverable | Module |
|---|---|---|
| A | Shared text-clustering utility | `kora_cli/clustering/text_similarity.py` |
| B | Reasoning DM observation collector | `kora_cli/promote/phrasebook/observer.py` |
| C | Promotion proposal generator | `kora_cli/promote/phrasebook/proposer.py` |
| D | 3 new audit seams | `kora_cli/audit/jsonl_sink.py` |
| E | 3 backend endpoints | `kora_cli/web_server.py` |
| F | Cron registration | `kora_cli/listeners/promote_phrasebook_listener.py` |

After this lands, Kora observes recurring DM patterns at idle cadence (daily) and drafts phrasebook proposals via clustering; the operator reviews them in the cockpit (CC#2 follow-on KR-FE-PROMOTION-REVIEW-PANEL), and the audit log captures the loop end-to-end.

STOP-ASKs resolved inline

- §4 "Haiku-based pseudo-embedding clusters poorly" (Deliverable A): pre-empted by going lexical from the start (token-set + char-3-gram, L2-normalized). The embedder is deterministic, costs $0 per text, and is well suited to clustering short operator DMs. Haiku is still used by the proposer where natural-language synthesis genuinely helps (reply-template generation), as one optional call per proposal.
- §4 "Pattern derivation generates pathological regexes" (Deliverable C): mitigated by escaping all regex metacharacters in the derived alternation. The phrasebook editor's existing validator catches anything that still slips through at approve-time, so the worst case is a 422 on operator approval, not a runtime regex bomb.
- §4 "Cadence conflict with operator working hours" (Deliverable F): resolved with an interval-based default (24h) rather than a wall-clock cron. The bucket spec's "0 6 * * *" suggestion was superseded because the heartbeat scheduler is interval-based. A "fire at 6am UTC" anchor would need a separate watch-and-act task (modeled on cost_telemetry_listener's daily-reset), which is appropriate for v2 if the operator wants a deterministic UTC anchor. Promotion outputs go to a review queue, not real-time alerts, so the wall-clock anchor doesn't matter operationally for v1.
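To make the lexical choice for Deliverable A concrete, here is a minimal sketch of a token-set + char-3-gram embedder with L2 normalization, cosine similarity, and a greedy single-pass grouping. It is an illustration only: the function bodies, the feature prefixes, and the 0.6 threshold are assumptions, and the real `text_similarity.py` (async, disk-cached) may differ.

```python
import math
import re


def lexical_embed(text: str) -> dict[str, float]:
    """Sparse, L2-normalized vector of word tokens plus character 3-grams."""
    lowered = text.lower()
    features: dict[str, float] = {}
    # Token-set features: presence only, so repeated words do not dominate.
    for token in set(re.findall(r"[a-z0-9]+", lowered)):
        features[f"tok:{token}"] = 1.0
    # Character 3-gram features over whitespace-collapsed text (counted).
    compact = re.sub(r"\s+", " ", lowered).strip()
    for i in range(len(compact) - 2):
        key = f"tri:{compact[i:i + 3]}"
        features[key] = features.get(key, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in features.values()))
    if norm == 0.0:
        return {}
    return {k: v / norm for k, v in features.items()}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two already-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(key, 0.0) for key, weight in a.items())


def cluster_by_similarity(texts: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Greedy grouping: join the first cluster whose seed text is similar enough."""
    vectors = [lexical_embed(t) for t in texts]
    clusters: list[list[int]] = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

Because every step is pure string arithmetic, the same input always produces the same vector, which is what makes the approach deterministic and free per text.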

Sample proposal JSON

Per the proposer projection (/api/promotions/phrasebook/pending returns a list of these):

{
  "proposal_id": "4f3a7c2b-9d12-4e8f-a1b2-c3d4e5f6a7b8",
  "cluster_size": 6,
  "sample_questions": [
    "Burn is $42 today; 75% of budget used.",
    "Burn is $42 today; 75% of budget used.",
    "Burn is $42 today; 75% of budget used."
  ],
  "proposed_pattern": "(?i)(budget|burn|used)",
  "proposed_reply_template": "Burn is $42 today; 75% of budget used.",
  "proposed_category": "cost_query",
  "confidence": 0.94,
  "created_at": "2026-05-24T06:00:00Z",
  "status": "pending",
  "review_notes": "",
  "cluster_caller_session_ids": [
    "D1JOSH:1742300000.1", "D1JOSH:1742310000.1", "D1JOSH:1742320000.1",
    "D1JOSH:1742330000.1", "D1JOSH:1742340000.1", "D1JOSH:1742350000.1"
  ],
  "haiku_synthesized": false
}

In v1, sample_questions holds response excerpts rather than operator questions: the bucket spec clusters on replies only, so the operator question text is not available. A future bucket can plumb in the inbound JSONL via the observer's inbound_lookup hook.

Sample cycle log

Synthetic 24-hour run with 200 observations → 4 proposals → 1 expired:

[kora.promote.phrasebook.cycle] cycle complete: {
  'enabled': True,
  'observations_read': 200,
  'clusters_found': 4,
  'proposals_generated': 4,
  'proposals_persisted': 4,
  'expired_count': 1,
  'total_synth_cost_usd': 0.0,
  'started_at': '2026-05-24T06:00:00Z',
  'duration_ms': 421
}

With Haiku reply-template synthesis enabled (operator opt-in for v2):

[kora.promote.phrasebook.cycle] cycle complete: {
  ...
  'proposals_generated': 4,
  'total_synth_cost_usd': 0.0042,
  'duration_ms': 1830
}

Estimated $/day cost ceiling

| Phase | Cost calc | Daily $ |
|---|---|---|
| Embedder (lexical) | $0/text × 200 = $0 | $0.00 |
| Proposer (no Haiku synth) | $0 | $0.00 |
| Proposer (with Haiku synth, opt-in for v2) | ~$0.001/proposal × ≤5 proposals | ≤$0.005 |
| Cache miss recovery (full re-warm) | lexical embedder still $0 | $0.00 |
| Daily ceiling (default v1) | | $0.00 |
| Daily ceiling (Haiku synth opt-in) | | ≤$0.005 |

Both ceilings are well under the $0.01-0.05/day spec target, which leaves plenty of headroom for the future promotion loops the spec called out (snapshot-expand, router-trigger, tool-trimming, probe-fix-envelope) to share the daily budget.

The lexical-default decision is the load-bearing one — it means the daily cost is essentially zero, leaving the entire budget for future loops that genuinely need LLM judgment (which the phrasebook proposer doesn't, since reply-template uses the dominant verbatim response by default).
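The ceiling arithmetic from the table can be checked in a few lines. The constants below restate the table's approximate figures; the names are made up for this sketch.

```python
# Figures restated from the cost table; all names here are illustrative.
LEXICAL_COST_PER_TEXT_USD = 0.0
OBSERVATIONS_PER_CYCLE = 200
HAIKU_COST_PER_PROPOSAL_USD = 0.001  # approximate, per the table
MAX_PROPOSALS_PER_CYCLE = 5
SPEC_TARGET_LOW_USD = 0.01


def daily_ceiling_usd(haiku_synth: bool) -> float:
    """Worst-case daily spend for one cycle per day."""
    embed = LEXICAL_COST_PER_TEXT_USD * OBSERVATIONS_PER_CYCLE
    synth = HAIKU_COST_PER_PROPOSAL_USD * MAX_PROPOSALS_PER_CYCLE if haiku_synth else 0.0
    return embed + synth
```

Even with synthesis on, the ceiling is about half of the low end of the spec target, which is consistent with the 4-proposal sample log above ($0.0042).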

Env vars added

| Env | Default | Purpose |
|---|---|---|
| `KORA_PROMOTE_PHRASEBOOK_ENABLED` | `true` | Master kill-switch. False = cycle returns 0 proposals without reading observations. |
| `KORA_PROMOTE_PHRASEBOOK_INTERVAL_SEC` | `86400` | Heartbeat-scheduler interval (seconds). 86400 = once daily. |
| `KORA_PROMOTE_PHRASEBOOK_OBSERVATION_WINDOW_DAYS` | `7` | How far back the observer reads for clustering. |
| `KORA_PROMOTE_PHRASEBOOK_MIN_CLUSTER_SIZE` | `5` | Drop clusters smaller than this. |
| `KORA_PROMOTE_PHRASEBOOK_EXPIRY_DAYS` | `14` | Auto-expire pending proposals older than this. |
| `KORA_PROMOTIONS_DIR` | unset | Test override for the proposal store root (defaults to `${KORA_HOME}/promotions/`). |
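For readers wiring these tunables up, here is a rough sketch of how they might be read with the documented defaults. The `PromotePhrasebookConfig` dataclass, the helper names, the accepted truthy strings, and the fallback-on-invalid behaviour are assumptions; the listener's actual parsing is not shown in this PR description.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

_PREFIX = "KORA_PROMOTE_PHRASEBOOK_"


@dataclass(frozen=True)
class PromotePhrasebookConfig:
    enabled: bool = True
    interval_sec: int = 86400
    observation_window_days: int = 7
    min_cluster_size: int = 5
    expiry_days: int = 14


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    # Assumed truthy spellings; anything else counts as False.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    try:
        value = int(env.get(name, ""))
    except ValueError:
        return default  # missing or malformed: keep the documented default
    return value if value > 0 else default


def load_config(env: Mapping[str, str] | None = None) -> PromotePhrasebookConfig:
    env = os.environ if env is None else env
    d = PromotePhrasebookConfig()
    return PromotePhrasebookConfig(
        enabled=_env_bool(env, _PREFIX + "ENABLED", d.enabled),
        interval_sec=_env_int(env, _PREFIX + "INTERVAL_SEC", d.interval_sec),
        observation_window_days=_env_int(
            env, _PREFIX + "OBSERVATION_WINDOW_DAYS", d.observation_window_days
        ),
        min_cluster_size=_env_int(env, _PREFIX + "MIN_CLUSTER_SIZE", d.min_cluster_size),
        expiry_days=_env_int(env, _PREFIX + "EXPIRY_DAYS", d.expiry_days),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.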

CC#2 follow-on recommendation: `KR-FE-PROMOTION-REVIEW-PANEL` (refined)

The endpoints + persisted proposal shape are stable. CC#2 builds:

- Sidebar nav count: the length of the GET /api/promotions/phrasebook/pending response is the badge.
- Review list: confidence-sorted (already done BE-side). Each row renders:
  - cluster_size + confidence badge
  - 3 sample_questions excerpts (operator recognizes the cluster)
  - proposed_pattern (operator-editable inline)
  - proposed_reply_template (operator-editable inline; haiku_synthesized=true rows get a "Kora wrote this" badge for extra-careful review)
  - proposed_category (operator-editable; dropdown of existing categories + free text)
- One-click approve / reject buttons → POST endpoints with optional override payload.
- Drift-guard pin: import _PROMOTION_STATUS_VALUES from the BE endpoint response (status_values field of the list endpoint) into a FE constant; a snapshot test fails CI if the two drift.
- AuditPanelKit integration: render promotion.proposed / promotion.approved / promotion.rejected audit JSONL rows in a sidebar log so the operator can see the history (the in-flight AuditPanelKit, #408, makes this nearly free once landed).
- Edit-before-approve flow: pre-fill the phrasebook editor (PR #177, KR-FE-PHRASEBOOK-EDITOR-AND-CRUD) with the proposed entry; the operator can tweak it and then approve via the same approve endpoint with an override payload.

The wire shape is forward-compatible: future fields (e.g., synth_cost_usd, cluster_caller_session_ids) are additive and the cockpit can ignore unknown keys.
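A small sketch of what "ignore unknown keys" and "approve with optional overrides" could look like on the consuming side. The `PhrasebookProposal` dataclass is a trimmed, hypothetical projection of the sample JSON. The override key names follow the approve payload described in the commit message, while the mapping and function names are invented for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class PhrasebookProposal:
    proposal_id: str
    proposed_pattern: str
    proposed_reply_template: str
    proposed_category: str
    confidence: float
    status: str = "pending"
    review_notes: str = ""


def parse_proposal(raw: Mapping[str, Any]) -> PhrasebookProposal:
    """Keep only known fields so additive wire changes do not break parsing."""
    known = {f.name for f in fields(PhrasebookProposal)}
    return PhrasebookProposal(**{k: v for k, v in raw.items() if k in known})


# Approve-payload key -> proposal field (illustrative mapping).
_OVERRIDE_TO_FIELD = {
    "pattern_override": "proposed_pattern",
    "reply_template_override": "proposed_reply_template",
    "category_override": "proposed_category",
    "review_notes": "review_notes",
}


def apply_approval(proposal: PhrasebookProposal, payload: Mapping[str, Any]) -> PhrasebookProposal:
    """Return an approved copy, applying only the overrides that were supplied."""
    changes = {
        field: payload[key]
        for key, field in _OVERRIDE_TO_FIELD.items()
        if payload.get(key) is not None
    }
    return replace(proposal, status="approved", **changes)
```

An empty payload approves the proposal unchanged, which matches the one-click path; the edit-before-approve path sends only the fields the operator touched.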

Test plan

52 new tests across 4 files: clustering (11), observer (9), proposer (10), store + cycle + endpoints (22)
Regression: 432 passed across clustering + promote + audit + handlers + short_circuit + reasoning + probes
ruff check clean on all changed files

🤖 Generated with Claude Code

…tion loop end-to-end

First concrete implementation of the promotion-loop pattern per ``feedback-promotion-loops-self-improving-subsystems``. After this lands: Kora observes recurring DM patterns, drafts phrasebook proposals at idle cadence, operator approves via cockpit (CC#2 follow-on KR-FE-PROMOTION-REVIEW-PANEL), audit captures the loop end-to-end. Six deliverables in one batched bucket, shipped as a single PR.

Six deliverables
----------------

A. Shared Haiku-clustering utility (``kora_cli/clustering/``):

* ``text_similarity.embed_texts``: async embedder with disk cache. Lexical (token + char 3-gram, L2-normalized, deterministic, $0) rather than Haiku-based. Decision was anticipated by the bucket's STOP-ASK §4 alternative; the proposer still uses Haiku where natural-language synthesis genuinely helps (reply-template generation, one call per proposal).
* ``cosine_similarity`` + ``cluster_by_similarity`` (greedy agglomerative).

B. Reasoning DM observation collector (``kora_cli/promote/phrasebook/observer.py``):

* Reads ``slack_dm_log.jsonl`` (handler-driven replies + wake-consumer replies both write here post-#184) → ``ReasoningObservation`` records.
* Filters: route allow-list, drop short-circuit hits, drop missing-engine canned fallbacks, drop failed sends, time window via ``since``.
* Per-call cost reused from ``agent.usage_pricing.estimate_usage_cost`` (same canonical pricing as ``cost_state_holder.record_inference``).

C. Promotion proposal generator (``kora_cli/promote/phrasebook/proposer.py``):

* Clusters observations, applies size + cohesion + answer-consistency thresholds.
* Derives a conservative regex pattern (escaped alternation of top tokens; protects against the STOP-ASK §4 regex-pathology concern; phrasebook editor's validator catches anything that slips through at approve-time).
* Reply template: dominant verbatim response OR optional Haiku synthesis (injectable for tests).
* Confidence = (cohesion + consistency) / 2.

D. Three new audit seams in ``SeamName`` Literal:

* ``promotion.proposed``: per proposal at proposer time. ``synth_cost_usd`` field per row for cost telemetry.
* ``promotion.approved``: at endpoint time. Also emits ``phrasebook.updated`` with ``actor="kora_proposal_approved"`` per PR #177 forward-compat.
* ``promotion.rejected``: with operator-supplied ``review_notes`` recorded verbatim (#182 precedent).

E. Three new backend endpoints in ``web_server.py``:

* ``GET /api/promotions/phrasebook/pending``
* ``POST /api/promotions/phrasebook/{id}/approve``: optional pattern_override / reply_template_override / category_override / review_notes payload.
* ``POST /api/promotions/phrasebook/{id}/reject``: ``{review_notes}`` payload.
* Drift-guard pin ``_PROMOTION_STATUS_VALUES``; CC#2's KR-FE-PROMOTION-REVIEW-PANEL adds the symmetric FE constant.

F. Cron registration (``kora_cli/listeners/promote_phrasebook_listener.py``):

* Registers ``run_phrasebook_promotion_cycle`` via the heartbeat scheduler (interval-based, 86400s default).
* Cron-string suggestion ("0 6 * * *") superseded by the interval shape since the heartbeat scheduler is interval-based; documented in the listener docstring.
* Env tunables: ``KORA_PROMOTE_PHRASEBOOK_ENABLED`` (default true), ``KORA_PROMOTE_PHRASEBOOK_INTERVAL_SEC`` (86400), ``KORA_PROMOTE_PHRASEBOOK_OBSERVATION_WINDOW_DAYS`` (7), ``KORA_PROMOTE_PHRASEBOOK_MIN_CLUSTER_SIZE`` (5), ``KORA_PROMOTE_PHRASEBOOK_EXPIRY_DAYS`` (14).

Persistence
-----------

Proposals live at ``${KORA_HOME}/promotions/phrasebook/{pending,approved,rejected,expired}/<uuid>.json``. Status transitions move the file atomically via os.replace. Files + audit JSONL together are the forensic-truth stream.

Tests
-----

52 new tests across 4 files. ``ruff check`` clean. 432-test regression set (clustering + promote + audit + handlers + short_circuit + reasoning + probes) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
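The persistence note (one JSON file per proposal, moved between status directories with `os.replace`) can be sketched as below. The `transition_proposal` name and its signature are hypothetical, and the sketch only moves the file; how the store updates the `status` field inside the JSON is not described here and is left out.

```python
import os
from pathlib import Path

STATUS_DIRS = ("pending", "approved", "rejected", "expired")


def transition_proposal(root: Path, proposal_id: str, src: str, dst: str) -> Path:
    """Move <root>/<src>/<id>.json to <root>/<dst>/<id>.json in one step."""
    if src not in STATUS_DIRS or dst not in STATUS_DIRS:
        raise ValueError(f"unknown status: {src!r} -> {dst!r}")
    source = root / src / f"{proposal_id}.json"
    target_dir = root / dst
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # os.replace is atomic when source and target are on the same filesystem,
    # so a reader never observes the proposal in both directories or in neither.
    os.replace(source, target)
    return target
```

Keeping all four status directories under one root is what makes the same-filesystem assumption hold in practice.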

rafe-walker merged commit 918385b into feature/phase2-upgrades May 24, 2026

rafe-walker deleted the feat/kora-KR-PROMOTE-PHRASEBOOK-FOUNDATION-MEGABUCKET branch May 24, 2026 05:59

This was referenced May 24, 2026

feat(kora): KR-FE-PROMOTION-REVIEW-AND-V2-MEGABUCKET — review panel + probe V2 + kora-actions extended #191

Merged

feat(kora): KR-PROMOTE-LOOPS-COMPLETION-MEGABUCKET — 3 loops + debounce + audit batching #193

Merged
