feat(kora): KR-HAIKU-ROUTER — default-Haiku with earned Opus escalation (Lock R3-3) by rafe-walker · Pull Request #165 · rafe-walker/kora

rafe-walker · 2026-05-24T02:22:35Z

Summary

Per Council R3 Lock R3-3 — the cost-ladder FLIP. The reasoning engine no longer defaults to Opus with reactive downshift; it defaults to Haiku 4.5 and escalates to Opus 4.7 ONLY when an earning signal fires. Cost-ladder rung stays as Layer 2 defensive backstop.

Earning signals — precedence order

#	Signal	Path	Reason code
1	`hard_stop_100` rung	model=None, caller fail-fasts	`cost_ladder_halted`
2	`warn_75` / `downshift_90` rung	Force Haiku (overrides everything below)	`cost_clamp:`
3	`KORA_FORCE_OPUS=true` env	Opus	`force_opus_env`
4	iteration ≥ 2	Opus (tool-loop earning)	`tool_loop_iteration`
5	`/opus` prefix in message	Opus, prefix stripped	`opus_prefix`
6	Decision-language regex	Opus	`decision_language`
7	Default	Haiku	`default_haiku`

Post-call (only on iteration 1 with stop_reason=end_turn):

Low-confidence marker in Haiku response → re-issue to Opus with Haiku's response woven into the conversation (assistant turn) + a follow-up user turn asking Opus to review/improve. Same system + tools blocks → cache stays warm.
Heuristic length: response < 50 chars when input > 200 chars → escalate.

Escalation rate — signal-by-signal expectations

Signal	Estimated fire rate (per inbound DM)	Drives Opus?
`/opus` prefix	<1% — explicit operator override only	Yes
`KORA_FORCE_OPUS` env	0% in normal ops; 100% when operator flips for safety	Yes
iteration ≥ 2	~10-15% — depends on tool-use frequency, which Joshua's "status + ledger" composite queries hit	Yes
Decision-language regex	~5-10% — fires on "should I", "decide", "approve", "ship or not"-shaped questions	Yes
Post-call low-confidence	~3-5% — Haiku tends to phrase uncertainty ("I'm not sure", "I don't know") and a small fraction of substantive questions get terse Haiku responses	Yes
Cost-clamp (WARN_75/DOWNSHIFT_90)	0% in normal budget conditions; can spike to ~50% over a month-end	No (forces Haiku)
Total Opus rate	~15-25% of inbound DMs going to engine (after PR #160 short-circuit catches its 30-40%)	—

Net: roughly 50-60% of Joshua's inbound DM volume hits Haiku at $0.10/M-read vs Opus $5.00/M, with the remainder split between short-circuit ($0) and earned Opus.

cost_downshift harmonization

agent/cost_downshift.py is UNTOUCHED. It governs substrate Sea_Ticket queue decisions (defer-vs-run; criticality clamping) — a separate domain from the engine-side Joshua-DM reply path. The router reads cost_rung as one of its inputs (the Layer 2 backstop) but does NOT call into cost_downshift. Both can co-exist; both use the word "downshift" in different domain contexts (substrate-tier vs LLM-tier).

Engine-side: RUNG_MODEL_MAP at anthropic_engine.py:115 is DEPRECATED — the router now owns model selection. The const is preserved (mapped uniformly to MODEL_HAIKU) for any downstream consumer that imports it, but the engine no longer consults it.

K-DG verification

Current Opus model string: claude-opus-4-7 (verified via grep of MODEL_OPUS constant + RUNG_MODEL_MAP table at anthropic_engine.py:108-118). Used verbatim in DEFAULT_OPUS_MODEL for cache-key stability.
Current Haiku model string: claude-haiku-4-5-20251001 (verified). Long form preserved in DEFAULT_HAIKU_MODEL for cache-key stability with already-warm caches.
Telemetry record_call signature: takes route, model, canonical_usage, cost_estimate_usd, escalated_to_opus (verified at cost_telemetry.py:215).
agent/cost_downshift.py: returns DownshiftDecision for Sea_Ticket runners — different domain; no overlap.

Env-tunable triggers

Env var	Default	Purpose
`KORA_FORCE_OPUS`	unset	`true` forces every call to Opus
`KORA_OPUS_TRIGGER_PATTERNS`	7 default patterns	Comma-separated regex list
`KORA_OPUS_PREFIX`	`/opus`	Override the operator prefix string
`KORA_HAIKU_LOW_CONFIDENCE_PATTERNS`	6 default patterns	Comma-separated regex list

Test plan

50 router unit tests pass (every decision branch + env override + precedence boundary)
13 engine integration tests pass (default → Haiku; decision-language → Opus; prefix → Opus + stripped; warn_75 → clamp; hard_stop → fail; iteration 2 → Opus; post-call escalation re-issue with Haiku context; cache-warmth preserved; telemetry per call)
591/591 reasoning + handlers + router + listeners regression green serially
Full repo xdist: 9447 passed, 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating). Zero failures in router, reasoning, or handlers.
Pre-existing rung-selection tests updated to reflect Haiku-default behavior (3 renames)
Pre-existing test-spy fixtures updated to accept the route + escalated_to_opus kwargs holder.record_inference gained in feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161 (feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161 didn't update the fixtures — fixed here)
/opus prefix stripping verified by walking the SDK call's messages array

🤖 Generated with Claude Code

…on (Lock R3-3) Per Council R3 Lock R3-3 (the cost-ladder FLIP) + build-list R3-4 #4. The reasoning engine no longer defaults to Opus with reactive downshift; it defaults to **Haiku 4.5** and escalates to Opus 4.7 ONLY when an earning signal fires. The existing cost-ladder rung stays as a Layer 2 defensive backstop: WARN_75 / DOWNSHIFT_90 force every call to Haiku regardless of any earning signal; HARD_STOP_100 still fails the call. # New package: kora_cli/router/ kora_cli/router/__init__.py — re-exports kora_cli/router/cost_router.py — RoutingDecision + select_model_pre_call + should_escalate_post_call + strip_opus_prefix + env-tunable trigger patterns. # Earning signals (precedence order) Pre-call (decided BEFORE the first API call): 1. HARD_STOP_100 rung → model=None, caller fail-fasts 2. WARN_75 / DOWNSHIFT_90 → force Haiku (cost backstop wins over every earning signal below) 3. KORA_FORCE_OPUS env → Opus 4. iteration ≥ 2 → Opus (tool-loop iteration earning) 5. /opus prefix → Opus, prefix stripped from prompt 6. Decision-language regex → Opus (env-tunable patterns: "should I", "do we", "decide", "approve", "go/no-go", "is it safe", "plan strategy for", etc.) 7. Default → Haiku Post-call (decided AFTER iteration-1 Haiku, only when stop_reason == "end_turn"): - Low-confidence marker in Haiku response ("I'm not sure", "I don't have enough", etc. — env-tunable) → escalate - Heuristic: response shorter than 50 chars when input longer than 200 chars → escalate Escalation re-issues iteration 1 to Opus with Haiku's response woven into the conversation history (assistant turn) + a follow-up user turn asking Opus to review/improve. Reuses the same system + tools blocks so the prompt cache stays warm. BOTH the Haiku attempt + Opus re-issue are billed (the cost of escalation); telemetry records two separate calls. # Cost-downshift harmonization agent/cost_downshift.py is left UNTOUCHED. It governs **Sea_ Ticket queue** decisions (defer vs run; criticality clamping) which is a separate substrate-side concern. The router runs on a different code path (engine-side / Joshua-DM reply path) and reads cost_rung as one of its inputs (the Layer 2 backstop). Both name "downshift" in different domain contexts; both co-exist cleanly. Engine-side: RUNG_MODEL_MAP at anthropic_engine.py:115 is DEPRECATED — the router now owns model selection. The const is preserved (mapped uniformly to MODEL_HAIKU) for any downstream consumer that imports it; the engine no longer consults it. # Telemetry wiring (first production record_call call site) Engine emits ``get_telemetry().record_call(...)`` per API call: - iteration 1 → route = source-mapped (slack_dm / email_inbound / mcp_tool / unknown) - iteration ≥ 2 → route = ROUTE_TOOL_LOOP_ITERATION - escalated_to_opus = True for every earning-signal path AND every post-call escalation re-issue Cost-estimate computed per-call via ``agent.usage_pricing.estimate_usage_cost`` with the canonical PricingEntry rates. Best-effort: any telemetry failure logs at DEBUG and the hot reasoning path continues. # Env-tunable triggers KORA_FORCE_OPUS — operator defensive switch KORA_OPUS_TRIGGER_PATTERNS — comma-separated regex list (overrides DEFAULT_DECISION_ PATTERNS — 7 entries) KORA_OPUS_PREFIX — default "/opus" KORA_HAIKU_LOW_CONFIDENCE_PATTERNS — comma-separated regex (overrides 6-entry default) # Tests tests/kora_cli/router/test_cost_router.py — 50 router unit tests covering every decision-tree branch + every env override + every precedence boundary (force_opus vs cost_clamp, iteration_2 vs opus_prefix, etc.). tests/kora_cli/reasoning/test_anthropic_engine_router.py — 13 engine integration tests: - default Haiku on simple message - decision-language → Opus - /opus prefix → Opus + prefix stripped from SDK messages - warn_75 → Haiku (cost clamp) - hard_stop → cost_ladder_halted, no SDK call - iteration 2 → Opus - low-confidence Haiku → Opus re-issue with Haiku response in messages array - high-confidence Haiku → only 1 SDK call - tool_use stop_reason doesn't trigger post-call escalation - Opus re-issue reuses identical system + tools blocks (cache warmth preserved) - telemetry record_call invoked per API call with correct route + escalated_to_opus flag Pre-existing tests updated: - test_normal_rung_selects_opus → renamed test_normal_rung_defaults_to_haiku (post-flip behavior) - test_warn_75_rung_selects_sonnet → renamed test_warn_75_rung_clamps_to_haiku - test_unknown_rung_defaults_to_opus_and_continues → renamed test_unknown_rung_defaults_to_haiku (no more WARN log; router treats unknown as no-clamp + falls to default) Pre-existing test-spy fixtures updated to accept the ``route`` + ``escalated_to_opus`` kwargs that ``holder.record_inference`` gained in #161 (gap from #161 that landed without test-fixture updates; addressed here to keep the regression green): - tests/kora_cli/handlers/test_slack_dm_reply.py::_spy - tests/kora_cli/reasoning/test_anthropic_engine_caching.py ::_capture # Regression 591/591 reasoning + handlers + router + listeners green serially. Full repo xdist: 9447 passed, 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating / test_web_server_cron_profiles). Zero failures in router, reasoning, or handlers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…it_pool_usd (schema v3) (#169) Snapshot schema_version 2→3. Adds spent_to_date_usd + credit_pool_usd to cost_ladder section. KORA_CREDIT_POOL_USD env override (default $200 per reference-anthropic-sdk-billing-split); malformed/non-positive values warn + fall back. Bonus: model_default now resolves via KR-HAIKU-ROUTER (#165) DEFAULT_HAIKU_MODEL — router-independent of cost holder, so PR #157's 'unknown' placeholder for that field is fully retired. Unblocks CC#2 follow-on (CostCardBody shift to snapshot — sub-ms read, decoupled from cost-holder boot windows, lag bounded to 5-min cron cadence). /api/cost-telemetry retained as live source for pre-decision reads. 47 snapshot tests + 159-test cross-bucket regression + ruff clean.

…ck R3-2 Phase C completion (#189) Substantial paired bundle. Deliverable 1 — New local Hermes hook post_llm_call_can_reissue at agent/conversation_loop.py. First-non-None override semantics matching #172/#181 family. Anti-loop safety: at most one reissue per iteration. Backward compat: post_api_request observer sees FINAL (post-reissue) response only. Deliverable 2 — kora_hermes_plugin/haiku_router/ following #185 template. Plugin consumes should_escalate_post_call (from cost_ladder/selector.py since #185 but no caller until now). Implements parallel-Claude pattern from R3: low-confidence Haiku response → reissue to Opus with Haiku response in messages context. record_inference fires twice per iteration (Haiku + Opus reissue) with escalated_to_opus tagged correctly. Sample trace: Haiku reply uncertain (i"m not sure...) → escalates → Opus response includes Haiku context → Opus confirms terser. Two consecutive record_inference calls, only second tagged escalated_to_opus=True. What this completes: 6 of 7 plugin extractions (KR-PLUGIN-IDENTITY still deferred per Lock R3-2); all KR-HAIKU-ROUTER #165 escalation paths now functional end-to-end; Lock R3-2 Phase C closed. Two non-blocking follow-ups flagged in PR body: escalation_reason as structured telemetry field (one-line CostStateHolder.record_inference bump); per-call api_call_count accounting (transparent-upgrade vs per-call — currently transparent). 125 directly-affected tests green; 56 broader-suite failures verified pre-existing on base (fastapi/blake3/HERMES_HOME environmental, not regressions).

rafe-walker merged commit 0d9626b into feature/phase2-upgrades May 24, 2026

rafe-walker deleted the feat/kora-KR-HAIKU-ROUTER branch May 24, 2026 02:24

This was referenced May 24, 2026

research(kora): KR-FORK-HOOK-VERIFY — Hermes Gateway hook coverage report #168

Merged

KR-SNAPSHOT-EXPAND-COST-FIELDS — schema v3 cost-ladder spend/pool #169

Merged

rafe-walker mentioned this pull request May 24, 2026

feat(kora): KR-HERMES-LOCAL-EXT-REISSUE + KR-HAIKU-ROUTER-PLUGIN — completes Lock R3-2 Phase C #189

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kora): KR-HAIKU-ROUTER — default-Haiku with earned Opus escalation (Lock R3-3)#165

feat(kora): KR-HAIKU-ROUTER — default-Haiku with earned Opus escalation (Lock R3-3)#165
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-HAIKU-ROUTER

rafe-walker commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rafe-walker commented May 24, 2026

Summary

Earning signals — precedence order

Escalation rate — signal-by-signal expectations

cost_downshift harmonization

K-DG verification

Env-tunable triggers

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant