This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-HAIKU-ROUTER — default-Haiku with earned Opus escalation (Lock R3-3)#165
Merged
Merged
Conversation
…on (Lock R3-3) Per Council R3 Lock R3-3 (the cost-ladder FLIP) + build-list R3-4 #4. The reasoning engine no longer defaults to Opus with reactive downshift; it defaults to **Haiku 4.5** and escalates to Opus 4.7 ONLY when an earning signal fires. The existing cost-ladder rung stays as a Layer 2 defensive backstop: WARN_75 / DOWNSHIFT_90 force every call to Haiku regardless of any earning signal; HARD_STOP_100 still fails the call. # New package: kora_cli/router/ kora_cli/router/__init__.py — re-exports kora_cli/router/cost_router.py — RoutingDecision + select_model_pre_call + should_escalate_post_call + strip_opus_prefix + env-tunable trigger patterns. # Earning signals (precedence order) Pre-call (decided BEFORE the first API call): 1. HARD_STOP_100 rung → model=None, caller fail-fasts 2. WARN_75 / DOWNSHIFT_90 → force Haiku (cost backstop wins over every earning signal below) 3. KORA_FORCE_OPUS env → Opus 4. iteration ≥ 2 → Opus (tool-loop iteration earning) 5. /opus prefix → Opus, prefix stripped from prompt 6. Decision-language regex → Opus (env-tunable patterns: "should I", "do we", "decide", "approve", "go/no-go", "is it safe", "plan strategy for", etc.) 7. Default → Haiku Post-call (decided AFTER iteration-1 Haiku, only when stop_reason == "end_turn"): - Low-confidence marker in Haiku response ("I'm not sure", "I don't have enough", etc. — env-tunable) → escalate - Heuristic: response shorter than 50 chars when input longer than 200 chars → escalate Escalation re-issues iteration 1 to Opus with Haiku's response woven into the conversation history (assistant turn) + a follow-up user turn asking Opus to review/improve. Reuses the same system + tools blocks so the prompt cache stays warm. BOTH the Haiku attempt + Opus re-issue are billed (the cost of escalation); telemetry records two separate calls. # Cost-downshift harmonization agent/cost_downshift.py is left UNTOUCHED. It governs **Sea_ Ticket queue** decisions (defer vs run; criticality clamping) which is a separate substrate-side concern. The router runs on a different code path (engine-side / Joshua-DM reply path) and reads cost_rung as one of its inputs (the Layer 2 backstop). Both name "downshift" in different domain contexts; both co-exist cleanly. Engine-side: RUNG_MODEL_MAP at anthropic_engine.py:115 is DEPRECATED — the router now owns model selection. The const is preserved (mapped uniformly to MODEL_HAIKU) for any downstream consumer that imports it; the engine no longer consults it. # Telemetry wiring (first production record_call call site) Engine emits ``get_telemetry().record_call(...)`` per API call: - iteration 1 → route = source-mapped (slack_dm / email_inbound / mcp_tool / unknown) - iteration ≥ 2 → route = ROUTE_TOOL_LOOP_ITERATION - escalated_to_opus = True for every earning-signal path AND every post-call escalation re-issue Cost-estimate computed per-call via ``agent.usage_pricing.estimate_usage_cost`` with the canonical PricingEntry rates. Best-effort: any telemetry failure logs at DEBUG and the hot reasoning path continues. # Env-tunable triggers KORA_FORCE_OPUS — operator defensive switch KORA_OPUS_TRIGGER_PATTERNS — comma-separated regex list (overrides DEFAULT_DECISION_ PATTERNS — 7 entries) KORA_OPUS_PREFIX — default "/opus" KORA_HAIKU_LOW_CONFIDENCE_PATTERNS — comma-separated regex (overrides 6-entry default) # Tests tests/kora_cli/router/test_cost_router.py — 50 router unit tests covering every decision-tree branch + every env override + every precedence boundary (force_opus vs cost_clamp, iteration_2 vs opus_prefix, etc.). tests/kora_cli/reasoning/test_anthropic_engine_router.py — 13 engine integration tests: - default Haiku on simple message - decision-language → Opus - /opus prefix → Opus + prefix stripped from SDK messages - warn_75 → Haiku (cost clamp) - hard_stop → cost_ladder_halted, no SDK call - iteration 2 → Opus - low-confidence Haiku → Opus re-issue with Haiku response in messages array - high-confidence Haiku → only 1 SDK call - tool_use stop_reason doesn't trigger post-call escalation - Opus re-issue reuses identical system + tools blocks (cache warmth preserved) - telemetry record_call invoked per API call with correct route + escalated_to_opus flag Pre-existing tests updated: - test_normal_rung_selects_opus → renamed test_normal_rung_defaults_to_haiku (post-flip behavior) - test_warn_75_rung_selects_sonnet → renamed test_warn_75_rung_clamps_to_haiku - test_unknown_rung_defaults_to_opus_and_continues → renamed test_unknown_rung_defaults_to_haiku (no more WARN log; router treats unknown as no-clamp + falls to default) Pre-existing test-spy fixtures updated to accept the ``route`` + ``escalated_to_opus`` kwargs that ``holder.record_inference`` gained in #161 (gap from #161 that landed without test-fixture updates; addressed here to keep the regression green): - tests/kora_cli/handlers/test_slack_dm_reply.py::_spy - tests/kora_cli/reasoning/test_anthropic_engine_caching.py ::_capture # Regression 591/591 reasoning + handlers + router + listeners green serially. Full repo xdist: 9447 passed, 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating / test_web_server_cron_profiles). Zero failures in router, reasoning, or handlers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 24, 2026
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…it_pool_usd (schema v3) (#169) Snapshot schema_version 2→3. Adds spent_to_date_usd + credit_pool_usd to cost_ladder section. KORA_CREDIT_POOL_USD env override (default $200 per reference-anthropic-sdk-billing-split); malformed/non-positive values warn + fall back. Bonus: model_default now resolves via KR-HAIKU-ROUTER (#165) DEFAULT_HAIKU_MODEL — router-independent of cost holder, so PR #157's 'unknown' placeholder for that field is fully retired. Unblocks CC#2 follow-on (CostCardBody shift to snapshot — sub-ms read, decoupled from cost-holder boot windows, lag bounded to 5-min cron cadence). /api/cost-telemetry retained as live source for pre-decision reads. 47 snapshot tests + 159-test cross-bucket regression + ruff clean.
Merged
13 tasks
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…ck R3-2 Phase C completion (#189) Substantial paired bundle. Deliverable 1 — New local Hermes hook post_llm_call_can_reissue at agent/conversation_loop.py. First-non-None override semantics matching #172/#181 family. Anti-loop safety: at most one reissue per iteration. Backward compat: post_api_request observer sees FINAL (post-reissue) response only. Deliverable 2 — kora_hermes_plugin/haiku_router/ following #185 template. Plugin consumes should_escalate_post_call (from cost_ladder/selector.py since #185 but no caller until now). Implements parallel-Claude pattern from R3: low-confidence Haiku response → reissue to Opus with Haiku response in messages context. record_inference fires twice per iteration (Haiku + Opus reissue) with escalated_to_opus tagged correctly. Sample trace: Haiku reply uncertain (i"m not sure...) → escalates → Opus response includes Haiku context → Opus confirms terser. Two consecutive record_inference calls, only second tagged escalated_to_opus=True. What this completes: 6 of 7 plugin extractions (KR-PLUGIN-IDENTITY still deferred per Lock R3-2); all KR-HAIKU-ROUTER #165 escalation paths now functional end-to-end; Lock R3-2 Phase C closed. Two non-blocking follow-ups flagged in PR body: escalation_reason as structured telemetry field (one-line CostStateHolder.record_inference bump); per-call api_call_count accounting (transparent-upgrade vs per-call — currently transparent). 125 directly-affected tests green; 56 broader-suite failures verified pre-existing on base (fastapi/blake3/HERMES_HOME environmental, not regressions).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Per Council R3 Lock R3-3 — the cost-ladder FLIP. The reasoning engine no longer defaults to Opus with reactive downshift; it defaults to Haiku 4.5 and escalates to Opus 4.7 ONLY when an earning signal fires. Cost-ladder rung stays as Layer 2 defensive backstop.
Earning signals — precedence order
Post-call (only on iteration 1 with stop_reason=end_turn):
Escalation rate — signal-by-signal expectations
Net: roughly 50-60% of Joshua's inbound DM volume hits Haiku at $0.10/M-read vs Opus $5.00/M, with the remainder split between short-circuit ($0) and earned Opus.
cost_downshift harmonization
agent/cost_downshift.pyis UNTOUCHED. It governs substrate Sea_Ticket queue decisions (defer-vs-run; criticality clamping) — a separate domain from the engine-side Joshua-DM reply path. The router readscost_rungas one of its inputs (the Layer 2 backstop) but does NOT call intocost_downshift. Both can co-exist; both use the word "downshift" in different domain contexts (substrate-tier vs LLM-tier).Engine-side:
RUNG_MODEL_MAPatanthropic_engine.py:115is DEPRECATED — the router now owns model selection. The const is preserved (mapped uniformly toMODEL_HAIKU) for any downstream consumer that imports it, but the engine no longer consults it.K-DG verification
claude-opus-4-7(verified via grep ofMODEL_OPUSconstant +RUNG_MODEL_MAPtable atanthropic_engine.py:108-118). Used verbatim inDEFAULT_OPUS_MODELfor cache-key stability.claude-haiku-4-5-20251001(verified). Long form preserved inDEFAULT_HAIKU_MODELfor cache-key stability with already-warm caches.record_callsignature: takesroute, model, canonical_usage, cost_estimate_usd, escalated_to_opus(verified atcost_telemetry.py:215).agent/cost_downshift.py: returnsDownshiftDecisionfor Sea_Ticket runners — different domain; no overlap.Env-tunable triggers
KORA_FORCE_OPUStrueforces every call to OpusKORA_OPUS_TRIGGER_PATTERNSKORA_OPUS_PREFIX/opusKORA_HAIKU_LOW_CONFIDENCE_PATTERNSTest plan
route+escalated_to_opuskwargsholder.record_inferencegained in feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161 (feat(kora): KR-CHEAP-COST-TELEMETRY — per-route counters (R3-4 #10) #161 didn't update the fixtures — fixed here)/opusprefix stripping verified by walking the SDK call's messages array🤖 Generated with Claude Code