This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2 - actual wire-up #178
Merged
rafe-walker merged 1 commit on May 24, 2026
ST2 of 2. Replaces ST1's NotImplementedError scaffold with the
actual AIAgent route-through. KORA_REASONING_USE_GATEWAY toggle
stays default OFF; ST3 will be a tiny PR flipping the default
after parity validates in production.
# What ST2 ships
(a) `_respond_via_gateway` implementation:
- Per-call AIAgent construction (one-shot reply pattern;
ctor is cheap)
- **max_iterations=5 PINNED** (matches Kora's existing
MAX_TOOL_USE_ITERATIONS; Hermes default 90 would
quintuple monthly cost — the spec's critical pin)
- agent.tools=[] override (toolless v1 route-through;
tool-bridge is explicit ST2B follow-on)
- agent.route set from _source_to_kora_route(message.source)
so kora_hermes plugin hooks fire correctly
- run_conversation runs sync; offloaded via
asyncio.to_thread so the daemon's event loop isn't blocked
- Refuse-paths (paused, hard_stop_100) preserved before
AIAgent construction — toggle is behavior-neutral on these
(b) `_project_gateway_result` helper at module level:
- Maps run_conversation's dict return → ResponseResult
dataclass (per agent/conversation_loop.py:4077-4102 shape)
- cache_write_tokens → cache_creation_input_tokens (name
differs; Hermes uses cache_write, Kora uses
cache_creation_input per Anthropic SDK usage object
naming)
- completed + interrupted → error field
(gateway_interrupted / gateway_incomplete / None)
- tools_used=[] (ST2B populates from messages history)
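The projection in (b) can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `ResponseResult` stand-in only carries the fields named in this description, the precedence between `interrupted` and `completed` is an assumption, and `input_tokens` / `output_tokens` are assumed to keep their names on both sides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResponseResult:
    # Hypothetical stand-in; the real dataclass may carry more fields.
    text: str
    model_used: str
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0
    tools_used: list = field(default_factory=list)
    error: Optional[str] = None


def project_gateway_result(result: dict, fallback_model: str) -> ResponseResult:
    """Map a run_conversation-style dict onto the ResponseResult shape."""
    # Assumption: an interrupt takes precedence over a merely incomplete run.
    if result.get("interrupted"):
        error = "gateway_interrupted"
    elif not result.get("completed", False):
        error = "gateway_incomplete"
    else:
        error = None

    return ResponseResult(
        text=result.get("final_response") or "",
        # Engine-side fallback when the gateway does not report a model.
        model_used=result.get("model") or fallback_model,
        # Assumption: these two keys are named identically on both sides.
        input_tokens=int(result.get("input_tokens") or 0),
        output_tokens=int(result.get("output_tokens") or 0),
        # Name differs: Hermes "cache_write" vs. Anthropic "cache_creation_input".
        cache_creation_input_tokens=int(result.get("cache_write_tokens") or 0),
        cache_read_input_tokens=int(result.get("cache_read_tokens") or 0),
        tools_used=[],  # toolless in ST2; ST2B would populate this
        error=error,
    )
```

Keeping the rename in one helper means the handler-layer cost accounting can stay unaware of which path produced the result.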
(c) kora_hermes plugin handlers — real wiring:
- `_pre_api_request_mutable`: calls
cost_router.select_model_pre_call (existing #165 logic)
with iteration from api_call_count + cost_rung from
CostStateHolder; wraps system + last tool with
cache_control: ephemeral via existing
_wrap_system_as_cacheable / _wrap_tools_as_cacheable
helpers (KR-CHEAP-PROMPT-CACHING #158 semantic).
Returns {"override": {model, system, tools}}. Failures
fail-safe: leave api_kwargs unchanged + WARN log.
- `_post_llm_call`: structured-log marker
[kora.gateway.post_llm_call] with route + model. Per-call
CanonicalUsage accumulation stays at the handler layer
(slack_dm_handler's _record_inference_to_cost_ladder),
same as bypass.
- `_pre_tool_call`, `_post_tool_call`,
`_pre_tool_list_finalized`, `_on_session_start` remain no-op
gated on _is_kora_call (await ST2B / dedicated buckets).
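For reviewers who want the shape of the `_pre_api_request_mutable` handler in (c) at a glance, here is a hedged sketch. The helper names mirror the ones mentioned above but the bodies are illustrative guesses, the `select_model` callable stands in for the cost-router call, and returning an empty dict to mean "leave api_kwargs unchanged" is an assumption about the hook contract. The `cache_control: {"type": "ephemeral"}` marker and the content-block form of `system` follow the Anthropic Messages API.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("kora.gateway.sketch")

EPHEMERAL = {"type": "ephemeral"}


def wrap_system_as_cacheable(system: Any) -> Any:
    """Turn a plain-string system prompt into a content-block list with a cache marker."""
    if isinstance(system, str) and system:
        return [{"type": "text", "text": system, "cache_control": EPHEMERAL}]
    return system  # already a block list, or empty: leave untouched


def wrap_tools_as_cacheable(tools: list) -> list:
    """Copy the tool list and mark only the last tool as the cache breakpoint."""
    if not tools:
        return tools
    return [*tools[:-1], {**tools[-1], "cache_control": EPHEMERAL}]


def pre_api_request_mutable(api_kwargs: dict, select_model: Callable[[], str]) -> dict:
    """Return an override dict, or {} (assumed to mean 'no change') on any failure."""
    try:
        override = {
            "model": select_model(),
            "system": wrap_system_as_cacheable(api_kwargs.get("system")),
        }
        tools = api_kwargs.get("tools") or []
        if tools:
            # An empty tool list gets no caching override at all.
            override["tools"] = wrap_tools_as_cacheable(tools)
        return {"override": override}
    except Exception:
        # Fail-safe: log a warning and leave the request as Hermes built it.
        logger.warning("pre_api_request_mutable failed; api_kwargs unchanged", exc_info=True)
        return {}
```

Copying the last tool dict rather than mutating it keeps the hook free of side effects on the caller's tool definitions.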
# AIAgent ctor kwarg mapping (the critical table)
| Kwarg | Value | Rationale |
|---|---|---|
| model | MODEL_HAIKU constant | Default; plugin's pre_api_request_mutable hook overrides per-call based on cost-router decision |
| provider | "anthropic" | Kora's only provider |
| api_mode | "anthropic_messages" | Anthropic native messages API (matches Kora's bypass) |
| max_iterations | **5** | PINNED to Kora's MAX_TOOL_USE_ITERATIONS; Hermes default 90 would quintuple monthly cost |
| max_tokens | self._max_output_tokens | Forwarded from engine config |
| quiet_mode | True | Daemon path; suppress print() to stdout |
| other ~35 kwargs | Hermes defaults | Acceptable for v1 — Kora doesn't customize session_id mgmt, fallback chains, providers_*, callbacks, etc. |
Post-construction overrides:
- agent.tools = [] # bypass Hermes auto-loaded toolset
- agent.valid_tool_names = set()
- agent.route = derived_route # threaded into hooks
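Put together, the ctor mapping, the post-construction overrides, and the thread offload might look like the sketch below. `FakeAIAgent` is a hypothetical stand-in so the example runs on its own; the real `AIAgent` has many more kwargs, and the assumption that `run_conversation` takes the user message as its first positional argument is ours. The model constant is a placeholder value.

```python
import asyncio

MODEL_HAIKU = "haiku-placeholder"  # placeholder, not the real model id
MAX_TOOL_USE_ITERATIONS = 5


class FakeAIAgent:
    """Hypothetical stand-in for Hermes's AIAgent; models only the kwargs in the table."""

    def __init__(self, *, model, provider, api_mode, max_iterations, max_tokens, quiet_mode):
        self.model = model
        self.max_iterations = max_iterations
        self.tools = [{"name": "auto_loaded_tool"}]  # pretend a toolset was auto-loaded
        self.valid_tool_names = {"auto_loaded_tool"}
        self.route = ""

    def run_conversation(self, user_message: str) -> dict:
        # Synchronous on purpose, like the call described above.
        return {
            "final_response": f"[{self.route}] {user_message}",
            "completed": True,
            "interrupted": False,
        }


async def respond_via_gateway(text, route, max_output_tokens=1024, agent_cls=FakeAIAgent):
    # Per-call construction: one agent per reply.
    agent = agent_cls(
        model=MODEL_HAIKU,
        provider="anthropic",
        api_mode="anthropic_messages",
        max_iterations=MAX_TOOL_USE_ITERATIONS,  # pinned, never the Hermes default
        max_tokens=max_output_tokens,
        quiet_mode=True,
    )
    # Post-construction overrides: toolless v1, route threaded into hooks.
    agent.tools = []
    agent.valid_tool_names = set()
    agent.route = route
    try:
        # Offload the blocking call so the daemon's event loop keeps running.
        return await asyncio.to_thread(agent.run_conversation, text)
    except Exception as exc:
        # Illustrative error shape; the PR maps this onto ResponseResult.error.
        return {"error": f"gateway_exception:{type(exc).__name__}"}
```

For example, `asyncio.run(respond_via_gateway("hi", "slack_dm"))` returns the fake agent's dict with the route echoed back in `final_response`.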
# No new Hermes local extensions in this PR
The tool-bridge (Kora reasoning tools → Hermes tool dispatch)
WOULD need a new local extension (most plausibly a hook like
``pre_tool_call_can_provide_result`` that lets plugins short-circuit
Hermes's dispatch with a plugin-computed result). But
that design + impl is non-trivial AND lands cleanly in a
separate KR-HERMES-LOCAL-EXT-TOOL-BRIDGE bucket. Per spec §4
STOP-ASK on non-trivial extensions: deferred + flagged in PR
body so PM can ratify the design before implementation.
# Entry-point callers
NO caller-side updates needed in this bucket — the engine's
``_respond_via_gateway`` reads ``message.source`` and calls
``_source_to_kora_route`` internally. The 3 source-mapped
callers (slack_dm_handler, email_inbound_handler, MCP-driven)
already pass IncomingMessage with the right source; the engine
maps source → route automatically.
Explicit-route callers (probe_wake_consumer → probe_investigation;
cron → scheduled_task) currently fall through to the "" route
(a no-op in the plugin) when the toggle is on. Future
KR-CALLER-ROUTE-EXPLICIT (small follow-on) can either extend
_source_to_kora_route's table OR plumb route as a constructor
arg through engine.respond. Out of scope for ST2; toggle stays
OFF in production so this doesn't matter today.
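To make the two options concrete, here is a small sketch of a source → route table with an optional explicit route. The route names come from this description; the source keys on the left and the `resolve_route` helper are hypothetical and only illustrate the fall-through to "".

```python
from typing import Optional

# Hypothetical source keys; only the route names on the right appear in this PR.
_SOURCE_TO_ROUTE = {
    "slack_dm": "slack_dm",
    "email_inbound": "email_inbound",
    "mcp": "mcp_tool",
}


def source_to_kora_route(source: Optional[str]) -> str:
    """Unknown or missing sources map to "", which the plugin treats as a no-op."""
    return _SOURCE_TO_ROUTE.get(source or "", "")


def resolve_route(source: Optional[str], explicit_route: Optional[str] = None) -> str:
    """Second option from the text: a plumbed-in route wins over the table lookup."""
    return explicit_route or source_to_kora_route(source)
```

Extending the table keeps callers untouched, while the explicit argument lets probe and cron callers name routes that have no message source of their own.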
# Behavior parity test results
ALL parity tests with toggle ON pass:
- Cost-ladder default → Haiku (verified via plugin override
dict's "model" key)
- /opus prefix → Opus
- Decision-language → Opus
- Force-Opus env → Opus
- api_call_count >= 2 → Opus (iteration earning signal)
- Caching markers on system (content-block list) + last tool
- Empty tools list → no caching override on tools
- ResponseResult projection: text + model + tokens + cache
tokens + error mapping (interrupted / incomplete /
completed-cleanly)
- Refuse-paths: paused / hard_stop_100 short-circuit BEFORE
AIAgent construction (no spurious agent build)
- Route threading: slack_dm / email_inbound / mcp_tool all
propagate correctly to agent.route
- Exception in run_conversation → ResponseResult.error =
"gateway_exception:<ClassName>"
Bypass-path-unchanged tests (toggle OFF or unset):
- Default behavior runs the existing bypass cleanly
- No spurious AIAgent construction (fake_class.assert_not_called)
- ResponseResult shape identical to pre-ST2
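The routing rules exercised by the parity tests can be summarized in a few lines. This is a sketch only: the real logic lives in `cost_router.select_model_pre_call`, which also takes a cost rung that is not modeled here. The decision-language markers, the `"1"` truthiness of `KORA_FORCE_OPUS`, the check order, and the model constants are all assumptions.

```python
import os
from typing import Mapping, Optional

MODEL_HAIKU = "haiku-placeholder"  # placeholder values, not real model ids
MODEL_OPUS = "opus-placeholder"

# Hypothetical markers; the real decision-language detector is not shown in this PR.
DECISION_MARKERS = ("should we", "decide", "trade-off")


def sketch_select_model(
    user_text: str,
    api_call_count: int,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return Opus when any escalation signal fires, else the Haiku default."""
    env = os.environ if env is None else env
    text = user_text.strip().lower()

    if env.get("KORA_FORCE_OPUS") == "1":  # assumption: "1" means forced
        return MODEL_OPUS
    if text.startswith("/opus"):
        return MODEL_OPUS
    if any(marker in text for marker in DECISION_MARKERS):
        return MODEL_OPUS
    if api_call_count >= 2:  # iteration earning signal
        return MODEL_OPUS
    return MODEL_HAIKU
```

Passing `env` explicitly keeps the function easy to test without patching the process environment.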
# Deferred to follow-on buckets
(documented as work remaining BEYOND this ST2)
- **ST2B / KR-HERMES-LOCAL-EXT-TOOL-BRIDGE**: tool bridge
design + impl (likely a new pre_tool_call result-providing
extension, OR registering Kora reasoning tools as Hermes
tools via PluginContext.register_tool). Without this, the
toggle ON path has NO tool-use capability — acceptable for
parity-validation phase but blocks default flip.
- **KR-PLUGIN-CONSTITUTION**: pre_tool_call constitution
pre-screen wiring (depends on tool bridge).
- **KR-PLUGIN-TOOL-AUDIT**: post_tool_call audit emit wiring
(depends on tool bridge).
- **ST3**: flip KORA_REASONING_USE_GATEWAY default to ON
(after ST2B + parity validation in dev).
# ST3 flip timing recommendation
**Wait for ST2B** before flipping default. Reasoning:
- ST2 ships toolless route-through. With toggle ON, Kora's
reasoning loses tool-use capability — she can't fetch
operational state, check ledger, look up health, etc.
- That degrades reasoning quality on questions that need
tool data. Acceptable in test runs (operator can grep for
expected behavior); UNACCEPTABLE for Joshua's production
DMs.
- ST2B's tool bridge restores parity. After ST2B + a
24h-burn-in test (operator sends ~10 representative DMs),
ST3 flips default.
Recommended ST3 timing: 24-48h after ST2B merges. Not
immediate.
# Regression
616/616 in-scope (reasoning + handlers + listeners + agent +
plugins) pass serially.
Full repo xdist: 10598 passed, 70 failed. 48 are the
established baseline flakes carried forward from prior buckets
(test_anthropic_adapter + test_backup + test_config +
test_gateway_* + test_kanban_db + test_list_picker_providers +
test_model_switch_* + test_startup_plugin_gating +
test_web_server* family + the 5 pre-existing FE-snapshot pin drifts).
The remaining 22 failures are in tests/plugins/{memory,web}. These
require optional install extras (blake3 for isokron) that
aren't in the shared test venv; they are pre-existing
environment-dependent failures, not introduced by ST2. Zero failures in
tests/plugins/test_kora_hermes_plugin*.py or in any code path
ST2 touches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
ST2 of 2. Replaces ST1's `NotImplementedError` scaffold with the actual `AIAgent.run_conversation` route-through. `KORA_REASONING_USE_GATEWAY` toggle stays default OFF; ST3 (tiny follow-on) will flip the default after parity validates in production.

AIAgent ctor mapping (the critical table)
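| Kwarg | Value | Rationale |
|---|---|---|
| model | `MODEL_HAIKU` constant | Default; plugin's `pre_api_request_mutable` hook overrides per-call based on cost-router decision |
| provider | `"anthropic"` | Kora's only provider |
| api_mode | `"anthropic_messages"` | Anthropic native messages API (matches Kora's bypass) |
| max_iterations | 5 | PINNED to Kora's `MAX_TOOL_USE_ITERATIONS`; Hermes default 90 would quintuple monthly cost (the spec's critical pin) |
| max_tokens | `self._max_output_tokens` | Forwarded from engine config |
| quiet_mode | `True` | Daemon path; suppress print() to stdout |

Post-construction overrides: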
- `agent.tools = []`: bypass Hermes auto-loaded toolset (toolless v1; tool-bridge is ST2B)
- `agent.valid_tool_names = set()`
- `agent.route = derived_route`: threaded into kora_hermes plugin hooks

New local Hermes extensions added
None in this PR. The tool-bridge (Kora reasoning tools → Hermes tool dispatch) WOULD need a new local extension, most plausibly a hook like `pre_tool_call_can_provide_result` that lets plugins short-circuit Hermes's dispatch with a plugin-computed result. But that design + impl is non-trivial AND lands cleanly in a separate KR-HERMES-LOCAL-EXT-TOOL-BRIDGE bucket. Per spec §4 STOP-ASK on non-trivial extensions: deferred + flagged here so PM can ratify the design before implementation.

Behavior parity test results
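All parity tests with toggle ON pass:
- Cost-ladder default → Haiku (verified via plugin override dict's `model` key)
- `/opus` prefix → Opus
- Decision-language → Opus
- `KORA_FORCE_OPUS` env → Opus
- `api_call_count >= 2` → Opus (iteration earning signal)
- Caching markers on system (content-block list) + last tool
- Empty tools list → no caching override on tools
- ResponseResult projection: text + model + tokens + cache tokens + error mapping (interrupted / incomplete / completed-cleanly)
- Refuse-paths: paused / hard_stop_100 short-circuit BEFORE AIAgent construction (no spurious agent build)
- Route threading: slack_dm / email_inbound / mcp_tool all propagate correctly to agent.route
- Exception in run_conversation → `gateway_exception:<ClassName>` error

Bypass-path-unchanged tests (toggle OFF or unset):
- Default behavior runs the existing bypass cleanly
- No spurious AIAgent construction (fake_class.assert_not_called)
- ResponseResult shape identical to pre-ST2
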
ResponseResult projection (`run_conversation` dict → dataclass)

Per `agent/conversation_loop.py:4077-4102` Hermes return shape:
- `final_response` → `text`
- `model` → `model_used` (with engine-side fallback)
- `input_tokens` / `output_tokens`
- `cache_write_tokens` → `cache_creation_input_tokens` (name differs: Hermes uses cache_write, Kora uses cache_creation_input per Anthropic SDK usage object naming)
- `cache_read_tokens` → `cache_read_input_tokens`
- `completed` + `interrupted` → `error` (`gateway_interrupted` / `gateway_incomplete` / `None`)
- `tools_used = []` (ST2B populates from messages history)

Deferred to follow-on buckets
Documented as work remaining beyond ST2:
- **ST2B / KR-HERMES-LOCAL-EXT-TOOL-BRIDGE**: tool bridge design + impl (likely a new `pre_tool_call_can_provide_result` extension OR registering Kora reasoning tools as Hermes tools via `PluginContext.register_tool`). Without this, the toggle ON path has NO tool-use capability; acceptable for parity-validation phase but blocks default flip.
- **KR-PLUGIN-CONSTITUTION**: `pre_tool_call` constitution pre-screen wiring (depends on tool bridge).
- **KR-PLUGIN-TOOL-AUDIT**: `post_tool_call` audit emit wiring (depends on tool bridge).
- **ST3**: flip `KORA_REASONING_USE_GATEWAY` default to ON (after ST2B + parity validation in dev).

ST3 flip timing recommendation
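Wait for ST2B before flipping default. Reasoning:
- ST2 ships toolless route-through. With toggle ON, Kora's reasoning loses tool-use capability: she can't fetch operational state, check ledger, look up health, etc.
- That degrades reasoning quality on questions that need tool data. Acceptable in test runs (operator can grep for expected behavior); UNACCEPTABLE for Joshua's production DMs.
- ST2B's tool bridge restores parity. After ST2B + a 24h-burn-in test (operator sends ~10 representative DMs), ST3 flips default.
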
Recommended ST3 timing: 24-48h after ST2B merges. Not immediate.
Test plan
- Tests in `tests/plugins/test_kora_hermes_plugin_st2.py` pass (cost-ladder routing + caching markers + e2e gateway path + refuse-paths + route threading + interrupt mapping + exception mapping + bypass unchanged)
- Full repo xdist: 10598 passed, 70 failed (48 established baseline flakes; 22 in tests/plugins/{memory,web} that need `blake3` install extras not in shared test venv). Zero failures in `tests/plugins/test_kora_hermes_plugin*.py` or in any code path ST2 touches.

Files changed
- `kora_cli/reasoning/anthropic_engine.py`: `_respond_via_gateway` implementation + `_project_gateway_result` helper
- `plugins/kora_hermes/__init__.py`: `_pre_api_request_mutable` (cost-router + caching) + `_post_llm_call` (structured-log marker); `_current_cost_rung` helper
- `tests/plugins/test_kora_hermes_plugin.py`
- `tests/plugins/test_kora_hermes_plugin_st2.py`

🤖 Generated with Claude Code