This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B — tool-bridge#181
Merged
rafe-walker merged 1 commit intoMay 24, 2026
Conversation
ST2B of the route-through. ST2 (#178) shipped toolless wire-up; this bucket bundles the local Hermes extension + Kora plugin consumer + agent.tools population so the toggle ON path has full tool capability. After ST2B + 24-48h operator burn-in validates parity with bypass, ST3 (tiny PR) flips KORA_REASONING_USE_GATEWAY default to ON. # Hook contract: pre_tool_call_can_provide_result New local Hermes hook (added to VALID_HOOKS; fires in model_tools.handle_function_call between the existing pre_tool_call block-check and Hermes's default registry.dispatch). Kwargs: tool_name, args, task_id, session_id, tool_call_id Return: - None / non-dict / missing "result" key → no-op, fall through to other plugins, then Hermes default dispatch - {"result": "<tool_result_str>"} → short-circuits Hermes dispatch; plugin string becomes the tool result - First non-None result wins (matches existing override-shape semantics in pre_api_request_mutable + pre_tool_list_finalized) Failure handling: any hook exception → caught + DEBUG-logged + fall-through to Hermes default. Fail-safe. Backward compat: existing pre_tool_call (block-check) + post_tool_call audit + transform_tool_result fire UNCHANGED on the same code path; new hook slots between block-check and Hermes default dispatch without altering observer ordering. # Kora plugin tool-bridge consumer plugins/kora_hermes adds 2 helpers: (a) _tool_bridge_provide_result — handler registered to the new hook. For tools in REASONING_TOOL_ALLOWLIST: dispatches via asyncio.run(execute_reasoning_tool(name, args)) + projects the Pydantic model → JSON string + returns {"result": json}. For non-Kora tools: returns None (Hermes default dispatch runs unchanged — Hermes-fork safety pin). Dispatch exceptions (e.g. substrate down) → returns {"result": json.dumps({"error": "kora_tool_dispatch_error: <ClassName>: <msg>"})} so the reasoning loop sees an is_error tool_result + can recover. (b) get_kora_tools_for_agent — converts Kora's Anthropic- shaped reasoning tool descriptors to Hermes/OpenAI shape ({"type": "function", "function": {name, description, parameters}}) for agent.tools population. Empty list on registry failure → engine falls back to toolless ST2 posture. # Engine wiring change (replacing ST2's agent.tools = []) kora_cli/reasoning/anthropic_engine._respond_via_gateway now populates agent.tools from get_kora_tools_for_agent() + populates agent.valid_tool_names from the returned tool names. Falls back to agent.tools = [] on import / call failure (ST2 toolless posture preserved as the safety net). # Sample tool-use trace User → "what's my burn?" → engine._respond_via_gateway constructs AIAgent (max_iter=5, haiku default, anthropic_messages mode) → agent.tools = [{type:function, function:{name: "kora__get_operational_state", description:..., parameters: {...}}}, ...] (5 Kora reasoning tools) → agent.route = "slack_dm" → agent.run_conversation(user_message, system_message) via asyncio.to_thread (sync Hermes loop offloaded) → conversation_loop calls Claude → Claude returns tool_use block for "kora__get_operational_state" → conversation_loop dispatches via model_tools.handle_ function_call("kora__get_operational_state", {}, ...) → handle_function_call fires pre_tool_call (block-check) → handle_function_call fires NEW pre_tool_call_can_provide_result hook → kora_hermes plugin's _tool_bridge_provide_result handler fires (tool IS in REASONING_TOOL_ALLOWLIST) → handler calls asyncio.run(execute_reasoning_tool(...)) → returns {"result": '{"primary_state":"ready",...}'} → handle_function_call short-circuits Hermes default registry.dispatch (verified via spy in tests) → post_tool_call audit fires + transform_tool_result fires (UNCHANGED on the same code path) → conversation_loop continues with the tool_result; LLM produces final answer → run_conversation returns dict {final_response, model, tokens, ...} → engine projects via _project_gateway_result → ResponseResult → handler renders the reply # Tests tests/plugins/test_kora_hermes_plugin_st2b.py — 17 new tests: - Hook surface: in VALID_HOOKS + register_hook accepts without unknown-hook warning - get_kora_tools_for_agent: Hermes-shape conversion + all names within reasoning allowlist + empty-on-registry-failure fall-back - _tool_bridge_provide_result: non-Kora tool returns None (Hermes-fork safety); Kora tool dispatches + returns JSON; dispatch exception returns is_error envelope; non-Pydantic result falls back to json.dumps(default=str) - handle_function_call: new hook fires + short-circuits registry.dispatch when plugin provides result; falls through when plugin returns None; fail-safe on plugin exception - _respond_via_gateway: agent.tools populated from registry; agent.valid_tool_names tracks names; falls back to [] when tool-bridge import fails - Sample tool-use trace (Kora tool via bridge): no Hermes dispatch; bridge result returned - Sample tool-use trace (non-Kora tool fall-through): Hermes dispatch runs; bridge stays out — fork users safe - Bypass-path regression (toggle OFF): no AIAgent ctor; no behavior change Updated: - tests/test_model_tools.py — pre-existing hook-sequence pin extended to account for new hook (pre_tool_call → pre_tool_call_can_provide_result → post_tool_call → transform_tool_result) - tests/plugins/test_kora_hermes_plugin_st2.py — ST2's "agent.tools == []" pin replaced with the new Hermes- shape assertion (ST2B populates tools) - tests/plugins/test_kora_hermes_plugin.py — ST1's "six hooks" pin extended to seven (added new tool-bridge hook) # Documentation update kora_docs/14_research/hermes_local_extensions_2026-05-23.md gains Extension 5 (the new hook) + its upstream-PR readiness notes. Battle-test gap: ST3 default-flip + 24-48h burn-in is the natural integration test. After ST3 lands, the hook is upstreamable. # Pre-existing cross-test interference (NOT introduced by ST2B) `tests/test_transform_llm_output_hook.py::test_hook_receives_ expected_kwargs` + `tests/test_transform_tool_result_hook.py:: test_transform_tool_result_integration_with_real_plugin` fail when run AFTER `tests/plugins/test_kora_hermes_plugin_st2.py` (PR #178). Reproduced cleanly with that two-file ordering; both pass in isolation. Pre-existing PR #178 test-fixture pollution affecting user-plugin discovery — flagged for follow-on `KR-TEST-ISOLATION-ST2-POLLUTION` since it's a test-only concern (production-code uncontaminated) and fixing needs PR #178 author's context on what state their tests are mutating. Did NOT block ST2B since both tests pass in isolation and the pollution is orthogonal to ST2B's surface. # Regression 657/657 in-scope tests pass serially (reasoning + handlers + listeners + agent + plugins + model_tools). Full repo xdist: 10746 passed, 71 failed. Same baseline as ST2 (48 established baseline flakes + 22 pre-existing tests/ plugins/{memory,web} env-dep failures from missing blake3 extras + 1 pre-existing PR #178 cross-test interference described above). Zero new ST2B regressions. # ST3 burn-in start recommendation timing Recommend burn-in starts **immediately after ST2B merges**. Reasoning: - ST2B closes the tool-use gap that blocked ST3 in ST2's recommendation. Toggle ON now provides parity with bypass for tool-using flows. - The kora_hermes plugin is opt-in via plugins.enabled config — operator opts in WITHOUT flipping the engine toggle to validate plugin discovery + tool-bridge wiring in dev first. - After plugin opt-in burns in for ~12h cleanly, operator flips KORA_REASONING_USE_GATEWAY=true in Doppler for 24-48h. - Watch metrics: cost_telemetry per-route, audit JSONL for tool-call attribution, slack DM outbound JSONL for any "gateway_*" error codes. - If parity holds, ST3 lands (env flip to default). - If anomalies surface, ST3 holds + targeted fix lands as KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2C. Total ST3-ready estimate: **48-72h after ST2B merges** (12h plugin opt-in burn-in + 24-48h toggle ON burn-in + small buffer for ST3 PR review). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merged
13 tasks
rafe-walker
added a commit
that referenced
this pull request
May 24, 2026
…ck R3-2 Phase C completion (#189) Substantial paired bundle. Deliverable 1 — New local Hermes hook post_llm_call_can_reissue at agent/conversation_loop.py. First-non-None override semantics matching #172/#181 family. Anti-loop safety: at most one reissue per iteration. Backward compat: post_api_request observer sees FINAL (post-reissue) response only. Deliverable 2 — kora_hermes_plugin/haiku_router/ following #185 template. Plugin consumes should_escalate_post_call (from cost_ladder/selector.py since #185 but no caller until now). Implements parallel-Claude pattern from R3: low-confidence Haiku response → reissue to Opus with Haiku response in messages context. record_inference fires twice per iteration (Haiku + Opus reissue) with escalated_to_opus tagged correctly. Sample trace: Haiku reply uncertain (i"m not sure...) → escalates → Opus response includes Haiku context → Opus confirms terser. Two consecutive record_inference calls, only second tagged escalated_to_opus=True. What this completes: 6 of 7 plugin extractions (KR-PLUGIN-IDENTITY still deferred per Lock R3-2); all KR-HAIKU-ROUTER #165 escalation paths now functional end-to-end; Lock R3-2 Phase C closed. Two non-blocking follow-ups flagged in PR body: escalation_reason as structured telemetry field (one-line CostStateHolder.record_inference bump); per-call api_call_count accounting (transparent-upgrade vs per-call — currently transparent). 125 directly-affected tests green; 56 broader-suite failures verified pre-existing on base (fastapi/blake3/HERMES_HOME environmental, not regressions).
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ST2B of the route-through. ST2 (#178) shipped toolless wire-up; this bucket bundles the local Hermes extension + Kora plugin consumer + agent.tools population so the toggle ON path has full tool capability.
Hook contract
New local Hermes hook:
pre_tool_call_can_provide_resultmodel_tools.handle_function_call— between existingpre_tool_callblock-check and Hermes's defaultregistry.dispatchtool_name,args,task_id,session_id,tool_call_id{"result": "<tool_result_str>"}→ short-circuits Hermes dispatchNone/ non-dict / missing"result"key → other plugins, then Hermes defaultresultwins (matches existing override-shape semantics inpre_api_request_mutable+pre_tool_list_finalized)pre_tool_call(block-check) +post_tool_callaudit +transform_tool_resultfire UNCHANGED on the same code pathSample tool-use trace
Verified by
test_sample_tool_use_trace_kora_tool_via_bridge: invokeshandle_function_callend-to-end with a spy onregistry.dispatchconfirming the bridge short-circuit (Hermes dispatch never fires for Kora tools).Verified by
test_sample_tool_use_trace_non_kora_tool_falls_through:write_file(Hermes-native tool, NOT in Kora's allowlist) → bridge returns None → Hermes default dispatch runs. Hermes-fork users with the plugin loaded see no regression on their own tools.Files changed
kora_cli/plugins.pypre_tool_call_can_provide_result(16 lines docstring)model_tools.pyhandle_function_callbetween block-check andregistry.dispatch(35 lines)plugins/kora_hermes/__init__.py_tool_bridge_provide_resulthandler +_is_kora_reasoning_toolhelper +get_kora_tools_for_agentconverter +register()adds the 7th hookkora_cli/reasoning/anthropic_engine.py_respond_via_gatewaypopulatesagent.toolsfromget_kora_tools_for_agent()(replaces ST2's= []); fall-back to toolless on registry failurekora_docs/14_research/hermes_local_extensions_2026-05-23.mdtests/plugins/test_kora_hermes_plugin_st2b.pytests/test_model_tools.pytests/plugins/test_kora_hermes_plugin_st2.pyagent.tools == []pin replaced with Hermes-shape assertiontests/plugins/test_kora_hermes_plugin.pyST3 burn-in start recommendation timing
Recommend burn-in starts immediately after ST2B merges.
Sequence:
plugins.enabledconfig (in dev / staging first)KORA_REASONING_USE_GATEWAYstill OFF (bypass path still drives production)KORA_REASONING_USE_GATEWAY=truein Doppler for 24-48hgateway_*error codesTotal ST3-ready estimate: 48-72h after ST2B merges (12h plugin opt-in + 24-48h toggle ON + small buffer for ST3 PR review).
Test plan
pytest tests/plugins/test_kora_hermes_plugin_st2b.pytest_sample_tool_use_trace_non_kora_tool_falls_throughpre_tool_call(block-check) +post_tool_callaudit +transform_tool_resultcontinue to fire on the same path; observer ordering preservedPre-existing cross-test interference (flagged, not caused by ST2B)
tests/test_transform_llm_output_hook.py::test_hook_receives_expected_kwargs+tests/test_transform_tool_result_hook.py::test_transform_tool_result_integration_with_real_pluginfail when run AFTERtests/plugins/test_kora_hermes_plugin_st2.py(PR #178). Reproduced cleanly with that two-file ordering; both pass in isolation. Pre-existing PR #178 test-fixture pollution affecting user-plugin discovery — flagged for follow-onKR-TEST-ISOLATION-ST2-POLLUTIONsince fixing needs PR #178 author's context. Did NOT block ST2B since both tests pass in isolation and the pollution is orthogonal to ST2B's surface.🤖 Generated with Claude Code