Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B — tool-bridge#181

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B
May 24, 2026
Merged

feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B — tool-bridge#181
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

Summary

ST2B of the route-through. ST2 (#178) shipped toolless wire-up; this bucket bundles the local Hermes extension + Kora plugin consumer + agent.tools population so the toggle ON path has full tool capability.

Hook contract

New local Hermes hook: pre_tool_call_can_provide_result

Aspect Value
Site model_tools.handle_function_call — between existing pre_tool_call block-check and Hermes's default registry.dispatch
Kwargs tool_name, args, task_id, session_id, tool_call_id
Return (success) {"result": "<tool_result_str>"} → short-circuits Hermes dispatch
Return (fall-through) None / non-dict / missing "result" key → other plugins, then Hermes default
Wins-policy First non-None result wins (matches existing override-shape semantics in pre_api_request_mutable + pre_tool_list_finalized)
Failure Plugin exception → caught + DEBUG-logged → fall-through. Fail-safe.
Backward compat Existing pre_tool_call (block-check) + post_tool_call audit + transform_tool_result fire UNCHANGED on the same code path

Sample tool-use trace

User → "what's my burn?"
  ↓ engine._respond_via_gateway constructs AIAgent (max_iter=5, anthropic_messages mode)
  ↓ agent.tools = [<5 Kora reasoning tools in Hermes/OpenAI shape>]
  ↓ agent.route = "slack_dm"
  ↓ agent.run_conversation(user_message, system_message) [asyncio.to_thread]
  ↓ conversation_loop → Claude → tool_use block for "kora__get_operational_state"
  ↓ model_tools.handle_function_call("kora__get_operational_state", {}, ...)
  ↓ pre_tool_call block-check fires (no-op for Kora tools — bypass-loop hadn't checked either)
  ↓ NEW HOOK: pre_tool_call_can_provide_result fires
  ↓ kora_hermes._tool_bridge_provide_result handler:
      - Tool IS in REASONING_TOOL_ALLOWLIST → asyncio.run(execute_reasoning_tool(...))
      - Returns {"result": '{"primary_state":"ready","claim_permission":"normal",...}'}
  ↓ handle_function_call short-circuits Hermes registry.dispatch (verified via test spy)
  ↓ post_tool_call audit + transform_tool_result fire UNCHANGED
  ↓ conversation_loop continues with tool_result → Claude → final answer text
  ↓ run_conversation returns {final_response: "...", model: "...", tokens: {...}, ...}
  ↓ engine projects via _project_gateway_result → ResponseResult
  ↓ handler renders the reply

Verified by test_sample_tool_use_trace_kora_tool_via_bridge: invokes handle_function_call end-to-end with a spy on registry.dispatch confirming the bridge short-circuit (Hermes dispatch never fires for Kora tools).

Verified by test_sample_tool_use_trace_non_kora_tool_falls_through: write_file (Hermes-native tool, NOT in Kora's allowlist) → bridge returns None → Hermes default dispatch runs. Hermes-fork users with the plugin loaded see no regression on their own tools.

Files changed

File Change
kora_cli/plugins.py +1 VALID_HOOKS entry: pre_tool_call_can_provide_result (16 lines docstring)
model_tools.py New hook invocation in handle_function_call between block-check and registry.dispatch (35 lines)
plugins/kora_hermes/__init__.py _tool_bridge_provide_result handler + _is_kora_reasoning_tool helper + get_kora_tools_for_agent converter + register() adds the 7th hook
kora_cli/reasoning/anthropic_engine.py _respond_via_gateway populates agent.tools from get_kora_tools_for_agent() (replaces ST2's = []); fall-back to toolless on registry failure
kora_docs/14_research/hermes_local_extensions_2026-05-23.md Extension 5 inventory + upstream-PR readiness notes
tests/plugins/test_kora_hermes_plugin_st2b.py NEW — 17 ST2B tests (hook surface + tool-bridge handler + tool population + sample traces + bypass regression)
tests/test_model_tools.py Hook-sequence pin extended to include the new hook
tests/plugins/test_kora_hermes_plugin_st2.py ST2's agent.tools == [] pin replaced with Hermes-shape assertion
tests/plugins/test_kora_hermes_plugin.py ST1's "6 hooks" pin extended to 7

ST3 burn-in start recommendation timing

Recommend burn-in starts immediately after ST2B merges.

Sequence:

  1. ST2B merges → operator opts in to the kora_hermes plugin via plugins.enabled config (in dev / staging first)
  2. ~12h plugin opt-in burn-in — verify plugin discovery + hook registration + no spurious behavior change with KORA_REASONING_USE_GATEWAY still OFF (bypass path still drives production)
  3. Operator flips KORA_REASONING_USE_GATEWAY=true in Doppler for 24-48h
  4. Watch metrics: cost_telemetry per-route counters, audit JSONL tool-call attribution, slack DM outbound JSONL for any gateway_* error codes
  5. If parity holdsST3 lands (env-flip PR to default OFF→ON)
  6. If anomalies surface → ST3 holds + targeted fix lands as KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2C

Total ST3-ready estimate: 48-72h after ST2B merges (12h plugin opt-in + 24-48h toggle ON + small buffer for ST3 PR review).

Test plan

  • 17 new ST2B tests pass: pytest tests/plugins/test_kora_hermes_plugin_st2b.py
  • 657/657 in-scope serial regression green (reasoning + handlers + listeners + agent + plugins + model_tools)
  • Full repo xdist: 10746 passed, 71 failed. Same baseline as ST2 (48 established baseline + 22 pre-existing tests/plugins/{memory,web} env-dep failures + 1 pre-existing PR feat(kora): KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2 — actual wire-up #178 cross-test interference described below). Zero new ST2B regressions.
  • Hermes-fork safety: bridge handler returns None for non-Kora tools; tested via test_sample_tool_use_trace_non_kora_tool_falls_through
  • Fail-safe: bridge dispatch exception returns is_error envelope; hook-fire exception falls through to Hermes default
  • Backward compat: pre_tool_call (block-check) + post_tool_call audit + transform_tool_result continue to fire on the same path; observer ordering preserved

Pre-existing cross-test interference (flagged, not caused by ST2B)

tests/test_transform_llm_output_hook.py::test_hook_receives_expected_kwargs + tests/test_transform_tool_result_hook.py::test_transform_tool_result_integration_with_real_plugin fail when run AFTER tests/plugins/test_kora_hermes_plugin_st2.py (PR #178). Reproduced cleanly with that two-file ordering; both pass in isolation. Pre-existing PR #178 test-fixture pollution affecting user-plugin discovery — flagged for follow-on KR-TEST-ISOLATION-ST2-POLLUTION since fixing needs PR #178 author's context. Did NOT block ST2B since both tests pass in isolation and the pollution is orthogonal to ST2B's surface.

🤖 Generated with Claude Code

ST2B of the route-through. ST2 (#178) shipped toolless wire-up;
this bucket bundles the local Hermes extension + Kora plugin
consumer + agent.tools population so the toggle ON path has
full tool capability. After ST2B + 24-48h operator burn-in
validates parity with bypass, ST3 (tiny PR) flips
KORA_REASONING_USE_GATEWAY default to ON.

# Hook contract: pre_tool_call_can_provide_result

New local Hermes hook (added to VALID_HOOKS; fires in
model_tools.handle_function_call between the existing
pre_tool_call block-check and Hermes's default
registry.dispatch).

Kwargs:
  tool_name, args, task_id, session_id, tool_call_id

Return:
  - None / non-dict / missing "result" key → no-op, fall
    through to other plugins, then Hermes default dispatch
  - {"result": "<tool_result_str>"} → short-circuits Hermes
    dispatch; plugin string becomes the tool result
  - First non-None result wins (matches existing override-shape
    semantics in pre_api_request_mutable + pre_tool_list_finalized)

Failure handling: any hook exception → caught + DEBUG-logged
+ fall-through to Hermes default. Fail-safe.

Backward compat: existing pre_tool_call (block-check) +
post_tool_call audit + transform_tool_result fire UNCHANGED on
the same code path; new hook slots between block-check and
Hermes default dispatch without altering observer ordering.

# Kora plugin tool-bridge consumer

plugins/kora_hermes adds 2 helpers:

(a) _tool_bridge_provide_result — handler registered to the
    new hook. For tools in REASONING_TOOL_ALLOWLIST:
    dispatches via asyncio.run(execute_reasoning_tool(name,
    args)) + projects the Pydantic model → JSON string +
    returns {"result": json}. For non-Kora tools: returns None
    (Hermes default dispatch runs unchanged — Hermes-fork
    safety pin).

    Dispatch exceptions (e.g. substrate down) → returns
    {"result": json.dumps({"error": "kora_tool_dispatch_error:
    <ClassName>: <msg>"})} so the reasoning loop sees an
    is_error tool_result + can recover.

(b) get_kora_tools_for_agent — converts Kora's Anthropic-
    shaped reasoning tool descriptors to Hermes/OpenAI shape
    ({"type": "function", "function": {name, description,
    parameters}}) for agent.tools population. Empty list on
    registry failure → engine falls back to toolless ST2
    posture.

# Engine wiring change (replacing ST2's agent.tools = [])

kora_cli/reasoning/anthropic_engine._respond_via_gateway now
populates agent.tools from get_kora_tools_for_agent() +
populates agent.valid_tool_names from the returned tool names.
Falls back to agent.tools = [] on import / call failure (ST2
toolless posture preserved as the safety net).

# Sample tool-use trace

User → "what's my burn?"
  → engine._respond_via_gateway constructs AIAgent (max_iter=5,
    haiku default, anthropic_messages mode)
  → agent.tools = [{type:function, function:{name:
    "kora__get_operational_state", description:..., parameters:
    {...}}}, ...] (5 Kora reasoning tools)
  → agent.route = "slack_dm"
  → agent.run_conversation(user_message, system_message) via
    asyncio.to_thread (sync Hermes loop offloaded)
  → conversation_loop calls Claude → Claude returns tool_use
    block for "kora__get_operational_state"
  → conversation_loop dispatches via model_tools.handle_
    function_call("kora__get_operational_state", {}, ...)
  → handle_function_call fires pre_tool_call (block-check)
  → handle_function_call fires NEW
    pre_tool_call_can_provide_result hook
  → kora_hermes plugin's _tool_bridge_provide_result handler
    fires (tool IS in REASONING_TOOL_ALLOWLIST)
  → handler calls asyncio.run(execute_reasoning_tool(...))
  → returns {"result": '{"primary_state":"ready",...}'}
  → handle_function_call short-circuits Hermes default
    registry.dispatch (verified via spy in tests)
  → post_tool_call audit fires + transform_tool_result fires
    (UNCHANGED on the same code path)
  → conversation_loop continues with the tool_result; LLM
    produces final answer
  → run_conversation returns dict {final_response, model,
    tokens, ...}
  → engine projects via _project_gateway_result → ResponseResult
  → handler renders the reply

# Tests

tests/plugins/test_kora_hermes_plugin_st2b.py — 17 new tests:

  - Hook surface: in VALID_HOOKS + register_hook accepts
    without unknown-hook warning
  - get_kora_tools_for_agent: Hermes-shape conversion + all
    names within reasoning allowlist + empty-on-registry-failure
    fall-back
  - _tool_bridge_provide_result: non-Kora tool returns None
    (Hermes-fork safety); Kora tool dispatches + returns
    JSON; dispatch exception returns is_error envelope;
    non-Pydantic result falls back to json.dumps(default=str)
  - handle_function_call: new hook fires + short-circuits
    registry.dispatch when plugin provides result; falls
    through when plugin returns None; fail-safe on plugin
    exception
  - _respond_via_gateway: agent.tools populated from registry;
    agent.valid_tool_names tracks names; falls back to []
    when tool-bridge import fails
  - Sample tool-use trace (Kora tool via bridge): no Hermes
    dispatch; bridge result returned
  - Sample tool-use trace (non-Kora tool fall-through):
    Hermes dispatch runs; bridge stays out — fork users safe
  - Bypass-path regression (toggle OFF): no AIAgent ctor; no
    behavior change

Updated:
  - tests/test_model_tools.py — pre-existing hook-sequence pin
    extended to account for new hook (pre_tool_call →
    pre_tool_call_can_provide_result → post_tool_call →
    transform_tool_result)
  - tests/plugins/test_kora_hermes_plugin_st2.py — ST2's
    "agent.tools == []" pin replaced with the new Hermes-
    shape assertion (ST2B populates tools)
  - tests/plugins/test_kora_hermes_plugin.py — ST1's
    "six hooks" pin extended to seven (added new tool-bridge
    hook)

# Documentation update

kora_docs/14_research/hermes_local_extensions_2026-05-23.md
gains Extension 5 (the new hook) + its upstream-PR readiness
notes. Battle-test gap: ST3 default-flip + 24-48h burn-in is
the natural integration test. After ST3 lands, the hook is
upstreamable.

# Pre-existing cross-test interference (NOT introduced by ST2B)

`tests/test_transform_llm_output_hook.py::test_hook_receives_
expected_kwargs` + `tests/test_transform_tool_result_hook.py::
test_transform_tool_result_integration_with_real_plugin` fail
when run AFTER `tests/plugins/test_kora_hermes_plugin_st2.py`
(PR #178). Reproduced cleanly with that two-file ordering;
both pass in isolation. Pre-existing PR #178 test-fixture
pollution affecting user-plugin discovery — flagged for
follow-on `KR-TEST-ISOLATION-ST2-POLLUTION` since it's a
test-only concern (production-code uncontaminated) and fixing
needs PR #178 author's context on what state their tests are
mutating. Did NOT block ST2B since both tests pass in isolation
and the pollution is orthogonal to ST2B's surface.

# Regression

657/657 in-scope tests pass serially (reasoning + handlers +
listeners + agent + plugins + model_tools).

Full repo xdist: 10746 passed, 71 failed. Same baseline as
ST2 (48 established baseline flakes + 22 pre-existing tests/
plugins/{memory,web} env-dep failures from missing blake3
extras + 1 pre-existing PR #178 cross-test interference
described above). Zero new ST2B regressions.

# ST3 burn-in start recommendation timing

Recommend burn-in starts **immediately after ST2B merges**.

Reasoning:
  - ST2B closes the tool-use gap that blocked ST3 in ST2's
    recommendation. Toggle ON now provides parity with bypass
    for tool-using flows.
  - The kora_hermes plugin is opt-in via plugins.enabled
    config — operator opts in WITHOUT flipping the engine
    toggle to validate plugin discovery + tool-bridge wiring
    in dev first.
  - After plugin opt-in burns in for ~12h cleanly, operator
    flips KORA_REASONING_USE_GATEWAY=true in Doppler for
    24-48h.
  - Watch metrics: cost_telemetry per-route, audit JSONL for
    tool-call attribution, slack DM outbound JSONL for any
    "gateway_*" error codes.
  - If parity holds, ST3 lands (env flip to default).
  - If anomalies surface, ST3 holds + targeted fix lands as
    KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2C.

Total ST3-ready estimate: **48-72h after ST2B merges** (12h
plugin opt-in burn-in + 24-48h toggle ON burn-in + small
buffer for ST3 PR review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 7dcc77f into feature/phase2-upgrades May 24, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-REASONING-ROUTE-THROUGH-GATEWAY-ST2B branch May 24, 2026 04:45
rafe-walker added a commit that referenced this pull request May 24, 2026
…ck R3-2 Phase C completion (#189)

Substantial paired bundle.

Deliverable 1 — New local Hermes hook post_llm_call_can_reissue at agent/conversation_loop.py. First-non-None override semantics matching #172/#181 family. Anti-loop safety: at most one reissue per iteration. Backward compat: post_api_request observer sees FINAL (post-reissue) response only.

Deliverable 2 — kora_hermes_plugin/haiku_router/ following #185 template. Plugin consumes should_escalate_post_call (from cost_ladder/selector.py since #185 but no caller until now). Implements parallel-Claude pattern from R3: low-confidence Haiku response → reissue to Opus with Haiku response in messages context. record_inference fires twice per iteration (Haiku + Opus reissue) with escalated_to_opus tagged correctly.

Sample trace: Haiku reply uncertain (i"m not sure...) → escalates → Opus response includes Haiku context → Opus confirms terser. Two consecutive record_inference calls, only second tagged escalated_to_opus=True.

What this completes: 6 of 7 plugin extractions (KR-PLUGIN-IDENTITY still deferred per Lock R3-2); all KR-HAIKU-ROUTER #165 escalation paths now functional end-to-end; Lock R3-2 Phase C closed.

Two non-blocking follow-ups flagged in PR body: escalation_reason as structured telemetry field (one-line CostStateHolder.record_inference bump); per-call api_call_count accounting (transparent-upgrade vs per-call — currently transparent).

125 directly-affected tests green; 56 broader-suite failures verified pre-existing on base (fastapi/blake3/HERMES_HOME environmental, not regressions).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant