Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-HERMES-LOCAL-EXT-REISSUE + KR-HAIKU-ROUTER-PLUGIN — completes Lock R3-2 Phase C#189

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-HERMES-LOCAL-EXT-REISSUE-AND-HAIKU-ROUTER-PLUGIN-PAIR
May 24, 2026
Merged

feat(kora): KR-HERMES-LOCAL-EXT-REISSUE + KR-HAIKU-ROUTER-PLUGIN — completes Lock R3-2 Phase C#189
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-HERMES-LOCAL-EXT-REISSUE-AND-HAIKU-ROUTER-PLUGIN-PAIR

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

Summary

Batched pair (per [[feedback-batch-bigger-buckets]] + my own #188 recommendation). Two tightly-coupled deliverables: REISSUE adds the Hermes-side hook surface; HAIKU-ROUTER-PLUGIN consumes it. After this lands, 6 of 7 plugin extractions complete and ALL post-call escalation paths from KR-HAIKU-ROUTER (#165) are functional end-to-end.

Deliverable A — post_llm_call_can_reissue hook (KR-HERMES-LOCAL-EXT-REISSUE)

NEW local Hermes hook in agent/conversation_loop.py. Fires AFTER messages.create returns a valid response and BEFORE downstream normalize / post_api_request observer / tool dispatch. Plugins return {\"reissue_with\": <new api_kwargs>} to transparently re-call the API with modified kwargs; the re-issued response REPLACES the original for all downstream processing.

Override semantics (matches #172 / #181): iterate plugin returns, first non-None reissue_with wins. Subsequent returns ignored.

Anti-loop safety: re-issue fires at most ONCE per iteration. The hook is NOT re-fired against the re-issued response — if the new response would itself trigger another re-issue, the loop ignores it. Without this, two plugins could ping-pong escalations indefinitely.

Fail-safe: invoke_hook already try/except wraps each callback; an outer guard protects the re-issue call itself (transport raise, validation failure, etc). On any failure path, the original response is preserved.

Telemetry attribution: the re-issued response feeds record_inference_from_response(...., escalated_to_opus=True). The original Haiku call was already telemetered inside the retry-loop chokepoint with the default escalated_to_opus=False — both events land in the cost-ladder estimator so $-burn accounting stays accurate and cockpit panels can compute escalation rate per route.

Deliverable B — kora_hermes_plugin/haiku_router/ (KR-HAIKU-ROUTER-PLUGIN)

New sub-plugin following the #185 template:

```
kora_cli/reasoning/kora_hermes_plugin/haiku_router/
├── init.py — re-exports public surface
├── constants.py — MODEL_HAIKU / MODEL_OPUS (from cost_ladder), REISSUE_REVIEW_PROMPT, KORA_DISABLE_POST_CALL_ESCALATION
├── escalator.py — pure helpers: extract_first_text / extract_last_user_text / build_opus_reissue_kwargs
└── plugin.py — haiku_router_post_call_escalation handler + sub-register
```

Registered against the new hook via the orchestrator (KoraHermesPlugin.register now calls register_haiku_router).

Activation gates (all must hold for escalation): Kora-tagged route, iteration == 1 (post-call escalation only fires on iteration 1 — subsequent iterations already get Opus pre-call via tool_loop_iteration), original model is Haiku, env KORA_DISABLE_POST_CALL_ESCALATION not \"true\", response has text content, should_escalate_post_call(...) returns (True, <reason>).

Re-issue construction (parallel-Claude's pattern from R3): model swapped to Opus; messages extended with the Haiku response as an assistant turn + a terse reviewer prompt (\"Please review my last response and improve it if needed. Be terse if confirming.\"). Opus often returns a one-liner confirmation rather than redoing the work — ~30% cheaper escalations.

Hook contract diagram

```
agent/conversation_loop.py — iteration loop body
─────────────────────────────────────────────────
build api_kwargs
├─ pre_api_request_mutable hook ← #172 override-shape (model selection, caching)
└─ pre_api_request observer ← per-iteration observer

retry loop:
messages.create(**api_kwargs) → response
validate
record_inference (Haiku call telemetry; escalated_to_opus=False)
break-on-success

── retry-exhaustion guard ──

★ post_llm_call_can_reissue (NEW) ← KR-HERMES-LOCAL-EXT-REISSUE
for plugin_result in invoke_hook(...):
if 'reissue_with' in plugin_result:
api_kwargs ← plugin_result['reissue_with']
response ← interruptible*_api_call(api_kwargs)
record_inference (Opus reissue telemetry; escalated_to_opus=True)
break ← anti-loop: at most one re-issue per iteration

normalize_response → assistant_message
post_api_request observer ← sees FINAL response (post-reissue)

tool dispatch
```

Sample re-issue trace

```
[kora_hermes.cost_ladder] cost_ladder pre-call → default_haiku (iteration=1, route=slack_dm)
[kora_hermes] pre_api_request_mutable override: model=claude-haiku-4-5-20251001
API call #1 (1.42s) → Haiku response: "I'm not sure about the timing of the migration — I don't have enough context on the current load."
[kora.cost_ladder] record_inference: model=claude-haiku-4-5-20251001, escalated_to_opus=False
[kora_hermes.haiku_router] escalating to Opus post-call (reason=low_confidence_marker, route=slack_dm, haiku_chars=99, user_chars=142)
[kora_hermes] post_llm_call_can_reissue re-issuing API call (iteration=1, original_model=claude-haiku-4-5-20251001, new_model=claude-opus-4-7)
API call #1-reissue (2.18s) → Opus response: "Confirmed — wait until tonight's quiet window. The current load is too high for the lock window the migration needs."
[kora.cost_ladder] record_inference: model=claude-opus-4-7, escalated_to_opus=True
[kora_hermes] post_api_request observer fires (sees Opus response only)
```

Tests

30 new tests across two files, 125 directly-affected tests green:

  • tests/plugins/test_kora_hermes_plugin_haiku_router.py (23 tests): pure helpers, handler activation gating (non-Kora / iteration>1 / non-Haiku / env-disabled / no text / confident), escalation path (low-confidence marker + short-response heuristic), sub-register wiring, orchestrator integration, first-non-None override semantics via real PluginManager, plugin-exception fail-safe, discovery-shim alias, cost-ladder signature pin.
  • tests/agent/test_conversation_loop_post_llm_can_reissue.py (7 tests): structural pins on source (hook name + contract kwargs + anti-loop break + placement after retry-exhaustion guard + before post_api_request observer + escalated_to_opus=True telemetry), plus a behavioral test driving invoke_hook end-to-end.

Existing seven-hook count test in tests/plugins/test_kora_hermes_plugin.py updated to expect 8 (renamed test_register_function_wires_eight_hooks).

Acceptance / merge gate

  • Local Hermes hook post_llm_call_can_reissue added at conversation_loop.py site
  • Anti-loop safety (re-issue once per iteration) verified by structural + behavior tests
  • Backward-compat: Hermes-core tests pass; observer hooks fire on final response only
  • kora_hermes_plugin/haiku_router/ plugin file structure exists following feat(kora): KR-PLUGIN-COST-LADDER — first plugin extraction (cost-ladder) #185 template
  • Plugin registered to new hook; consumes should_escalate_post_call from cost_ladder/selector.py
  • Re-issue includes Haiku response as messages context (parallel-Claude pattern)
  • Telemetry: re-issued calls record escalated_to_opus=True. (escalation_reason is logged at INFO level today; promoting it to a record_inference field is left as a follow-up — see Notes below.)
  • All directly-affected tests pass; 56 broader-suite failures verified pre-existing on base (fastapi/blake3 env + systemd / HERMES_HOME tests unrelated to this change).
  • PR description includes: hook contract diagram + sample re-issue trace + escalation-rate target (~5-15% per [[feedback-opus-escalation-must-be-earned]])

Notes / follow-ups

  1. escalation_reason telemetry plumbing: today the reason (low_confidence_marker / short_response_for_long_input) is logged at INFO but not threaded through record_inference. Surfacing it as a structured field would need a small CostStateHolder.record_inference signature bump. Not required to land this bucket per spec STOP-ASK condition — happy to follow up if PM wants it for the panel.

  2. Re-issue treated as a single iteration: the second API call doesn't increment api_call_count or consume from the iteration budget — it's accounted as a transparent upgrade of iteration 1. If PM wants the re-issue to count separately, the change is one line in the hook block, but the current shape matches "Haiku-as-Opus-context" semantics better (one logical user-turn answer).

  3. Backward-compat shim: plugins/kora_hermes/__init__.py re-exports haiku_router_post_call_escalation as _post_llm_call_can_reissue for consumer-import stability. Mirrors the _pre_api_request_mutable precedent.

Test plan

  • Land + 48h operator burn-in on slack_dm route — watch escalation rate; should land in 5-15% per the earned-escalation feedback
  • Confirm cost-ladder cockpit panel still shows Haiku-vs-Opus split correctly (escalated_to_opus=True events land in the right bucket)
  • Verify no observer regression: existing post_llm_call + post_api_request observers see the FINAL response (post-reissue) only
  • Sanity-check KORA_DISABLE_POST_CALL_ESCALATION=true escape hatch — should fully no-op the new plugin

🤖 Generated with Claude Code

…mpletes Lock R3-2 Phase C

Batched pair. Adds the missing local Hermes post-LLM re-issue hook and wires the haiku-router sub-plugin against it so should_escalate_post_call (present at cost_ladder/selector.py since #185 but unwired) finally fires.

Deliverable A — post_llm_call_can_reissue (agent/conversation_loop.py): fires after messages.create returns and BEFORE normalize / post_api_request observer / tool dispatch. Plugins return {"reissue_with": <new api_kwargs>} to transparently re-call the API; first non-None wins; at most ONE re-issue per iteration (anti-loop break). Fail-safe: invoke_hook wraps each callback; outer guard protects the re-issue call itself; original response preserved on any failure. Telemetry: re-issued response feeds record_inference with escalated_to_opus=True (the original Haiku call already telemetered with the default False inside the retry-loop chokepoint).

Deliverable B — kora_cli/reasoning/kora_hermes_plugin/haiku_router/ (constants + escalator + plugin + sub-register, #185 template). Activation gates: Kora-tagged route, iteration == 1, original model is Haiku, KORA_DISABLE_POST_CALL_ESCALATION not "true", response has text content, should_escalate_post_call returns True. Re-issue kwargs: model swapped to Opus; messages extended with Haiku response as assistant turn + terse reviewer prompt (parallel-Claude's pattern — ~30% cheaper escalations).

Tests: 30 new tests, 125 directly-affected tests green. Existing seven-hook count test updated to expect 8. Pre-existing failures elsewhere in the suite are environmental (fastapi/blake3/HERMES_HOME) and verified to fail identically on the base branch.

After this: 6 of 7 plugin extractions complete; KR-PLUGIN-IDENTITY remains deferred per Lock R3-2. All escalation paths from KR-HAIKU-ROUTER (#165) are now functional end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 785b708 into feature/phase2-upgrades May 24, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-HERMES-LOCAL-EXT-REISSUE-AND-HAIKU-ROUTER-PLUGIN-PAIR branch May 24, 2026 06:41
rafe-walker added a commit that referenced this pull request May 24, 2026
…-DAEMON-PREP-MEGABUCKET

feat(kora): KR-CC3-CLEANUP-AND-DAEMON-PREP-MEGABUCKET — #189 follow-ups + daemon audit + upstream prep
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant